Relax atomic memory ordering constraints

faern commented 4 years ago

Currently all atomic memory orderings use SeqCst. This is to be conservative and avoid some bugs due to being careless early on in the development when lots of stuff change around. When the code matures somewhat a lot of the orderings can probably be relaxed.

Since this is tricky. Having the relaxed orderings audited by someone else who is good at this would be awesome.

jjl commented 3 years ago

Hello, I'm the author of a different oneshot channel.

Anyway, more than the memory orderings, what stuck out to me is you're missing out on a few neat tricks that allow you to really speed it up. In async-oneshot, I use an atomic integer as a bitmask. This allows for using atomic bit operations to take the locks. Atomic bit operations can't fail. You can for example issue a fetch_or of LOCK (a constant with a particular bit set) with Ordering::AcqRel and then just check the bit wasn't already set in the return value. Combine that with a spinloop (because you only ever take the lock to move a value) and you're golden.

But you did ask about memory orderings, so, to answer: you only need acquire/release semantics for a oneshot channel implemented with a single atomic. There are some situations where you can get away with relaxed (basically when you aren't going to access other memory on the basis of the atomic - you just want the atomic).

I've actually rewritten async-oneshot recently to support reuse (i.e. no longer be oneshot), but it still retains excellent performance characteristics (though we have to take a lock on write because i've stopped flagging the message. i might bring back the old behaviour, i'm not sure. at the moment. i'm a bit fatigued with it having done 3 consecutive rewrites since the last release and still not being entirely happy...).

I'm happy to answer any questions you may have. Or to collaborate, if you're up for it. I have big plans for async rust and it would be awesome to work with someone else on it. Or even just to get feedback really lol.

Cheers, James

faern commented 3 years ago

I don't think I understand what lock you think I should use an atomic bitmask for? Could you link to some code in this repository?

I already use an atomic integer to read and manipulate the state of the channel. There is no locking going on in this library.

jjl commented 3 years ago

Oh I see, on closer inspection, I got confused by the sync implementation.

However, I did notice a bug here. This violates the futures API contract, where you are always supposed to set a new waker. Whether it will happen to work correctly depends on your chosen executor's implementation.

When you consider this case, I expect you will find a spinloop necessary (even async-waker uses one).

faern commented 3 years ago

Thanks! Yeah I made that mistake in another crate as well: triggered. But I never fixed it here :see_no_evil:

I should probably fix that.

faern commented 3 years ago

@jjl Hi again. It took me a loooong time to get around to look at this keep-last-waker-not-first-waker-issue. But I have now implemented what I believe is a proper fix. I have not merged it yet. If you would like to, I would appreciate a review: #15

I solved it without a spinloop. The receiver can always make progress without looping here. It can try to replace the old waker with a new waker once, and if it fails (because the state changed during the swap) then the sender must have made some progress that allows us to skip updating the waker, and just return instead. So no extra dependencies, and no extra looping.

jjl commented 3 years ago

okay, i think this will work.

github spews a ton of warnings at me about the compare_and_swap though, you might want to update those to compareexchange*.

Cheers, James

faern commented 3 years ago

Thanks. Yeah, I'll get to update those later. But those are not bugs, just deprecation warnings. compare_and_swap will continue to work.

jjl commented 3 years ago

indeed, they have no impact on correctness :)

faern commented 2 years ago

A lot of orderings were relaxed in #17. But not all of them. The remaining orderings are appropriately set in #20

faern / oneshot

Relax atomic memory ordering constraints #10