Open Tamschi opened 1 week ago
It seems there's no existing unbounded SPSC I can use for this. I should be able to chain `ringbuf` FIFOs, though, by storing them in a linked list where the producer holds a reference to the first and the consumer to the last.

- `store` the new node in the previous `next` with `Release`. (Ensure that this can't happen after the `ringbuf` producer is dropped, likely with a barrier in the list node's `Drop::drop`.)
- Provide a way to add a new small buffer to trim the overall allocation?
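
A rough sketch of the linking step only, under loud assumptions: `Node`, `publish_next` and `next` are hypothetical names, `buffer` merely stands in for one `ringbuf` FIFO, and the whole chain is owned by its first node (a real version would presumably let the consumer free nodes as it advances). It shows nothing more than the `Release` `store` into the previous `next`, paired with an `Acquire` load on the reading side.

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

/// Hypothetical list node. `buffer` stands in for one `ringbuf` FIFO.
pub struct Node<T> {
    pub buffer: T,
    next: AtomicPtr<Node<T>>,
}

impl<T: Send> Node<T> {
    pub fn new(buffer: T) -> Box<Self> {
        Box::new(Node { buffer, next: AtomicPtr::new(ptr::null_mut()) })
    }

    /// Producer side: link `new` behind this node with a `Release` store,
    /// so everything written to `new` beforehand is visible to a reader
    /// that observes the pointer with `Acquire`.
    /// Assumption: the single producer calls this at most once per node.
    pub fn publish_next(&self, new: Box<Node<T>>) -> &Node<T> {
        debug_assert!(self.next.load(Ordering::Relaxed).is_null());
        let raw = Box::into_raw(new);
        self.next.store(raw, Ordering::Release);
        // SAFETY: `raw` comes from `Box::into_raw` and is only freed when
        // the owning chain is dropped, which requires exclusive access.
        unsafe { &*raw }
    }

    /// Consumer side: follow the link if the producer has published one.
    pub fn next(&self) -> Option<&Node<T>> {
        let raw = self.next.load(Ordering::Acquire);
        // SAFETY: either null or a live node published by `publish_next`.
        unsafe { raw.as_ref() }
    }
}

impl<T> Drop for Node<T> {
    fn drop(&mut self) {
        // Free the rest of the chain iteratively to avoid deep recursion.
        let mut raw = std::mem::replace(self.next.get_mut(), ptr::null_mut());
        while !raw.is_null() {
            // SAFETY: every non-null `next` was created by `Box::into_raw`.
            let mut node = unsafe { Box::from_raw(raw) };
            raw = std::mem::replace(node.next.get_mut(), ptr::null_mut());
        }
    }
}
```

This leaves the open points from the list untouched: it does not coordinate with dropping the `ringbuf` producer, and it does not address trimming the allocation.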
Is your feature request related to a problem? Please describe.
Queueing updates in `GlobalSignalsRuntime` currently takes out a global exclusive lock, which is, mildly put, less than ideal. This action can be made lock-free and barrier-free between concurrent writes.

This reduces read/write synchronisation essentially to multiple SPSC channels, which can likely use `compare_exchange` with `Relaxed` failure ordering due to an unbroken series of transaction IDs (see below).

That still leaves congestion of the atomic (see below), but considering that this shouldn't be called in a tight loop, it might be irrelevant in that the atomic is likely to be committed to a shared cache naturally before it is accessed again.
Describe the solution you'd like
- when an update is submitted
  - `Relaxed` (?) `fetch_add` to produce a transaction index. No wrapping logic is necessary – even if the counter is increased each nanosecond, it would take close to 600 years for a `u64` to cap out.
  - `Release` barrier specific to that storage compartment.
- when updates are consumed
  - `u64` is used to lock and allocate consumption. (`u64::MAX` acts as a lock which breaks out of the loop when encountered.) Without changing semantics, the frequency of this check can be reduced based on when the outermost call on a thread's context stack exits, only then trying to perform updates.
  - `Acquire` `swap`. If the number is equal to the submission counter, undo the lock and exit.

Describe alternatives you've considered
Is this just an MPSC that respects outside memory barriers and doesn't introduce them itself? It may exist already, and if not it likely makes for a useful library. Maybe crossbeam-channel or an explicit SPSC channel library could help (or be used for the thread-associated storage if it allows peeking).
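
To make the counter handling from the proposed solution more concrete, here is a minimal sketch of just the two atomics. `Transactions`, `submit`, `try_consume` and `LOCKED` are hypothetical names that exist nowhere yet. The payload storage is left out entirely: `apply` only receives transaction indices, and making the matching update visible is assumed to be the job of the per-compartment `Release` barrier.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Sentinel stored in `consumed` while a consumer is active.
const LOCKED: u64 = u64::MAX;

/// Hypothetical pair of counters; the payload storage is not modelled.
pub struct Transactions {
    /// Next transaction index to hand out.
    submitted: AtomicU64,
    /// First index not yet consumed, or `LOCKED`.
    consumed: AtomicU64,
}

impl Transactions {
    pub const fn new() -> Self {
        Transactions {
            submitted: AtomicU64::new(0),
            consumed: AtomicU64::new(0),
        }
    }

    /// Submission: allocate a transaction index without any lock.
    pub fn submit(&self) -> u64 {
        self.submitted.fetch_add(1, Ordering::Relaxed)
    }

    /// Consumption: returns `false` if another consumer holds the lock.
    /// Otherwise calls `apply` for each pending index in order.
    /// Note: if `apply` panics, this sketch leaves the lock held.
    pub fn try_consume(&self, mut apply: impl FnMut(u64)) -> bool {
        let mut next = self.consumed.swap(LOCKED, Ordering::Acquire);
        if next == LOCKED {
            return false;
        }
        loop {
            let end = self.submitted.load(Ordering::Relaxed);
            if next == end {
                break; // caught up with the submission counter
            }
            while next < end {
                apply(next);
                next += 1;
            }
        }
        // Undo the lock by storing the new consumption position.
        self.consumed.store(next, Ordering::Release);
        true
    }
}
```

An update submitted between the last comparison and the unlock is simply picked up by the next `try_consume` call, so a caller that must not miss updates would have to retry after submitting.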

Additional context
This is inspired by https://snarfed.org/2019-12-28_39697. Thanks to @snarfed for pointing me in this direction, that was spot-on. I can't quite use transactional memory unfortunately, but my submitted-updates API can still benefit from a very similar implementation.
I'll label this as both work: clear and work: complicated since the approach is straightforward in theory but packaging it nicely may not be. There may also be an existing implementation, which is going to need more research to rule out.