lewissbaker opened 4 months ago
A paper along these lines would also need to consider what to do when the scheduler onto which the operation's completion is scheduled is a fallible scheduler, and what to do if scheduling onto that context actually fails.
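One possible answer to the fallible-scheduler question, sketched with plain callbacks rather than P2300 senders: if the completion cannot be scheduled, deliver an error to the waiter instead of its value. Everything here (fallible_loop, try_post, waiter, complete_via) is hypothetical and only illustrates that policy; it is not a proposed API.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical scheduler whose "schedule" step can fail, e.g. because the
// execution context has been shut down. Not a P2300 API.
class fallible_loop {
public:
    // Returns false if the loop no longer accepts work.
    bool try_post(std::function<void()> task) {
        if (stopped_) return false;
        tasks_.push_back(std::move(task));
        return true;
    }
    void stop() { stopped_ = true; }
    // Drains queued tasks on the calling thread.
    void run() {
        while (!tasks_.empty()) {
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            task();
        }
    }

private:
    bool stopped_ = false;
    std::deque<std::function<void()>> tasks_;
};

// A waiting operation's completion handlers, modelled as plain callbacks.
struct waiter {
    std::function<void()> on_value;
    std::function<void(const std::string&)> on_error;
};

// One possible policy (an assumption, not a decided design): try to reschedule
// the value completion; if scheduling fails, complete with an error instead.
void complete_via(fallible_loop& loop, const waiter& w) {
    if (!loop.try_post(w.on_value)) {
        w.on_error("schedule failed");
    }
}

// Records the order of events, optionally with the loop already stopped.
std::vector<std::string> demo(bool loop_stopped) {
    fallible_loop loop;
    std::vector<std::string> trace;
    if (loop_stopped) loop.stop();
    complete_via(loop, waiter{
        [&trace] { trace.push_back("value"); },
        [&trace](const std::string& e) { trace.push_back("error: " + e); }});
    trace.push_back("returned");
    loop.run();
    return trace;
}
```

In the failure path the error handler still runs inline in the caller of complete_via, so this policy narrows the inline-completion problem rather than eliminating it.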
@ispeters mentioned that the async_scope paper already has the suggested behaviour for the join() operation on an async-scope.
One thing I'm not sure about is how to define a policy or design guideline that ensures subsequently proposed facilities all follow this design, e.g. if we add an async_mutex, an async_semaphore, or other types with a 'wait' operation.
We could encode this guideline in a paper, make sure SG1 understands it, and have SG1 request that similar-looking facilities coming through SG1 refer to that paper and apply it. I don't think there's anything actionable for the facilities added by P2300, though.
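As an illustration of what such a guideline could require of a future facility, here is a toy async_semaphore (a hypothetical, callback-based, single-threaded type) in which release() never resumes a waiter inline; instead it posts the waiter's completion to the run loop the waiter supplied. The run_loop type is a minimal stand-in for a scheduler, not std::execution::run_loop.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy single-threaded run loop standing in for a scheduler (hypothetical).
class run_loop {
public:
    void post(std::function<void()> task) { tasks_.push_back(std::move(task)); }
    void run() {
        while (!tasks_.empty()) {
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            task();
        }
    }

private:
    std::deque<std::function<void()>> tasks_;
};

// Hypothetical async_semaphore following the guideline: a waiter unblocked by
// release() is never resumed inside release(); its completion is posted to
// the run loop it supplied when it started waiting.
class async_semaphore {
public:
    explicit async_semaphore(std::size_t permits) : permits_(permits) {}

    // Stand-in for starting an "acquire" operation; `done` is the continuation.
    void acquire(run_loop& sched, std::function<void()> done) {
        if (permits_ > 0) {
            --permits_;
            done();  // fast path: completes inside the caller's own start call
            return;
        }
        waiters_.push_back(waiter{&sched, std::move(done)});
    }

    void release() {
        if (waiters_.empty()) {
            ++permits_;
            return;
        }
        waiter w = std::move(waiters_.front());
        waiters_.pop_front();
        w.sched->post(std::move(w.done));  // hand over the permit, don't run inline
    }

private:
    struct waiter {
        run_loop* sched;
        std::function<void()> done;
    };
    std::size_t permits_;
    std::deque<waiter> waiters_;
};

std::vector<std::string> demo() {
    run_loop loop;
    async_semaphore sem(0);
    std::vector<std::string> trace;
    sem.acquire(loop, [&trace] { trace.push_back("acquired"); });
    sem.release();
    trace.push_back("release returned");
    loop.run();
    return trace;
}
```

The fast path still completes inline, but only inside the acquirer's own start call, which is not the "innocent-looking call" case this issue is about.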
The suggested join() behaviour is described here: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3149r6.html#simple_counting_scopejoin
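A toy version of the join() behaviour under discussion, assuming a simple single-threaded ref-count: the last decrement, which may happen inside a nested operation's destructor, only posts the join completion instead of running it. toy_counting_scope, nested_op_state and run_loop are hypothetical and do not reproduce P3149's wording.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Toy single-threaded run loop standing in for a scheduler (hypothetical).
class run_loop {
public:
    void post(std::function<void()> task) { tasks_.push_back(std::move(task)); }
    void run() {
        while (!tasks_.empty()) {
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            task();
        }
    }

private:
    std::deque<std::function<void()>> tasks_;
};

// Hypothetical scope with a plain ref-count; not P3149's simple_counting_scope.
// Supports a single pending join for brevity.
class toy_counting_scope {
public:
    void nest_started() { ++count_; }

    // May be called from a nested operation's destructor.
    void nest_finished() {
        assert(count_ > 0);
        if (--count_ == 0 && pending_join_) {
            pending j = std::move(*pending_join_);
            pending_join_.reset();
            j.sched->post(std::move(j.done));  // schedule the join completion, don't call it
        }
    }

    void join(run_loop& sched, std::function<void()> done) {
        if (count_ == 0) {
            done();  // nothing outstanding: complete inside the caller's own start call
            return;
        }
        pending_join_ = pending{&sched, std::move(done)};
    }

private:
    struct pending {
        run_loop* sched;
        std::function<void()> done;
    };
    std::size_t count_ = 0;
    std::optional<pending> pending_join_;
};

// RAII stand-in for a nested operation's state.
struct nested_op_state {
    explicit nested_op_state(toy_counting_scope& s) : scope(&s) { scope->nest_started(); }
    ~nested_op_state() { scope->nest_finished(); }
    nested_op_state(const nested_op_state&) = delete;
    nested_op_state& operator=(const nested_op_state&) = delete;
    toy_counting_scope* scope;
};

std::vector<std::string> demo() {
    run_loop loop;
    toy_counting_scope scope;
    std::vector<std::string> trace;
    {
        nested_op_state op(scope);
        scope.join(loop, [&trace] { trace.push_back("join completed"); });
    }  // ~nested_op_state drops the last reference; the join completion is only posted
    trace.push_back("destructor returned");
    loop.run();
    return trace;
}
```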
I think we need a paper that discusses the possible approaches to choosing the completion context for certain kinds of senders, along with their relative trade-offs.
There is a tension here between minimising overhead (allowing inline completion), separation of concerns (not requiring every leaf operation to deal with schedulers) and safety (not running continuations on unexpected contexts inside innocent-looking calls).
@robertleahy would like to provide some input on this.
Several data structures have async operations in which a consumer waiting on something is unblocked by some other operation on the same data structure.
For example:
- bounded_queue from P0260 has an async_pop() that returns a sender. If the queue was empty at the time the operation was started, it would be completed by the next call to try_push(), push() or async_push(). However, completing the async_pop() operation inline inside the call to push() could then run an arbitrary amount and kind of code inside that call to push(). Ideally, push() would just enqueue the completion of async_pop() onto a scheduler and then return immediately.
- async_scope from P3149 has a join() operation on the counting_scope object, but that operation generally completes as a side-effect of the completion of an operation nested within the scope, e.g. triggered from the destructor of a future-sender or nest-operation-state. If join() completes inline inside the destructor, it might invoke continuation code that runs an arbitrary amount of code in the context of that destructor, delaying the execution of the continuation attached to the future-sender operation. Ideally, decrementing the last ref-count would just schedule an operation onto some scheduler, which would then invoke the completion handler for the join() operation.

These are just two examples of this kind of "something happens that triggers completion of some waiting operation" situation; it will recur in almost every data structure that has a "wait until something happens" async operation.
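The footgun can be shown with a toy, callback-based stand-in for an async_pop() operation (toy_queue, run_loop and the inline_completion flag are hypothetical; none of this is P0260's API). With inline completion the popper's continuation runs before push() returns; with scheduled completion push() returns first and the continuation runs later from the run loop.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Toy single-threaded run loop standing in for a scheduler (hypothetical).
class run_loop {
public:
    void post(std::function<void()> task) { tasks_.push_back(std::move(task)); }
    void run() {
        while (!tasks_.empty()) {
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            task();
        }
    }

private:
    std::deque<std::function<void()>> tasks_;
};

// Hypothetical callback-based stand-in for a queue with async_pop().
class toy_queue {
public:
    explicit toy_queue(bool inline_completion) : inline_completion_(inline_completion) {}

    // Stand-in for starting async_pop(); `done` receives the popped value.
    void async_pop(run_loop& sched, std::function<void(int)> done) {
        if (!items_.empty()) {
            int v = items_.front();
            items_.pop_front();
            done(v);  // value already available: completes in the caller's own start call
            return;
        }
        waiters_.push_back(waiter{&sched, std::move(done)});
    }

    void push(int v) {
        if (waiters_.empty()) {
            items_.push_back(v);
            return;
        }
        waiter w = std::move(waiters_.front());
        waiters_.pop_front();
        if (inline_completion_) {
            w.done(v);  // footgun: arbitrary continuation code runs inside push()
        } else {
            w.sched->post([done = std::move(w.done), v] { done(v); });  // push() returns promptly
        }
    }

private:
    struct waiter {
        run_loop* sched;
        std::function<void(int)> done;
    };
    bool inline_completion_;
    std::deque<int> items_;
    std::deque<waiter> waiters_;
};

std::vector<std::string> demo(bool inline_completion) {
    run_loop loop;
    toy_queue q(inline_completion);
    std::vector<std::string> trace;
    q.async_pop(loop, [&trace](int v) { trace.push_back("popped " + std::to_string(v)); });
    q.push(42);
    trace.push_back("push returned");
    loop.run();
    return trace;
}
```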
We need a strategy or pattern that can be applied to such data structures so that they behave consistently and avoid the inline-completion footguns.
Several options to consider:
- folly::coro::viaIfAsync()
- Apply .completes_on() to the sender to force its completion to be rescheduled onto a provided scheduler.

Note that this may tie in with the task design: the task coroutine type may implicitly apply such an algorithm to all co_await expressions within the coroutine.
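A callback-based sketch of the completes_on idea: the adaptor re-posts whatever completion the wrapped operation produces onto a given run loop. The only_if_async flag is an assumption loosely modelled on the viaIfAsync name: it skips the reschedule when the operation completes synchronously during its own start, where the continuation would already run on the caller's context, avoiding the extra hop. async_op, run_loop and completes_on here are hypothetical stand-ins, not P2300 or folly APIs.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Toy single-threaded run loop standing in for a scheduler (hypothetical).
class run_loop {
public:
    void post(std::function<void()> task) { tasks_.push_back(std::move(task)); }
    void run() {
        while (!tasks_.empty()) {
            auto task = std::move(tasks_.front());
            tasks_.pop_front();
            task();
        }
    }

private:
    std::deque<std::function<void()>> tasks_;
};

// Toy stand-in for a sender: starting it takes the continuation for its int result.
using async_op = std::function<void(std::function<void(int)>)>;

// Hypothetical adaptor in the spirit of ".completes_on()": the continuation is
// re-posted onto `sched` no matter which context `op` completes on. With
// only_if_async, a completion that happens synchronously inside start is
// delivered inline instead.
async_op completes_on(async_op op, run_loop& sched, bool only_if_async = false) {
    run_loop* s = &sched;
    return [op = std::move(op), s, only_if_async](std::function<void(int)> k) {
        auto start_returned = std::make_shared<bool>(false);
        op([s, k, start_returned, only_if_async](int v) {
            if (only_if_async && !*start_returned) {
                k(v);  // completed during start: already on the caller's context
                return;
            }
            s->post([k, v] { k(v); });
        });
        *start_returned = true;
    };
}

// A "producer" completes a waiting operation inline from its own call.
std::vector<std::string> demo_wait(bool adapt) {
    run_loop loop;
    std::vector<std::string> trace;
    std::function<void(int)> pending;  // continuation stored by the waiting op
    async_op wait_op = [&pending](std::function<void(int)> k) { pending = std::move(k); };
    async_op op = adapt ? completes_on(wait_op, loop) : wait_op;
    op([&trace](int v) { trace.push_back("continuation " + std::to_string(v)); });
    pending(7);  // the producer's call, e.g. push()
    trace.push_back("producer returned");
    loop.run();
    return trace;
}

// An operation that completes synchronously inside its own start.
std::vector<std::string> demo_ready(bool only_if_async) {
    run_loop loop;
    std::vector<std::string> trace;
    async_op ready = [](std::function<void(int)> k) { k(1); };
    completes_on(ready, loop, only_if_async)(
        [&trace](int v) { trace.push_back("continuation " + std::to_string(v)); });
    trace.push_back("start returned");
    loop.run();
    return trace;
}
```

A coroutine task type could apply the same wrapping to every co_await, which is where the overhead side of the trade-off shows up: the unconditional form pays one reschedule per completion even when it was not needed.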