Define a policy for resuming leaf operations that avoids running continuations inline on context that triggered completion

lewissbaker commented 4 months ago

There are several data structures with async operations where a consumer that waits on something is unblocked by some other operation on the data structure.

For example:

bounded_queue from P0260 has an async_pop() that returns a sender. If the queue was empty at the time the operation was started then it would be completed by the next call to try_push(), push() or async_push(). However, completing the async_pop() operation inline inside the call to push() could then potentially run an arbitrary amount and type of code inside the call to push(). Ideally, the call to push() would just enqueue the completion of async_pop() to a scheduler and then return immediately.
async_scope from P3149 has a join() operation on the counting_scope object, but the join() operation generally completes as a side-effect of an operation nested within the scope completing, e.g. triggered from the destructor of a future-sender or nest-operation-state. If the join() operation completes inline inside the destructor then it might call continuation code that runs an arbitrary amount of code inside the context of that destructor, delaying the execution of the continuation attached to the future-sender operation. Ideally, decrementing the last ref-count would just schedule an operation onto some scheduler, and that operation would then invoke the completion-handler for the join() operation.
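To make the hazard concrete, here is a minimal single-threaded sketch using plain callbacks rather than senders. Everything in it is hypothetical: `manual_scheduler`, `toy_queue` and `trace` are illustrative names, and `toy_queue` is not the P0260 interface. It only shows the difference in ordering between completing a parked `async_pop()` inline inside `push()` and handing the completion to a scheduler.

```cpp
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a scheduler: a work queue drained by hand.
struct manual_scheduler {
    std::deque<std::function<void()>> work;

    void schedule(std::function<void()> f) { work.push_back(std::move(f)); }

    void drain() {
        while (!work.empty()) {
            auto f = std::move(work.front());
            work.pop_front();
            f();
        }
    }
};

// Toy single-waiter queue (an assumption for illustration, not P0260).
struct toy_queue {
    manual_scheduler* sched = nullptr;  // nullptr: complete waiters inline
    std::deque<int> items;
    std::function<void(int)> waiter;    // continuation of a pending async_pop

    void async_pop(std::function<void(int)> k) {
        if (!items.empty()) {
            int v = items.front();
            items.pop_front();
            k(v);                       // value ready: complete on the caller
        } else {
            waiter = std::move(k);      // queue empty: park the continuation
        }
    }

    void push(int v) {
        if (!waiter) {
            items.push_back(v);
            return;
        }
        auto k = std::move(waiter);
        waiter = nullptr;
        if (sched) {
            // Deferred: push() only enqueues the completion and returns.
            sched->schedule([k = std::move(k), v] { k(v); });
        } else {
            // Inline: arbitrary continuation code runs inside push().
            k(v);
        }
    }
};

// Records the order of events for one push() that unblocks a waiter.
std::vector<std::string> trace(bool deferred) {
    std::vector<std::string> log;
    manual_scheduler sched;
    toy_queue q;
    if (deferred) q.sched = &sched;
    q.async_pop([&log](int v) { log.push_back("continuation " + std::to_string(v)); });
    q.push(42);
    log.push_back("push returned");
    sched.drain();
    return log;
}
```

With `trace(false)` the continuation is logged before `push()` returns; with `trace(true)` `push()` returns first and the continuation only runs once the scheduler is drained.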

These are just two examples of this kind of "something happens that triggers completion of some waiting operation" situation - this situation will recur on almost all data-structures that have a "wait until something happens" async operation.

We need to come up with a strategy/pattern that we can apply to such data-structures to ensure that they have consistent behaviour that doesn't have the inline-completion footguns.

Several options to consider:

require that the receiver connected to the waiting-op-sender has a scheduler and either have it complete inline or on the associated scheduler
introduce a new algorithm that can be adapted over each leaf operation that has the operation either complete synchronously on the start() context, or otherwise schedules completion on a specified scheduler - similar to folly::coro::viaIfAsync().
just leave it as is and require users to manually apply continues_on() to the sender to force the completion to reschedule onto a provided scheduler
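The second option can be sketched in the same callback style. This is only a rough single-threaded illustration of the "complete inline if start() completes synchronously, otherwise reschedule" idea. `start_via_if_async`, `toy_event` and `manual_scheduler` are hypothetical names, and a real adaptor would be a sender algorithm that needs atomics to handle a completion racing with start().

```cpp
#include <deque>
#include <functional>
#include <memory>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a scheduler: a work queue drained by hand.
struct manual_scheduler {
    std::deque<std::function<void()>> work;
    void schedule(std::function<void()> f) { work.push_back(std::move(f)); }
    void drain() {
        while (!work.empty()) {
            auto f = std::move(work.front());
            work.pop_front();
            f();
        }
    }
};

// Toy leaf with a "wait until set" operation that always completes inline.
struct toy_event {
    std::optional<int> value;
    std::function<void(int)> waiter;

    void async_wait(std::function<void(int)> k) {
        if (value) k(*value);           // already set: synchronous completion
        else waiter = std::move(k);     // otherwise park the continuation
    }

    void set(int v) {
        value = v;
        if (waiter) {
            auto k = std::move(waiter);
            waiter = nullptr;
            k(v);                       // leaf completes inline inside set()
        }
    }
};

// Wraps a leaf so its continuation runs inline only when the leaf completes
// during start; later completions are rescheduled. Single-threaded only.
void start_via_if_async(manual_scheduler& sched,
                        const std::function<void(std::function<void(int)>)>& start_leaf,
                        std::function<void(int)> k) {
    auto in_start = std::make_shared<bool>(true);
    start_leaf([&sched, in_start, k = std::move(k)](int v) {
        if (*in_start) {
            k(v);                               // still on the start() context
        } else {
            sched.schedule([k, v] { k(v); });   // triggered later: reschedule
        }
    });
    *in_start = false;
}

// Records the order of events for a wait on an already-set or unset event.
std::vector<std::string> trace(bool already_set) {
    std::vector<std::string> log;
    manual_scheduler sched;
    toy_event ev;
    if (already_set) ev.set(1);
    start_via_if_async(
        sched,
        [&ev](std::function<void(int)> k) { ev.async_wait(std::move(k)); },
        [&log](int v) { log.push_back("continuation " + std::to_string(v)); });
    log.push_back("start returned");
    if (!already_set) {
        ev.set(2);
        log.push_back("set returned");
    }
    sched.drain();
    return log;
}
```

The first option differs mainly in where the scheduler comes from (the receiver's environment instead of an explicit argument), and the third corresponds to always taking the rescheduling branch.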

Note that this may tie in with the task design - the task coroutine type may implicitly apply such an algorithm to all co_await expressions within the coroutine.

lewissbaker commented 1 week ago

A paper along these lines would also need to consider what to do when the scheduler that the operation's completion is scheduled onto is fallible, and what should happen if scheduling onto that context actually fails.
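As a rough illustration of why this is awkward, the sketch below models a scheduler that can refuse work. All names (`bounded_scheduler`, `toy_receiver`, `complete_on`, `trace`) are hypothetical, and delivering an error inline on failure is just one possible policy, not one this thread has settled on. Note that the fallback runs the error continuation on the triggering context, which is exactly the behaviour the rescheduling was meant to avoid.

```cpp
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical fallible scheduler: refuses work beyond a fixed capacity.
struct bounded_scheduler {
    std::size_t capacity;
    std::deque<std::function<void()>> work;

    bool try_schedule(std::function<void()> f) {
        if (work.size() >= capacity) return false;  // scheduling failed
        work.push_back(std::move(f));
        return true;
    }

    void drain() {
        while (!work.empty()) {
            auto f = std::move(work.front());
            work.pop_front();
            f();
        }
    }
};

// Toy receiver with a value channel and an error channel.
struct toy_receiver {
    std::function<void(int)> set_value;
    std::function<void(const std::string&)> set_error;
};

// One possible policy (an assumption): try to reschedule the value
// completion; if that fails, deliver an error inline on the trigger context.
void complete_on(bounded_scheduler& sched, const toy_receiver& r, int v) {
    if (!sched.try_schedule([r, v] { r.set_value(v); })) {
        r.set_error("schedule failed");
    }
}

// Records the order of events for a completion triggered with given capacity.
std::vector<std::string> trace(std::size_t capacity) {
    std::vector<std::string> log;
    bounded_scheduler sched{capacity, {}};
    toy_receiver r{
        [&log](int v) { log.push_back("value " + std::to_string(v)); },
        [&log](const std::string& e) { log.push_back("error: " + e); }};
    complete_on(sched, r, 7);
    log.push_back("trigger returned");
    sched.drain();
    return log;
}
```

With `trace(1)` the value arrives after the trigger returns; with `trace(0)` the error continuation runs before the trigger returns.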

lewissbaker commented 1 week ago

@ispeters mentioned that the async_scope paper already has the suggested behaviour for the join() operation on an async-scope.

lewissbaker commented 1 week ago

One thing I'm not sure about is how to define a policy/design-guideline for subsequent facilities that are proposed to ensure that they all follow this design. e.g. if we add an async_mutex, async_semaphore, or other things with a 'wait' operation.

We could encode this guideline in a paper and then ensure that SG1 understands it and requests that similar-looking facilities that come past SG1 refer to that paper and apply it. I don't think there's anything actionable for things added by P2300, though.

ispeters commented 1 week ago

> @ispeters mentioned that the async_scope paper already has the suggested behaviour for the join() operation on an async-scope.

Here: https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2024/p3149r6.html#simple_counting_scopejoin

lewissbaker commented 5 days ago

I think we need to have a paper discussing the potential approaches to completion context for certain kinds of senders and their relative trade-offs.

There is a tension here between minimising overhead (allowing inline completion), separation of concerns (not requiring leaf operations to all have to deal with schedulers) and safety (running continuations on unexpected contexts inside innocent looking calls).

@robertleahy would like to provide some input on this.

cplusplus / sender-receiver

Define a policy for resuming leaf operations that avoids running continuations inline on context that triggered completion #269