cplusplus / sender-receiver

Issues list for P2300
Apache License 2.0

Avoiding inline execution in scoped algorithms #111

Open ericniebler opened 10 months ago

ericniebler commented 10 months ago

Issue by LeeHowes Tuesday Nov 30, 2021 at 01:21 GMT Originally opened as https://github.com/NVIDIA/stdexec/issues/289


We have currently defined the algorithms to call through on completion. This means that a set_value call to then(f)'s receiver will call f inline and then call set_value on the next receiver in the chain. In simple cases like then this works fine, but in structured cases there is danger here.

Given:

```cpp
some_sender()
  | let_value([] { return some_complex_work_sender_that_does_something_on_a_random_scheduler(); })
  | bulk(/* complex stuff */)
```

some_sender will complete, say on context c1. However, let_value triggers async work, which may run on a thread pool that is completely unpredictable from the written code. When that async work completes, it is on that scheduler's context that bulk will now run.

In principle that is fine - bulk should customise on it or fail to do so - but reading that code it is surprising. It is also a potential source of very serious bugs. The reason we put a lot of effort into SemiFuture at Facebook was to disallow attaching continuations to work without calling .via(executor) first, thereby enforcing a transition. Similarly folly::coro mandates that co_await transition back such that the seemingly equivalent coroutine code:

```cpp
co_await some_sender();
co_await some_complex_work_sender_that_does_something_on_a_random_scheduler();
co_await bulk(/* complex stuff */);
```

always explicitly resumes on the scheduler associated with the coroutine. This is an important safety feature: it prevents the async work from causing a lot of chained work to execute on that remote agent, an agent that is not visible to the author of the code and has unknown properties.

The simplest solution to this would be to enforce that structured constructs that transition context should resume on a known context.

This would at least cover the default algorithms, making default uses safer. I believe it would also better match the reader's expectation of where code runs. Transitions are still explicit in the sense that if we add a scoped construct we enforce a transition.

In general though, I would like us to find a way to be much more concrete throughout the model about precisely where work is running and when.

ericniebler commented 10 months ago

Comment by LeeHowes Tuesday Nov 30, 2021 at 01:27 GMT


This would also make the completion scheduler available for let_ without having to wait for the callable to return a sender on which to call get_completion_scheduler.

ericniebler commented 10 months ago

Comment by LeeHowes Tuesday Nov 30, 2021 at 01:33 GMT


Following the folly model, a way to separate concerns would be to have a known wrapper type that is a semi-scheduler such that a just-like API might be typed as:

```cpp
template <class T>
auto just(T t) -> semi_sender<decltype(just(t))>;
```

where that wrapper type requires a scheduler-provider to be passed to it and that transitions to the provided scheduler on completion. That would be enforceable at code review time. I don't know if we could cleanly define a variant of let_ that returned a semi_sender in this way. It would be a little painful to lint that all external APIs return a semi_sender to sanitize them. It would be doable, but smells strongly of working around a hazard in the defaults.

ericniebler commented 10 months ago

Comment by miscco Tuesday Nov 30, 2021 at 07:02 GMT


I am confused. My understanding was that we have execution::on for one direction and execution::transfer for the other direction.

Shouldn't that transition always be explicitly required?

ericniebler commented 10 months ago

Comment by LeeHowes Tuesday Nov 30, 2021 at 17:47 GMT


It can be explicitly required. Explicitly requiring it has proved to be a huge source of dangerous runtime deadlocks in practice, though. This is why our coroutine code is very careful to always resume on a known executor: leaf executors should never leak out to their callers. Always requiring manual insertion of via:

a) allows those inner executors to leak out, and it is extremely hard to validate the code against that happening.
b) leads to problems like computationally intensive code running on a network thread, blocking all network traffic from being processed and, in extreme cases, deadlocking the system.
c) even when applied well, leads to very invasive passing of executors. If a calls b calls c calls d, and d needs to add a call to a network library, then a, b, c and d all need to be modified to add an executor parameter so that d can call via.

Now, we can work around this by enforcing really strict rules, but most likely to do that we'd have to strongly advise against using the standard algorithms and replace them with versions that more strictly scope execution contexts.

Remember, I'm working from the POV of helping thousands of developers build large network services, many of whom are not experts. "Think hard about what context your work is going to complete on" is a hard message to teach. What we saw in practice in folly code, where we had this behaviour before we started to push people towards SemiFuture and coroutines, looks like:

```cpp
foo().via(e).then(f).via(e).then(f2).via(e).then(f3).via(e)
```

People just don't know when they should call via so they fall back to doing it everywhere. In coroutine code we don't see that because we give people simple rules about scoping of execution contexts and enforce it in the co_await calls themselves.

lewissbaker commented 3 months ago

This seems related to #269.