Open · ericniebler opened this issue 10 months ago
Comment by LeeHowes
Tuesday Nov 30, 2021 at 01:27 GMT
This also makes the completion scheduler available for `let_` without waiting for the callable to return a value to call `get_completion_scheduler` on.
Comment by LeeHowes
Tuesday Nov 30, 2021 at 01:33 GMT
Following the folly model, a way to separate concerns would be to have a known wrapper type that is a semi-sender, such that a just-like API might be typed as:

```cpp
template <class T>
auto just(T t) -> semi_sender<decltype(execution::just(std::move(t)))>;
```

where that wrapper type requires a scheduler-provider to be passed to it and transitions to the provided scheduler on completion. That would be enforceable at code review time. I don't know if we could cleanly define a variant of `let_` that returned a `semi_sender` in this way. It would be a little painful to lint that all external APIs return a `semi_sender` to sanitize them. It would be doable, but smells strongly of working around a hazard in the defaults.
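A minimal sketch of what such a wrapper might look like, assuming a P2300-style `std::execution` is available (the spelling of `via` and the use of `transfer` here are illustrative, not proposed wording):

```cpp
#include <utility>

namespace ex = std::execution; // assumes a P2300-style implementation

// Wrapper following the folly::SemiFuture model: the held sender cannot be
// composed further until a scheduler is supplied, and via() forces the
// result to complete on that scheduler.
template <class Sender>
class semi_sender {
  Sender inner_;
public:
  explicit semi_sender(Sender s) : inner_(std::move(s)) {}

  // The only way back to a plain sender: attach a scheduler, and transfer()
  // guarantees completion transitions onto it.
  template <class Scheduler>
  auto via(Scheduler sched) && {
    return ex::transfer(std::move(inner_), std::move(sched));
  }
};

template <class Sender>
semi_sender(Sender) -> semi_sender<Sender>;
```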
Comment by miscco
Tuesday Nov 30, 2021 at 07:02 GMT
I am confused. My understanding was that we have `execution::on` for one direction and `execution::transfer` for the other direction. Shouldn't that always be explicitly required?
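For reference, the two directions in question, as a sketch assuming a P2300-style `std::execution` (the schedulers are parameters purely for illustration):

```cpp
#include <utility>

namespace ex = std::execution;

template <class PoolSched, class MainSched>
auto both_directions(PoolSched pool, MainSched main_loop) {
  // Direction 1: ex::on(sched, work) starts the work on `pool`.
  auto run_there = ex::on(pool, ex::just(42));

  // Direction 2: ex::transfer(work, sched) makes the work complete on
  // `main_loop`, wherever it happened to run.
  return ex::transfer(std::move(run_there), std::move(main_loop));
}
```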
Comment by LeeHowes
Tuesday Nov 30, 2021 at 17:47 GMT
It can be explicitly required. Explicitly requiring it has proved to be a huge source of dangerous runtime deadlocks in practice, though. This is why our coroutine code is very careful to always resume on a known executor - leaf executors should never leak out to their callers. Always requiring insertion of `via` anywhere:

a) will allow those inner executors to leak out. It's extremely hard to validate the code against that happening.

b) will lead to problems like computationally intensive code running on a network thread, blocking all network traffic from being processed and, in extreme cases, potentially deadlocking the system.

c) even when applied well, leads to very invasive passing of executors. If `a` calls `b` calls `c` calls `d`, and `d` needs to add a call to a network library, then `a`, `b`, `c`, and `d` all need to be modified to add an executor parameter so that `d` can call `via`.
Now, we can work around this by enforcing really strict rules, but most likely to do that we'd have to strongly advise against using the standard algorithms and replace them with versions that more strictly scope execution contexts.
Remember, I'm working from the POV of helping thousands of developers build large network services, many of whom are not experts. Knowing to think hard about what context work is going to complete on is a hard message. What we saw in practice in folly code where we had this behaviour, before we started to push people to SemiFuture and coroutines, looks like:

```cpp
foo().via(e).then(f).via(e).then(f2).via(e).then(f3).via(e)
```
People just don't know when they should call `via`, so they fall back to doing it everywhere. In coroutine code we don't see that, because we give people simple rules about the scoping of execution contexts and enforce them in the `co_await` calls themselves.
This seems related to #269.
We have currently defined the algorithms to call-through on completion. This means that a `set_value` call to `then(f)`'s receiver will call `f` inline and then call `set_value` on the next receiver in the chain. In simple cases like `then` this works fine; in structured cases there is danger here. Given:
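A pipeline of roughly this shape (a sketch only: `some_sender`, `start_async_work`, `process`, and the shape `n` are placeholders, and a P2300-style `std::execution` is assumed):

```cpp
#include <cstddef>
#include <utility>

namespace ex = std::execution;

auto work =
    some_sender                                    // completes on context c1
  | ex::let_value([](auto&&... vs) {
      // The callable launches async work that may complete on an unrelated
      // thread pool chosen by start_async_work().
      return start_async_work(std::forward<decltype(vs)>(vs)...);
    })
  | ex::bulk(n, [](std::size_t i, auto&&... vs) {
      // bulk now runs wherever that async work completed, which is invisible
      // in this code.
      process(i, vs...);
    });
```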
`some_sender` will complete, say, on context `c1`. However, `let_value` triggers async work. This may run on a thread pool that is completely unpredictable given the written code. When that async scheduler completes, it is that scheduler that `bulk` will now run on. In principle that is fine - `bulk` should customise on it or fail to do so - but reading that code it is surprising. It is also a potential source of very serious bugs. The reason we put a lot of effort into SemiFuture at Facebook was to disallow attaching continuations to work without calling `.via(executor)` first, thereby enforcing a transition. Similarly, `folly::coro` mandates that `co_await` transition back, such that the seemingly equivalent coroutine code always explicitly resumes on the scheduler associated with the coroutine. This is an important safety feature to avoid the async work causing a lot of chained work to execute on that remote agent - an agent that is not visible to the author of the code, with unknown properties.
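A sketch of that coroutine shape in `folly::coro` terms (illustrative only; `startAsyncWork` and `processResult` are placeholders):

```cpp
#include <folly/experimental/coro/Task.h>

folly::coro::Task<void> example() {
  // The awaited work may complete on an arbitrary remote thread pool...
  auto value = co_await startAsyncWork();

  // ...but the Task resumes on its own associated executor, so the
  // follow-on work below never runs on that hidden remote agent.
  processResult(value);
  co_return;
}
```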
The simplest solution to this would be to enforce that structured constructs that transition context should resume on a known context:

- `let` would resume on either the context it started on, i.e. the `completion_scheduler` of the caller, or, preferably, the scheduler provided by `get_scheduler` on the passed receiver.
- `on(s2, on(s, f))` would `start()` `f` on `s`, but would transition back to `s2` on return.

This would at least cover the default algorithms, making default uses safer. I believe it would also better match the reader's expectation of where code runs. Transitions are still explicit in the sense that if we add a scoped construct we enforce a transition.
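A sketch of that proposed `on` behaviour expressed with the existing pieces, assuming a P2300-style `std::execution` (`scoped_on` is an illustrative name, not proposed wording):

```cpp
#include <utility>

namespace ex = std::execution;

// Proposed reading of on(s2, on(s, f)): f is started on s, but the whole
// expression completes back on s2. Spelled out with transfer(), roughly:
template <class OuterSched, class InnerSched, class Sender>
auto scoped_on(OuterSched s2, InnerSched s, Sender f) {
  return ex::transfer(ex::on(std::move(s), std::move(f)),  // start f on s
                      std::move(s2));                       // resume on s2
}
```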
In general though, I would like us to find a way to be much more concrete throughout the model about precisely where work is running and when.