Need to better define when resources held by operation-states are cleaned up

lewissbaker commented 4 months ago

When an operation completes and invokes the receiver's completion-handler methods, it passes the result datums as arguments. The arguments are typically passed by reference and can refer to automatic-storage-duration variables in the caller, to members of the operation-state, or to objects in some other storage.

As the parameters can sometimes refer to members of the operation-state, algorithms generally don't destroy resources held in the operation-state before invoking the receiver.

For example, then() does not destroy the func before invoking the receiver with the result of calling func, meaning that any captures held by func remain alive for the duration of some unspecified number of continuations that execute until something down-stream ends up destroying the operation-state.
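To make the then() case concrete, here is a minimal, self-contained sketch that does not use any real sender library. The names then_op, tracked and run_then_example are hypothetical stand-ins invented for illustration; the only property modelled is that func is stored in the op-state and is not destroyed before the receiver is invoked.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

using event_log = std::vector<std::string>;

// A resource that records when it is destroyed.
struct tracked {
  event_log* log;
  std::string name;
  tracked(event_log& l, std::string n) : log(&l), name(std::move(n)) {}
  tracked(const tracked&) = delete;
  tracked& operator=(const tracked&) = delete;
  ~tracked() { log->push_back(name + " destroyed"); }
};

// Hypothetical stand-in for the op-state of then(src, func). It stores func
// and the receiver; func stays alive until the op-state itself is destroyed.
template <class Func, class Receiver>
struct then_op {
  Func func;
  Receiver receiver;

  void start() {
    // func is not destroyed between producing the result and
    // invoking the receiver with it.
    receiver(func());
  }
};

event_log run_then_example() {
  event_log log;
  {
    auto res = std::make_shared<tracked>(log, "capture");
    auto func = [res = std::move(res)] { return 42; };
    auto receiver = [&log](int v) {
      log.push_back("receiver got " + std::to_string(v));
    };
    then_op<decltype(func), decltype(receiver)> op{std::move(func),
                                                   std::move(receiver)};
    op.start();
    // The receiver has already run, yet the capture is still alive.
    assert(log.size() == 1);
    log.push_back("start() returned");
  }  // op is destroyed here, which finally destroys func and its capture
  return log;
}
```

Calling run_then_example() should yield the events "receiver got 42", "start() returned", "capture destroyed", in that order: the capture outlives the continuation.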

This can result in some unexpected lifetimes depending on your mental model.

To give an analogy from how lifetimes behave in normal functions, consider:

struct X { X(A a); void method(); };

X f() {
  A a;
  return X{a};
}

void g() {
  f().method();
  //...
}

With this function, when f() returns it constructs the X result in storage owned by g(), then destroys 'a' and then transfers execution to g() which then goes on to call X::method().
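A runnable variant of this analogy that logs the order of events could look like the following. It is only a sketch: A and X are given logging constructors and destructors, X takes A by const reference (an assumption made here so that no copies of A show up in the log), and g() returns the log instead of void. It assumes C++17, where the returned prvalue is constructed directly in the caller's storage.

```cpp
#include <cassert>
#include <string>
#include <vector>

using event_log = std::vector<std::string>;

struct A {
  event_log* log;
  explicit A(event_log& l) : log(&l) { log->push_back("A constructed"); }
  A(const A&) = delete;
  A& operator=(const A&) = delete;
  ~A() { log->push_back("A destroyed"); }
};

struct X {
  event_log* log;
  // Assumption of this sketch: X only reads from 'a' during construction.
  explicit X(const A& a) : log(a.log) { log->push_back("X constructed"); }
  ~X() { log->push_back("X destroyed"); }
  void method() { log->push_back("X::method called"); }
};

X f(event_log& log) {
  A a{log};
  return X{a};  // the result is constructed in storage owned by the caller
}               // 'a' is destroyed here, before control returns to g()

event_log g() {
  event_log log;
  f(log).method();  // the temporary X lives until the end of this statement
  // By the next statement the temporary is gone.
  assert(log.back() == "X destroyed");
  log.push_back("next statement");
  return log;
}
```

The log produced by g() shows 'a' being destroyed before X::method() runs, and the X temporary being destroyed before the next statement starts.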

If we were to try to represent this using sender-algorithms we might write something like:

sender auto f() {
  return let_value(just(A{}),
                   [](A& a) { return then(just(), [&] { return X{a}; }); });
}

sender auto g() {
  return then(f(), [](X&& x) { std::move(x).method(); });
}

But with this, the lifetimes are no longer so neatly nested.

When g() is connected and started it will:

- construct A in the op-state of let_value (moving the value from the just op-state)
- destroy the op-state of just(A{}), destroying the moved-from A value
- invoke the lambda in f(), which constructs X using a and returns it
- pass the returned X object to f()'s continuation, which then invokes method() on it
- at some point later, depending on the consumer of g(), eventually destroy g's op-state, which destroys f's let_value op-state, which destroys the then op-state, which destroys the lambda and the object A
- at some other point later, return from the synchronous execution of continuations so that the stack can be unwound, destroying objects stored on the stack
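The difference from the plain-function version can be seen in a hand-rolled model of the sequence above. This is a simplified sketch, not the real let_value: let_value_op and run_sender_like_example are hypothetical names, the just() op-state is omitted, and the inner then() is folded into start(). The only thing modelled is that A lives in the op-state, so it survives until the op-state is destroyed.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <vector>

using event_log = std::vector<std::string>;

struct A {
  event_log* log;
  explicit A(event_log& l) : log(&l) { log->push_back("A constructed"); }
  A(const A&) = delete;
  A& operator=(const A&) = delete;
  ~A() { log->push_back("A destroyed"); }
};

struct X {
  event_log* log;
  explicit X(const A& a) : log(a.log) { log->push_back("X constructed"); }
  ~X() { log->push_back("X destroyed"); }
  void method() { log->push_back("X::method called"); }
};

// Hypothetical stand-in for the let_value op-state in f(): it owns the A
// value for as long as the op-state exists.
template <class Receiver>
struct let_value_op {
  event_log* log;
  Receiver receiver;
  std::optional<A> a;  // storage for the value that just(A{}) would produce

  void start() {
    a.emplace(*log);  // construct A inside the op-state
    receiver(X{*a});  // build X from 'a' and pass it to the continuation
  }
};

event_log run_sender_like_example() {
  event_log log;
  {
    auto continuation = [](X&& x) { x.method(); };
    let_value_op<decltype(continuation)> op{&log, continuation, std::nullopt};
    op.start();
    // The continuation has finished, but A is still alive in the op-state.
    assert(op.a.has_value());
    log.push_back("start() returned");
  }  // the op-state is destroyed here, and only now is A destroyed
  return log;
}
```

In this model "A destroyed" is the last event, after X::method() has run and after start() has returned, whereas in an ordinary function the local A is destroyed before the caller ever uses the returned X.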

While senders ensure that the concurrency is structured, the lifetime of various objects involved here is in some ways unstructured as the lifetime of resources used for one operation can bleed in varying ways into the lifetime of the next operation.

One way to think of the current lifetime behaviour of many sender algorithms is that maybe some of them are like expressions - where the lifetime of temporaries extends to the end of the full-expression. And that some other algorithms introduce constructs that limit the scope of the full-expression and force resources (i.e. operation-states) to be cleaned up.

We should try to come up with a conceptual model for object lifetime within sender compositions - something equivalent to expressions, statements and full-expressions in the C++ language - that we can use to reason about object lifetimes more easily in sender/receiver programs. We should then apply this model to existing algorithms we define to ensure that the semantics with regards to object lifetimes are clear and make sense.

One potential avenue for exploration is to add some extra algorithms that allow managing the lifetime of resources held in operation-states.

- statement(src) might ensure that the operation-state of src is destroyed before the receiver connected to it is invoked with the result.
  - This may require copying/moving results to the stack in case the src operation completed with references to objects stored in the operation-state.
  - This may also require a means to disambiguate between whether the operation completed with a reference or completed with a prvalue, as this may determine whether or not an algorithm like statement() would need to move/copy the value to the stack.
- statement_block(src...) - like a compound-statement in the language. Executes each of 'src' sequentially, ensuring that operation-states are destroyed before executing the next operation.
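As a rough illustration of what a statement()-like adaptor would have to do on completion, here is a self-contained sketch using hand-written op-states rather than a real sender library. child_op, statement_op and run_statement_example are hypothetical names. The child completes with a reference into its own op-state, so the wrapper moves the result to the stack, destroys the child op-state, and only then invokes the receiver.

```cpp
#include <cassert>
#include <optional>
#include <string>
#include <utility>
#include <vector>

using event_log = std::vector<std::string>;

// Hypothetical child operation that completes with a reference to a member
// of its own op-state.
struct child_op {
  event_log* log;
  std::string value = "result stored in child op-state";

  explicit child_op(event_log& l) : log(&l) {}
  ~child_op() { log->push_back("child op-state destroyed"); }

  template <class Receiver>
  void start(Receiver&& receiver) {
    receiver(value);  // passes an lvalue reference to a member
    // *this may already have been destroyed by the receiver, so nothing
    // may touch members after this point.
  }
};

// Hypothetical stand-in for the op-state of statement(src).
template <class Receiver>
struct statement_op {
  Receiver receiver;
  std::optional<child_op> child;

  void start() {
    child->start([this](std::string& result) {
      std::string local = std::move(result);  // move the result to the stack
      child.reset();                          // destroy the child op-state
      receiver(std::move(local));             // only then run the receiver
    });
  }
};

event_log run_statement_example() {
  event_log log;
  auto receiver = [&log](std::string v) {
    log.push_back("receiver got: " + v);
  };
  statement_op<decltype(receiver)> op{receiver, std::nullopt};
  op.child.emplace(log);
  op.start();
  assert(!op.child.has_value());  // the child op-state is already gone
  return log;
}
```

Here the log shows the child op-state being destroyed before the receiver runs. Note that this sketch always moves the result; deciding when a move or copy is actually needed is exactly the reference-versus-prvalue question raised above.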

lewissbaker commented 4 months ago

An example where the lifetimes might cause an issue was:

counting_scope scope;
sync_wait(when_all(
  stop_when(just(), scope.nest(just())),
  scope.join()));

If stop_when connects the nest operation-state but does not start it (because the first sender completes synchronously) and also does not destroy that operation-state, then this program deadlocks: join() doesn't complete until the nest op-state is destroyed, and when_all doesn't destroy its child op-states until they have all finished.

If we added a statement algorithm, the deadlock could be fixed by rewriting as

counting_scope scope;
sync_wait(when_all(
  statement(stop_when(just(), scope.nest(just()))),
  scope.join()));

as this would force destruction of the stop_when op-state and thus the nest op-state when the stop_when operation completes, unblocking the join() operation.
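The dependency between the nest op-state and join() can be modelled without any threads or real senders. The sketch below is a toy, single-threaded model built only from the behaviour described in this thread: toy_scope, nest_op, stop_when_op and run_scope_example are hypothetical names, and "join can complete" is reduced to a counter reaching zero.

```cpp
#include <cassert>
#include <optional>

// Toy model of a counting scope: join() may complete only when no nested
// op-states are outstanding.
struct toy_scope {
  int outstanding = 0;
  bool join_can_complete() const { return outstanding == 0; }
};

// Models the op-state created by connecting scope.nest(...). It counts as
// outstanding from construction until destruction, started or not.
struct nest_op {
  toy_scope* scope;
  explicit nest_op(toy_scope& s) : scope(&s) { ++scope->outstanding; }
  nest_op(const nest_op&) = delete;
  nest_op& operator=(const nest_op&) = delete;
  ~nest_op() {
    assert(scope->outstanding > 0);
    --scope->outstanding;
  }
};

// Models a stop_when op-state that has connected, but never started, the
// nest operation because its first sender completed synchronously.
struct stop_when_op {
  std::optional<nest_op> trigger;
  explicit stop_when_op(toy_scope& s) { trigger.emplace(s); }
};

struct scope_observation {
  bool blocked_while_alive;
  bool unblocked_after_destroy;
};

scope_observation run_scope_example() {
  toy_scope scope;
  scope_observation r{};
  std::optional<stop_when_op> op;
  op.emplace(scope);
  // stop_when has completed, but its op-state is kept alive (as a parent
  // like when_all would do), so join() cannot complete.
  r.blocked_while_alive = !scope.join_can_complete();
  // What a statement-like wrapper would do on completion: destroy the
  // op-state, and with it the unstarted nest op-state.
  op.reset();
  r.unblocked_after_destroy = scope.join_can_complete();
  return r;
}
```

In this model both fields of the returned scope_observation are true: join is blocked while the completed stop_when op-state is retained, and becomes unblocked as soon as that op-state is destroyed.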

Another option would be to have when_all() destroy its child op-states as each child completes. Or alternatively to have stop_when() destroy the second op-state if it doesn't start it.

msimberg commented 4 months ago

FWIW, there are some more motivating examples in https://github.com/NVIDIA/stdexec/issues/1076. pika implements a drop_operation_state sender adaptor (https://github.com/pika-org/pika/blob/68a61440c39759d5a71be9e31cf8659b48ec5158/libs/pika/execution/include/pika/execution/algorithms/drop_operation_state.hpp) which I think is more or less the statement algorithm you've described above.

lewissbaker commented 2 months ago

> This may also require means to disambiguate between whether the operation completed with a reference or completed with a prvalue as this may determine whether or not an algorithm like statement() would need to move/copy the value to the stack.

One potential solution for allowing disambiguation between operations that complete with xvalues vs prvalues could be to adopt the suggestion in #264.

cplusplus / sender-receiver

Need to better define when resources held by operation-states are cleaned up #239