lewissbaker opened 1 month ago
As a data point: in the following snippet, using the stdexec implementation, the move constructor of `X` is called 5 times!
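```cpp
#include <stdexec/execution.hpp>
#include <cstdio>

// Logs construction, moves and destruction so that every move is visible.
struct X {
  X() noexcept { std::puts("X::X()"); }
  X(X&&) noexcept { std::puts("X::X(X&&)"); }
  ~X() noexcept { std::puts("X::~X()"); }
};

int main() {
  // `result` is an optional holding a tuple of the values sent by just().
  auto result = stdexec::sync_wait(stdexec::just(X{}));
  std::puts("done");
}
```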
1. The `X{}` object is moved into a temporary `tuple{values...}` inside `just()` (move 1).
2. The tuple is moved into the `just` sender as its state (move 2).
3. The call is then dispatched through `stdexec::apply_sender(default_domain{}, sender, env)` → `default_domain::apply_sender(sender, env)` → `sync_wait_t::apply_sender(sender, env)`, which calls `stdexec::connect()` to produce an operation state.
4. `stdexec::connect()` calls `stdexec::transform_sender(default_domain(), sender, env)` → `default_domain::transform_sender(sender, env)`, which returns a new `just` sender move-constructed from the original (move 3).
5. `stdexec::connect()` then calls `.connect()` on this new `just` sender, which moves the value into the operation state (move 4).
6. `sync_wait()` calls `start()`, which invokes the `sync_wait` receiver's `set_value`, which moves the result into an `optional` object that is then returned from `sync_wait()` (move 5).
We could reduce this to 4 moves by fixing the implementation of `transform_sender()` so that it does not return a copy when the transform is a no-op (bug filed at https://github.com/NVIDIA/stdexec/issues/1329).
Together with that fix, we could reduce the count to 2 moves if the `just` sender could be constructed by aggregate initialization, instead of moving the value into a tuple and then moving the tuple into the `just` sender.
When a sender expression is evaluated, the leaf senders are built first and then passed as arguments to higher-level sender algorithms, each of which must move-construct its child senders into its own data members. The result may in turn be passed to another sender algorithm, which must move-construct that entire sub-tree of senders into its data members, and so on.
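In the end, a leaf operation (and any state it holds) is move-constructed O(depth) times when it is incorporated into a sender tree of a given depth: once for each enclosing algorithm. Because sender trees typically get wider as they get deeper, most leaves sit near the bottom, so the total number of move operations needed to build the final tree is roughly the number of leaves times the depth. This grows faster than the size of the tree and can become quadratic for deep expressions.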
Further, if the tree is built in a single expression, all of the temporary intermediate senders live until the end of the full-expression, so the stack space needed to create a large expression tree can potentially grow quadratically in the size of the tree.
For example, consider a binary tree of `when_all()` operations whose leaves are `just(X{})`:

[Table: `just` senders in final tree | `when_all` senders in final tree | `X` temporaries/moves]

If this sender expression is then passed to a `co_await` or `sync_wait()` expression, the operation-state objects are constructed, which generally moves the state from the final sender tree into an operation-state tree. However, with the introduction of `transform_sender()`-based customization, this can result in moving the entire tree again, because the `default_domain::transform_sender()` function returns a new prvalue sender rather than just returning the input reference.

Further, if the pipe syntax is used, the number of moves of each leaf argument in a sender tree generally increases by 1, because the object must first be moved into a temporary adaptor object before it is moved into the sender itself.
We should see whether we can reduce the number of intermediate objects required to build large trees, for example by constructing senders directly using aggregate initialization.