NVIDIA / stdexec

`std::execution`, the proposed C++ framework for asynchronous and parallel programming.
Apache License 2.0

NVTX range support #909

Closed gonzalobg closed 1 year ago

gonzalobg commented 1 year ago

It would be really nice to add NVTX support so that users can visualize the computation graphs they build with senders in Nsight Systems. Without it, the only granularity visible in a profile is that of `sync_wait`, which is too coarse to reason about performance. For example:


```cpp
auto t0 = exec::on(s) | nvexec::nvtx::scoped("kernel0", ex::bulk(r0, k0));
auto t1 = exec::on(s) | nvexec::nvtx::scoped("kernel1", ex::bulk(r1, k1));

auto w = stdexec::when_all(t0, t1);
stdexec::sync_wait(std::move(w));
```

jrhemstad commented 1 year ago

For reference: https://github.com/nvidia/nvtx#how-do-i-use-nvtx-in-my-code
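
To make the request concrete, here is a minimal sketch of the scoped-range idea: open a named range before a piece of work and close it afterwards. It is not stdexec or NVTX code. `range_push`, `range_pop`, `range_log`, `scoped_range`, `with_range`, and `demo` are hypothetical names introduced for illustration, and the push/pop functions are stand-ins that only record events, so the sketch builds without the NVTX headers. A real implementation would presumably forward to NVTX calls such as `nvtxRangePushA` and `nvtxRangePop`, and for asynchronous senders the range would have to open when the operation starts and close when it completes, which this synchronous sketch does not model.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for NVTX push/pop calls: they only append to a log
// so the sketch can be compiled and checked without NVTX installed.
inline std::vector<std::string>& range_log() {
  static std::vector<std::string> log;
  return log;
}
inline void range_push(const char* name) {
  range_log().push_back(std::string("push:") + name);
}
inline void range_pop() { range_log().push_back("pop"); }

// RAII guard: opens a named range on construction, closes it on destruction.
class scoped_range {
 public:
  explicit scoped_range(const char* name) { range_push(name); }
  ~scoped_range() { range_pop(); }
  scoped_range(const scoped_range&) = delete;
  scoped_range& operator=(const scoped_range&) = delete;
};

// Hypothetical helper: runs a callable inside a named range and returns its
// result. The range closes after the callable has produced its result.
template <class F>
decltype(auto) with_range(const char* name, F&& f) {
  scoped_range guard{name};
  return std::forward<F>(f)();
}

// Two independently named pieces of work, mirroring "kernel0" and "kernel1"
// from the example in the issue. The arithmetic is placeholder work.
inline int demo() {
  int a = with_range("kernel0", [] { return 1 + 2; });
  int b = with_range("kernel1", [] { return 3 * 4; });
  return a + b;
}
```

Calling `demo()` leaves four entries in `range_log()`: a push and a pop for `kernel0`, then a push and a pop for `kernel1`. That is the per-operation granularity the issue asks for, as opposed to a single range around `sync_wait`.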