It would be really nice to add NVTX support so that users can visualize computation graphs built from senders in Nsight Systems. Without it, the granularity is that of sync_wait, which is too coarse to reason about performance. For example:
auto t0 = exec::on(s) | nvexec::nvtx::scoped("kernel0", ex::bulk(r0, k0));
auto t1 = exec::on(s) | nvexec::nvtx::scoped("kernel1", ex::bulk(r1, k1));
auto w = stdexec::when_all(t0, t1);
stdexec::sync_wait(std::move(w));