Problem statement
As a developer, when I read https://docs.rs/cudarc/latest/cudarc/driver/safe/struct.CudaStream.html, it is not clear how to use cudarc to synchronize work across non-default CUDA streams.
Solution
In this PR, I added documentation on how to synchronize work across multiple non-default streams.
Context
In my previous PR https://github.com/coreylowman/cudarc/pull/254, I assumed that CudaStream was thread-safe. My understanding was incorrect.
@coreylowman then kindly pointed out that on-device concurrency is usually achieved by launching several CUDA kernels from the same host thread onto several non-default, non-blocking CUDA streams. So I read the CUDA documentation on streams to determine how to achieve this with cudarc.
cudarc uses the CUDA events API to synchronize work across streams.
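A minimal sketch of this fork/join pattern, written against the CUDA runtime API rather than cudarc itself so that no cudarc-specific signatures are assumed. From a single host thread, two kernels are launched on two non-blocking streams, an event recorded on one stream is waited on by the other, and a final kernel combines the results. The kernel names `fill_kernel` and `add_kernel` and the helper `run_fork_join` are hypothetical, and error handling is reduced to returning the first error observed. cudarc's low-level driver bindings should expose the driver-API counterparts of these calls (`cuStreamCreate` with `CU_STREAM_NON_BLOCKING`, `cuEventRecord`, `cuStreamWaitEvent`).

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel: writes `value` into every element of `buf`.
__global__ void fill_kernel(float* buf, float value, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = value;
}

// Hypothetical kernel: out[i] = a[i] + b[i].
__global__ void add_kernel(const float* a, const float* b, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a[i] + b[i];
}

// Hypothetical helper: fork work onto two non-blocking streams, join it with
// an event, and copy the result into host_out. Returns the first error seen.
cudaError_t run_fork_join(float* host_out, int n) {
    float *a = nullptr, *b = nullptr, *c = nullptr;
    cudaStream_t s0, s1;
    cudaEvent_t b_ready;
    const size_t bytes = n * sizeof(float);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    cudaMalloc(&a, bytes);
    cudaMalloc(&b, bytes);
    cudaMalloc(&c, bytes);

    // Non-blocking streams do not implicitly synchronize with the legacy default stream.
    cudaStreamCreateWithFlags(&s0, cudaStreamNonBlocking);
    cudaStreamCreateWithFlags(&s1, cudaStreamNonBlocking);
    // No timing is needed; disabling it makes the event cheaper to record and wait on.
    cudaEventCreateWithFlags(&b_ready, cudaEventDisableTiming);

    // Fork: these two launches are independent and may run concurrently on the device.
    fill_kernel<<<blocks, threads, 0, s0>>>(a, 1.0f, n);
    fill_kernel<<<blocks, threads, 0, s1>>>(b, 2.0f, n);

    // Mark the point in s1 after which `b` is fully written...
    cudaEventRecord(b_ready, s1);
    // ...and make s0 wait for that point on the device, without blocking the host.
    cudaStreamWaitEvent(s0, b_ready, 0);

    // Join: ordered after fill(a) (same stream) and fill(b) (via the event).
    add_kernel<<<blocks, threads, 0, s0>>>(a, b, c, n);

    // Copy back on s0, then block the host until everything queued on s0 is done.
    cudaMemcpyAsync(host_out, c, bytes, cudaMemcpyDeviceToHost, s0);
    cudaError_t err = cudaStreamSynchronize(s0);
    if (err == cudaSuccess) err = cudaGetLastError();

    cudaEventDestroy(b_ready);
    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(a);
    cudaFree(b);
    cudaFree(c);
    return err;
}
```

Because `cudaStreamWaitEvent` only makes `s0` wait on the device, the host thread stays free to keep enqueueing work; the only host-side block is the final `cudaStreamSynchronize`. With `n = 1024`, every element of `host_out` should end up as `3.0f` (1.0f + 2.0f).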
Use case
I use cudarc with this approach in my novigrad project (GPU Scheduler, DeviceStream), and it appears to work well.