Add documentation on how to synchronize non-default streams

Problem statement

As a developer reading https://docs.rs/cudarc/latest/cudarc/driver/safe/struct.CudaStream.html, I cannot tell how to use cudarc to synchronize work across non-default CUDA streams.

Solution

This PR adds documentation on how to synchronize multiple non-default streams.

Context

In my previous PR https://github.com/coreylowman/cudarc/pull/254, I assumed that CudaStream was thread-safe. That assumption was incorrect.

@coreylowman then kindly pointed out that on-device concurrency is usually achieved by launching several CUDA kernels from the same host thread, each on its own non-default, non-blocking CUDA stream. So I read the CUDA documentation about streams to determine how to achieve this with cudarc.

cudarc uses the CUDA events API to synchronize work.
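
For readers unfamiliar with the underlying pattern, here is a minimal sketch written against the plain CUDA runtime API rather than cudarc's Rust API. It shows one host thread launching work on two non-default, non-blocking streams and using an event to make the second stream wait for the first. The kernels `produce` and `consume` and the buffer names are hypothetical, and error checking is omitted for brevity.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernels, for illustration only.
__global__ void produce(float *buf, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) buf[i] = 2.0f * i;
}

__global__ void consume(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + 1.0f;
}

int main() {
    const int n = 1 << 20;
    float *a = nullptr, *b = nullptr;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    // Two non-default streams that do not implicitly synchronize
    // with the legacy default stream.
    cudaStream_t s1, s2;
    cudaStreamCreateWithFlags(&s1, cudaStreamNonBlocking);
    cudaStreamCreateWithFlags(&s2, cudaStreamNonBlocking);

    // The event is used only for ordering, so timing is disabled.
    cudaEvent_t produced;
    cudaEventCreateWithFlags(&produced, cudaEventDisableTiming);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    produce<<<blocks, threads, 0, s1>>>(a, n);
    cudaEventRecord(produced, s1);         // marks the point in s1 after produce
    cudaStreamWaitEvent(s2, produced, 0);  // s2 waits on the device; the host does not block
    consume<<<blocks, threads, 0, s2>>>(a, b, n);

    // The host blocks here until all work queued on s2 has finished.
    cudaStreamSynchronize(s2);

    float last = 0.0f;
    cudaMemcpy(&last, b + (n - 1), sizeof(float), cudaMemcpyDeviceToHost);
    printf("%f\n", last);

    cudaEventDestroy(produced);
    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(a);
    cudaFree(b);
    return 0;
}
```

The record/wait pair keeps the dependency on the device: the host thread can keep queuing work on other streams while `s2` waits for `produce` to finish.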

Use case

I use cudarc with this approach in my novigrad project (GPU Scheduler, DeviceStream), and so far it appears to work well.

Add documentation on how to synchronize non-default streams #261