bheisler / RustaCUDA

Rusty wrapper for the CUDA Driver API
Apache License 2.0

documentation: sync copy is using the default stream #59

Open vmx opened 3 years ago

vmx commented 3 years ago

I'm new to CUDA and I ran into a race condition which could perhaps be prevented with changes to the documentation.

The problem

Mixing the default stream and a custom stream isn't a good idea.

The implementations of the CopyDestination trait implicitly use the default stream. A stream created with the NON_BLOCKING flag does not synchronize with the default stream, so launching a kernel on such a stream while copying with these methods can lead to a race condition.

My confusion

The documentation of the NON_BLOCKING stream flag has a good explanation about the default (NULL) stream. Though the sentence:

Since RustaCUDA does not provide access to the NULL stream, this flag has no effect in most circumstances. However, it is recommended to use it anyway, as some other crate in this binary may be using the NULL stream directly.

made me believe that, as long as I only use RustaCUDA, I can enable NON_BLOCKING and everything will be fine, because the library itself never uses the default stream. As mentioned above, that is not true.

For me there were two solutions:

  1. Don't set the NON_BLOCKING stream flag. This way, even if I launch the kernel on a custom stream (there is currently no way in RustaCUDA to launch it on the default stream), things are properly synchronized.
  2. Use the async copy methods on the same stream I launch the kernel on, and synchronize the stream right after the copy operation (that's what I did).
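
For reference, the second option can be sketched roughly as follows. This is a minimal sketch, not code from my project. It assumes the `rustacuda` crate (0.1.x) with `quick_init`, `LockedBuffer` and the `AsyncCopyDestination` trait as I understand them from the crate docs. The buffer length and element type are made up for illustration, and the kernel launch is left as a comment because it needs a compiled PTX module. It needs a CUDA-capable device to run.

```rust
use rustacuda::error::CudaResult;
use rustacuda::memory::{AsyncCopyDestination, DeviceBuffer, LockedBuffer};
use rustacuda::stream::{Stream, StreamFlags};

fn main() -> CudaResult<()> {
    // Initialize the driver API and create a context on the first device.
    let _context = rustacuda::quick_init()?;

    // A NON_BLOCKING stream does not synchronize with the default (NULL) stream.
    let stream = Stream::new(StreamFlags::NON_BLOCKING, None)?;

    // Illustrative sizes and types only (assumption, not from the issue).
    const LEN: usize = 1024;
    // Page-locked host memory, which the crate docs recommend for async copies.
    let mut host = LockedBuffer::new(&1.0f32, LEN)?;
    let mut device = unsafe { DeviceBuffer::<f32>::uninitialized(LEN)? };

    // Host -> device copy, enqueued on the same stream the kernel will use
    // instead of going through the default stream.
    unsafe {
        device.async_copy_from(&host, &stream)?;
    }
    // Synchronize right after the copy, as described in option 2.
    stream.synchronize()?;

    // The kernel would be launched on `stream` here (omitted in this sketch).

    // Device -> host copy on the same stream. The host buffer must not be
    // read until the stream has been synchronized.
    unsafe {
        device.async_copy_to(&mut host, &stream)?;
    }
    stream.synchronize()?;

    Ok(())
}
```

Because every operation is enqueued on the same stream, the copies and the kernel execute in order relative to each other, and the default stream is never involved.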

Proposed fix

I propose adding a warning/note to the NON_BLOCKING stream flag documentation stating that the synchronous copy methods use the default stream, so this flag can affect them. Additionally, I'd add information about the default stream to the CopyDestination trait itself.