bheisler / RustaCUDA

Rusty wrapper for the CUDA Driver API
Apache License 2.0

Synchronisation issues for multi-dimensional launch calls #44

Closed ChipsSpectre closed 4 years ago

ChipsSpectre commented 4 years ago

Hey,

I implemented a simple kernel (it just copies each pixel of an image) and I see issues in the lower part of the image: (image attached)

The black stripes at the bottom of the image are different on each call, but always get larger from top to bottom. Therefore I assume that stream.synchronize()?; has an issue for multi-dimensional kernel launches like this:


launch!(module.conv2d<<<(20, 20, 1), (32, 24, 1), 0, stream>>>(
    ...
))?;

(note: the image size is 640x480 pixels)

How can the synchronisation issue be solved? Should I restrict my kernels to one-dimensional grid and block dimensions?

launch!(module.conv2d<<<640, 480, 0, stream>>>( ... ))?;

eliminates the issue.

bheisler commented 4 years ago

Hey, thanks for trying RustaCUDA.

I don't think there is such a synchronization issue. RustaCUDA isn't really doing anything inside the launch macro that would require extra synchronization, just launching the kernel with cuLaunchKernel. You can check this for yourself if you'd like; the code for the macro is in function.rs which calls the launch function in stream.rs. I suppose it's possible there's a bug in CUDA, but I feel like a bug in such basic functionality is unlikely to have gone unnoticed.

It's more likely that there's a bug in your kernel code which happens to be hidden by changing the grid and block dimensions.
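
To illustrate how an indexing bug can be hidden by the launch dimensions, here is a CPU-only sketch in Rust that replays the (block, thread) indices of a 2D launch and counts the pixels nobody writes. The kernel source is not shown in this thread, so `buggy_pixel_index` is a hypothetical mistake chosen for illustration, not the actual bug; `Dim`, `pixel_index` and `count_unwritten` are likewise made-up names, not part of RustaCUDA.

```rust
// Image size taken from the issue: 640x480 pixels.
const WIDTH: u32 = 640;
const HEIGHT: u32 = 480;

/// Hypothetical stand-in for CUDA's 2D blockIdx / blockDim / threadIdx.
#[derive(Clone, Copy)]
struct Dim {
    x: u32,
    y: u32,
}

/// Usual 2D mapping: each axis uses its own block dimension.
fn pixel_index(block_idx: Dim, block_dim: Dim, thread_idx: Dim) -> Option<usize> {
    let x = block_idx.x * block_dim.x + thread_idx.x;
    let y = block_idx.y * block_dim.y + thread_idx.y;
    if x < WIDTH && y < HEIGHT {
        Some((y * WIDTH + x) as usize)
    } else {
        None // out-of-range threads must not write
    }
}

/// Hypothetical bug (an assumption, not the kernel from the issue):
/// the row uses blockDim.x instead of blockDim.y.
fn buggy_pixel_index(block_idx: Dim, block_dim: Dim, thread_idx: Dim) -> Option<usize> {
    let x = block_idx.x * block_dim.x + thread_idx.x;
    let y = block_idx.y * block_dim.x + thread_idx.y; // wrong dimension
    if x < WIDTH && y < HEIGHT {
        Some((y * WIDTH + x) as usize)
    } else {
        None
    }
}

/// Replays every (block, thread) pair of a 2D launch on the CPU and
/// returns how many pixels were never written.
fn count_unwritten(
    grid: Dim,
    block: Dim,
    index_of: fn(Dim, Dim, Dim) -> Option<usize>,
) -> usize {
    let mut writes = vec![0u32; (WIDTH * HEIGHT) as usize];
    for by in 0..grid.y {
        for bx in 0..grid.x {
            for ty in 0..block.y {
                for tx in 0..block.x {
                    let block_idx = Dim { x: bx, y: by };
                    let thread_idx = Dim { x: tx, y: ty };
                    if let Some(i) = index_of(block_idx, block, thread_idx) {
                        writes[i] += 1;
                    }
                }
            }
        }
    }
    writes.iter().filter(|&&w| w == 0).count()
}
```

Calling `count_unwritten` with the grid `(20, 20)` and block `(32, 24)` from the issue reports no unwritten pixels for `pixel_index`, but leaves whole rows unwritten for `buggy_pixel_index`. With a square block such as `(16, 16)` on a `(40, 30)` grid, the same buggy mapping covers the full image, which is the kind of masking described above.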