DejvBayer opened this issue 6 months ago
Hello,
The multiple-streams feature was an experiment to mimic the Vulkan behavior of shader dispatches in a pipeline: unless they are synchronized, dispatches launch without waiting for the previous shader to complete. This is unlike the CUDA kernel model, where kernels in a stream wait for the previous kernels to finish. Its usefulness turned out to be very limited: it only applies when a kernel has to be dispatched multiple times because the grid dimensions exceed device limits (65k for y and z). However, these workloads are typically large and fully utilize the GPU by themselves with low CPU overhead, so using multiple streams was not useful at all. I think you are correct that the synchronization is currently broken in this version; I will need to check your changes in detail when I have more time.
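The grid-limit case can be made concrete. On CUDA, gridDim.y and gridDim.z are capped at 65535 blocks per launch, so a dispatch with a larger y extent has to be issued as several launches. The sketch below splits such an extent into per-launch chunks. splitGridY and LaunchChunk are hypothetical names, not VkFFT code, and the sketch assumes each launch receives its yOffset as a kernel argument and adds it to blockIdx.y.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// CUDA limits gridDim.y and gridDim.z to 65535 blocks per launch.
constexpr std::uint32_t kMaxGridY = 65535;

// Hypothetical helper type (not part of VkFFT): one entry per kernel launch.
struct LaunchChunk {
    std::uint64_t yOffset;  // first logical y block covered by this launch
    std::uint32_t yBlocks;  // gridDim.y for this launch, never above maxY
};

// Split a logical y extent that may exceed the per-launch limit into
// several launches that together cover [0, totalYBlocks) exactly once.
std::vector<LaunchChunk> splitGridY(std::uint64_t totalYBlocks,
                                    std::uint32_t maxY = kMaxGridY) {
    assert(maxY > 0);
    std::vector<LaunchChunk> chunks;
    for (std::uint64_t offset = 0; offset < totalYBlocks; offset += maxY) {
        const std::uint64_t remaining = totalYBlocks - offset;
        const std::uint32_t count =
            remaining < maxY ? static_cast<std::uint32_t>(remaining) : maxY;
        chunks.push_back({offset, count});
    }
    return chunks;
}
```

For example, splitGridY(100000) yields two launches: 65535 blocks at offset 0 and 34465 blocks at offset 65535. On a single in-order CUDA stream these launches run back to back; putting them on separate streams is what would let them overlap, in the way unsynchronized Vulkan dispatches can.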
Best regards, Dmitrii
Sure, the mechanism is described here:
It is just extended to work with an arbitrary number of streams.
David
Hi,
this is a snippet of a CUDA kernel launch from the DispatchPlan module. I do not understand several things about this code:
In the RunApp module you call VkFFTSync after each kernel launch. I think that is not necessary unless you want to execute the work in parallel.
Then here is a snippet from the VkFFTSync function. This is where the multiple CUDA streams are synchronized. If I am not mistaken, it synchronizes on events that were never recorded into a stream. It also makes the application synchronous; I guess the cudaStreamWaitEvent function would be more suitable in this case.
But overall I feel that the whole design of using multiple streams is wrong. What I think would be right is:
When the VkFFTAppend function is called, this should happen:
1. cudaEventRecord on the first stream.
2. cudaStreamWaitEvent on each stream except the first.
3. cudaEventRecord on each stream except the first.
4. cudaStreamWaitEvent on the first stream.
This approach should work fine and even allow the use of CUDA Graphs via stream capture. HIP has exactly the same story.
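The fork-join ordering in the steps above can be sketched as follows, assuming the first stream is the one the caller passed in and that the kernel launches go between the fork and the join (the steps leave that placement implicit). forkJoinAppend, the Backend adapter, and LoggingBackend are hypothetical names for illustration, not VkFFT code. CudaBackend only forwards to cudaEventRecord and cudaStreamWaitEvent and is compiled only under nvcc; a HIP adapter would forward to hipEventRecord and hipStreamWaitEvent in the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical fork-join helper: Backend supplies Stream/Event types plus
// record(event, stream) and wait(stream, event). LaunchFn enqueues the
// share of kernels for stream i.
template <typename Backend, typename LaunchFn>
void forkJoinAppend(Backend& b,
                    const std::vector<typename Backend::Stream>& streams,
                    const std::vector<typename Backend::Event>& events,
                    LaunchFn launch) {
    assert(!streams.empty() && events.size() == streams.size());
    // Fork: record on the first (caller's) stream; the others wait on it.
    b.record(events[0], streams[0]);
    for (std::size_t i = 1; i < streams.size(); ++i)
        b.wait(streams[i], events[0]);  // device-side wait, host not blocked
    // Enqueue the work on every stream.
    for (std::size_t i = 0; i < streams.size(); ++i)
        launch(streams[i], i);
    // Join: record on each other stream; the first stream waits on each.
    for (std::size_t i = 1; i < streams.size(); ++i) {
        b.record(events[i], streams[i]);
        b.wait(streams[0], events[i]);
    }
}

// Hypothetical test backend that logs calls instead of touching a GPU.
struct LoggingBackend {
    using Stream = int;
    using Event = int;
    std::vector<std::string> log;
    void record(Event e, Stream s) {
        log.push_back("record e" + std::to_string(e) + " s" + std::to_string(s));
    }
    void wait(Stream s, Event e) {
        log.push_back("wait s" + std::to_string(s) + " e" + std::to_string(e));
    }
};

#if defined(__CUDACC__)
#include <cuda_runtime.h>
// Thin CUDA adapter; error checking omitted for brevity.
struct CudaBackend {
    using Stream = cudaStream_t;
    using Event = cudaEvent_t;
    void record(Event e, Stream s) { cudaEventRecord(e, s); }
    void wait(Stream s, Event e) { cudaStreamWaitEvent(s, e, 0); }
};
#endif
```

Because every cross-stream dependency is expressed with events rather than host-side synchronization, the host never blocks. This is also the shape stream capture expects: side streams join a capture by waiting on an event recorded in the capturing stream, and must be joined back into it before cudaStreamEndCapture.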
Thanks!
David