I want to launch N kernels of the same type at once. How should I go about this?
Should I create multiple Programs or just one Program but multiple kernels?
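Either launch them back to back on the same stream without synchronizing in between, or launch them on separate streams and synchronize once at the end. Both should be fast enough, since a kernel launch is non-blocking: the call returns to the host before the kernel has finished.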
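To make the separate-streams option concrete, here is a minimal sketch. The thread does not say which API is in use, so this assumes the plain CUDA runtime API; the kernel `add_one`, the helper `launch_on_separate_streams`, and the buffer sizes are all made up for illustration.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Bail out of the enclosing function on any CUDA error.
// Cleanup on the error path is omitted to keep the sketch short.
#define CHECK(call)                                                    \
    do {                                                               \
        cudaError_t err_ = (call);                                     \
        if (err_ != cudaSuccess) {                                     \
            std::fprintf(stderr, "%s\n", cudaGetErrorString(err_));    \
            return false;                                              \
        }                                                              \
    } while (0)

// Hypothetical kernel standing in for "the same type" of kernel:
// adds 1.0f to every element of its buffer.
__global__ void add_one(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] += 1.0f;
}

// Launches num_kernels instances of add_one, each on its own stream and
// its own buffer, and synchronizes only once after all launches.
// Returns true if every element of every buffer ends up equal to 1.0f.
bool launch_on_separate_streams(int num_kernels, int n) {
    std::vector<cudaStream_t> streams(num_kernels);
    std::vector<float*> buffers(num_kernels, nullptr);

    for (int k = 0; k < num_kernels; ++k) {
        CHECK(cudaStreamCreate(&streams[k]));
        CHECK(cudaMalloc(&buffers[k], n * sizeof(float)));
        // All-zero bytes are 0.0f, so each buffer starts at zero.
        CHECK(cudaMemsetAsync(buffers[k], 0, n * sizeof(float), streams[k]));
    }

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    // Each launch returns immediately; no synchronization inside the loop.
    for (int k = 0; k < num_kernels; ++k) {
        add_one<<<blocks, threads, 0, streams[k]>>>(buffers[k], n);
    }
    CHECK(cudaGetLastError());

    // Single synchronization point after everything has been queued.
    CHECK(cudaDeviceSynchronize());

    // Copy each buffer back and verify the result.
    bool all_ok = true;
    std::vector<float> host(n);
    for (int k = 0; k < num_kernels; ++k) {
        CHECK(cudaMemcpy(host.data(), buffers[k], n * sizeof(float),
                         cudaMemcpyDeviceToHost));
        for (int i = 0; i < n; ++i) {
            if (host[i] != 1.0f) all_ok = false;
        }
    }

    for (int k = 0; k < num_kernels; ++k) {
        CHECK(cudaFree(buffers[k]));
        CHECK(cudaStreamDestroy(streams[k]));
    }
    return all_ok;
}
```

Calling `launch_on_separate_streams(4, 1024)` from `main` would queue four kernels before waiting on any of them. Passing the same stream to every launch instead gives the back-to-back variant. Whether the kernels actually overlap on the device depends on how many resources each one needs.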