Open benbarsdell opened 3 years ago
Doesn't cudaLaunchKernel have the exact same problem? It can return error codes from previous async calls:
Note that this function may also return error codes from previous, asynchronous launches.
I think the difference is that it only returns sticky error codes from prior launches; harmless (non-sticky) error codes are not returned. E.g., an out-of-bounds write would flag an illegal address, and that error would be returned by a subsequent cudaLaunchKernel, but a kernel launch that failed because it requested too many resources would not.
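To make the sticky versus non-sticky distinction concrete, here is a minimal sketch. The kernels `noop` and `write_to` and the helper `launch_noop` are hypothetical names made up for illustration, and the oversized block is just one convenient way to provoke a launch that gets rejected without corrupting the context.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernels, for illustration only.
__global__ void noop(int) {}
__global__ void write_to(int* p) { *p = 42; }

// Launch noop with the given block size and return cudaLaunchKernel's status.
cudaError_t launch_noop(unsigned threads_per_block) {
  int dummy = 0;
  void* args[] = {&dummy};
  return cudaLaunchKernel(reinterpret_cast<const void*>(&noop), dim3(1),
                          dim3(threads_per_block), args, 0, nullptr);
}

// Trigger an illegal address fault and return the status seen at the next sync.
cudaError_t fault_and_sync() {
  write_to<<<1, 1>>>(nullptr);      // out-of-bounds write on the device
  return cudaDeviceSynchronize();   // the fault surfaces here
}
```

Under the reasoning above, `launch_noop(1u << 20)` should fail on its own (far more threads per block than any device allows), while a following `launch_noop(32)` should succeed because the rejected launch left nothing sticky behind. After `fault_and_sync()`, however, every later call in the process, including `launch_noop(32)`, would be expected to return the illegal address error.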
I'm tentatively slating this for 1.14. I'm in the middle of rewriting the kernel dispatch mechanisms in Thrust and CUB, and eventually we'll port the triple_chevron_launcher class from Thrust to CUB, which already has similar code to repack the arguments for cudaLaunchDevice CDP launches. We may be able to reuse that logic here.
It was just pointed out in NVBug 200715408 that this will also work around (WAR) some issues related to templated kernels and shared libraries that result in corrupted cudart state. Bumping up the priority.
The main reason for this request is to improve error handling. When using <<<>>>, CUB currently has to call cudaPeekAtLastError after the launch to check for invalid configuration errors. However, cudaPeekAtLastError also returns invalid configuration errors left over from previous launches. If cudaLaunchKernel is used instead, its return value can be checked directly, and it is unaffected by previous invalid configuration errors.
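The two error-checking patterns can be compared side by side. This is only a sketch: the `scale` kernel and the helpers `launch_chevron` and `launch_direct` are hypothetical names, not CUB code.

```cuda
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only.
__global__ void scale(float* data, float factor, int n) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) data[i] *= factor;
}

// Triple-chevron launch followed by cudaPeekAtLastError.
// The peeked value may stem from an earlier launch, not this one.
cudaError_t launch_chevron(float* d, float f, int n, unsigned block) {
  unsigned grid = (n + block - 1) / block;
  scale<<<grid, block>>>(d, f, n);
  return cudaPeekAtLastError();
}

// The same launch through cudaLaunchKernel, checking its return value directly.
// Arguments are passed as an array of pointers, with no implicit conversions.
cudaError_t launch_direct(float* d, float f, int n, unsigned block) {
  unsigned grid = (n + block - 1) / block;
  void* args[] = {&d, &f, &n};
  return cudaLaunchKernel(reinterpret_cast<const void*>(&scale), dim3(grid),
                          dim3(block), args, 0, nullptr);
}
```

For example, after `launch_chevron(d, 2.0f, 1024, 4096)` fails with an invalid configuration, a valid `launch_chevron(d, 2.0f, 1024, 256)` would still report that stale error until cudaGetLastError clears it. `launch_direct(d, 2.0f, 1024, 256)` should report only the outcome of its own launch.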
There is also a small performance benefit to using cudaLaunchKernel.
The only downside is that cudaLaunchKernel cannot perform template type deduction or implicit argument conversions. However, type safety can be achieved using a wrapper like this: