CUDA runtime api

psmyth94 commented 1 week ago

Hello,

I started working on an implementation of the CUDA runtime API since I saw some interest in it (#200). I managed to translate the runtime API equivalents for most of the driver API functions, except for context and module management, which cudart handles automatically. Below are some of the issues/limitations:

No cudaOccupancyMaxPotentialBlockSize* functions
- bindgen can't generate bindings for templated static inline functions (see bindgen's unsupported features)
Cannot use cudaLaunchKernel via FFI
- AFAIK, calling cudaLaunchKernel through Rust FFI bindings isn't possible. The runtime identifies a kernel by a host-side function symbol that nvcc registers with it through code generated into the compiled binary. Code built by rustc has no such registration, so the runtime can't resolve and launch the kernel.
- The only way to launch kernels through the runtime API is to call precompiled wrappers that use the CUDA <<<...>>> syntax, which is what I did with testkernel.cu.

I only have bindings for cuda-12050 and cuda-12020 so far. I want to gauge community interest in this implementation before investing more time.

ya0guang commented 5 days ago

Hi Patrick, I'm interested in your proposal! I'm trying to add support for CUDA 12.4, but I don't understand how to generate a source file like sys_12040.rs using the bindgen.sh script. I see this line: CUDART_VERSION=$(cat tmp.rs | grep "CUDART_VERSION" | awk '{ print $6 }' | sed 's/.$//'). What is tmp.rs expected to contain?
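For anyone with the same question: assuming tmp.rs is the raw bindgen output for the runtime headers, the pipeline picks out the line bindgen typically emits for the CUDART_VERSION define, e.g. pub const CUDART_VERSION: u32 = 12040; . The sixth whitespace-separated field is 12040; and the sed drops the trailing semicolon. The Rust sketch below reproduces that pipeline step by step; the function name cudart_version and the sample input are hypothetical, not part of bindgen.sh.

```rust
/// Mirrors the bindgen.sh pipeline:
///   cat tmp.rs | grep "CUDART_VERSION" | awk '{ print $6 }' | sed 's/.$//'
/// Assumes tmp.rs is raw bindgen output containing a line like
///   pub const CUDART_VERSION: u32 = 12040;
fn cudart_version(tmp_rs: &str) -> Option<u32> {
    tmp_rs
        .lines()
        // grep "CUDART_VERSION": keep only matching lines
        .filter(|line| line.contains("CUDART_VERSION"))
        // awk '{ print $6 }': the sixth whitespace-separated field, e.g. "12040;"
        .filter_map(|line| line.split_whitespace().nth(5))
        // sed 's/.$//': drop the final character (the trailing ';'), then parse
        .filter_map(|field| {
            let mut chars = field.chars();
            chars.next_back();
            chars.as_str().parse::<u32>().ok()
        })
        // The shell pipeline prints every match; this takes the first one.
        .next()
}

fn main() {
    // Hypothetical excerpt of tmp.rs.
    let tmp_rs = "// generated by bindgen\npub const CUDART_VERSION: u32 = 12040;\n";
    match cudart_version(tmp_rs) {
        // prints "CUDART_VERSION = 12040"
        Some(v) => println!("CUDART_VERSION = {v}"),
        None => eprintln!("CUDART_VERSION not found in tmp.rs"),
    }
}
```

If the value comes out empty, the grep probably didn't match anything, which usually means tmp.rs wasn't generated (or was generated from the wrong headers).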

Thanks for your effort in porting the CUDA runtime!

ya0guang commented 5 days ago

Sorry, there was a problem with my rustfmt. The bindgen script now works on my side with CUDA 12.4 and Ubuntu 22.04, and all the runtime tests pass. I'll dive deeper into it, thanks!

coreylowman commented 2 days ago

Cannot use cudaLaunchKernel via FFI

Yeah, this was the main problem I ran into as well. I think with Rust we would always have to rely on the driver API to launch kernels, which is why I chose to go with the driver API.
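On the driver side, a launch goes through cuLaunchKernel, whose kernelParams argument is an array of pointers, one per kernel parameter, each pointing at the value the driver copies (cudaLaunchKernel's args has the same shape). Below is a minimal sketch of building that array in Rust for a hypothetical kernel taking a device pointer, an f32 scale and an i32 length. It makes no CUDA calls; pack_params is an illustrative name, not a cudarc API.

```rust
use std::ffi::c_void;

/// Packs kernel arguments the way cuLaunchKernel expects `kernelParams`:
/// an array of N pointers, where entry i points at the value of parameter i.
/// The (device pointer, scale, length) signature is a hypothetical kernel.
fn pack_params(dev_ptr: &u64, scale: &f32, n: &i32) -> [*mut c_void; 3] {
    [
        dev_ptr as *const u64 as *mut c_void,
        scale as *const f32 as *mut c_void,
        n as *const i32 as *mut c_void,
    ]
}

fn main() {
    // Stand-in for a CUdeviceptr, which is a 64-bit integer on 64-bit platforms.
    let dev_ptr: u64 = 0xdead_beef;
    let scale: f32 = 2.5;
    let n: i32 = 1024;

    // The referenced values must stay alive until the launch call has read them.
    let params = pack_params(&dev_ptr, &scale, &n);

    // A real launch would pass a pointer to this array as kernelParams; here we
    // only read a value back through its pointer to show what the driver sees.
    let n_seen = unsafe { *(params[2] as *const i32) };
    // prints "3 params, n = 1024"
    println!("{} params, n = {}", params.len(), n_seen);
}
```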

Do we gain anything by adding the runtime API?

At the very least, we should document this shortcoming.

ya0guang commented 2 days ago

My understanding is that the runtime API works at a higher level and is easier for developers to use. From the CUDA developer guide:

The runtime API eases device code management by providing implicit initialization, context management, and module management. This leads to simpler code, but it also lacks the level of control that the driver API has.
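To make "implicit initialization" concrete for a Rust binding: a runtime-style layer built over the driver API could set up its state lazily on first use, so callers never pass an init flag or context around. This is a minimal sketch with mock types; ImplicitContext, context and runtime_style_call are hypothetical names, and the closure only records that setup ran instead of calling the driver (a real layer might call cuInit and retain the device's primary context there).

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

/// Counts how many times the (mock) setup actually ran.
static INIT_CALLS: AtomicUsize = AtomicUsize::new(0);

/// Hypothetical stand-in for the driver-level state a runtime-style API hides.
#[derive(Debug)]
struct ImplicitContext {
    device: i32,
}

/// Returns the shared context, creating it on the first call only.
fn context() -> &'static ImplicitContext {
    static CTX: OnceLock<ImplicitContext> = OnceLock::new();
    CTX.get_or_init(|| {
        // A real wrapper would call into the driver here; this only records the call.
        INIT_CALLS.fetch_add(1, Ordering::SeqCst);
        ImplicitContext { device: 0 }
    })
}

/// A runtime-style call: the caller never initializes anything explicitly.
fn runtime_style_call() -> i32 {
    context().device
}

fn main() {
    runtime_style_call();
    runtime_style_call();
    // prints "device = 0, init ran 1 time(s)"
    println!(
        "device = {}, init ran {} time(s)",
        runtime_style_call(),
        INIT_CALLS.load(Ordering::SeqCst)
    );
}
```

The trade-off the quote mentions shows up directly: the caller can no longer choose when or how the context is created.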

I believe an alternative to cudaLaunchKernel is to launch the kernel with cuLaunchKernel, since the launch can't be implemented directly through the runtime API call. Another CUDA API remoting project, cricket, takes this approach: https://github.com/RWTH-ACS/cricket/blob/ce8fdf7d4f4df696cf65c6d35926a76443d18f28/cpu/cpu-server-runtime.c#L875
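The general idea can be sketched with mock types so it runs without a GPU: a runtime-style layer keeps a table from a host-side kernel handle to the kernel's name in a loaded module, and a runtime-style launch becomes a lookup followed by a driver-style launch. Everything here (KernelRegistry, LaunchConfig, and the closure standing in for cuModuleGetFunction plus cuLaunchKernel) is hypothetical and is not cricket's or cudarc's code. An unregistered handle fails the lookup, which is loosely analogous to handing cudaLaunchKernel a Rust function pointer that nvcc never registered.

```rust
use std::collections::HashMap;

/// Hypothetical launch configuration, mirroring cuLaunchKernel's grid/block dims.
#[derive(Debug, Clone, Copy, PartialEq)]
struct LaunchConfig {
    grid: (u32, u32, u32),
    block: (u32, u32, u32),
}

#[derive(Debug, PartialEq)]
enum LaunchError {
    /// The handle was never registered with the layer.
    UnknownKernel(usize),
}

/// Maps host-side kernel handles to kernel names inside a loaded module.
struct KernelRegistry {
    kernels: HashMap<usize, String>,
}

impl KernelRegistry {
    fn new() -> Self {
        Self { kernels: HashMap::new() }
    }

    /// Plays the role that nvcc-generated registration code plays for cudart.
    fn register(&mut self, handle: usize, name: &str) {
        self.kernels.insert(handle, name.to_string());
    }

    /// Runtime-style launch: resolve the handle, then defer to a driver-style launcher.
    fn launch<F>(&self, handle: usize, cfg: LaunchConfig, mut driver_launch: F) -> Result<(), LaunchError>
    where
        F: FnMut(&str, LaunchConfig),
    {
        let name = self
            .kernels
            .get(&handle)
            .ok_or(LaunchError::UnknownKernel(handle))?;
        driver_launch(name.as_str(), cfg);
        Ok(())
    }
}

fn main() {
    let mut registry = KernelRegistry::new();
    registry.register(0x1000, "test_kernel");

    let cfg = LaunchConfig { grid: (4, 1, 1), block: (256, 1, 1) };
    let mut launched = Vec::new();

    // Registered handle: translated into a (mock) driver launch.
    registry
        .launch(0x1000, cfg, |name, c| launched.push((name.to_string(), c)))
        .unwrap();

    // Unregistered handle: the lookup fails before anything is launched.
    let err = registry.launch(0x2000, cfg, |_, _| {}).unwrap_err();

    println!("launched: {:?}", launched);
    println!("error: {:?}", err);
}
```

The hard part in practice is populating the table, since that needs the kernel names and the module image that nvcc normally registers on the runtime's behalf.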

coreylowman / cudarc: CUDA runtime api #262