Rust-GPU / Rust-CUDA

Ecosystem of libraries and tools for writing and executing fast GPU code fully in Rust.
Apache License 2.0
3.06k stars 119 forks

Enable code for dynamic parallelism #96

Open thedodd opened 1 year ago

thedodd commented 1 year ago

So, interestingly, I'm running into an issue where the generated code cannot be loaded by Module::from_ptx. It returns the error "a PTX JIT compilation failed".

Some background on current testing:

Now, what is quite strange is that if I copy the PTX from the working C++ program over to the Rust program (disabling PTX gen in the Rust program to ensure the C++ PTX is not overwritten), the Rust program aborts with that same "a PTX JIT compilation failed" error.

So, I am wondering:

thedodd commented 1 year ago

Perhaps we need to manually construct a linker, link the PTX with cudadevrt.lib, and then compile the result to a cubin. Will try that.

thedodd commented 1 year ago

Yea, that was it. We need to create a linker, add the PTX, and add libcudadevrt. Right now I have the library path hard-coded, but I need to create a dynamic search mechanism, as I don't think the CUDA linker will locate it on its own ... we'll see.
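As a rough illustration of what such a search mechanism could look like, here is a minimal std-only sketch. Everything in it is an assumption rather than what the project actually does: the function names (`find_cudadevrt_in`, `default_roots`, and so on) are hypothetical, the environment variables `CUDA_PATH`, `CUDA_HOME`, and `CUDA_ROOT` are just commonly used conventions, and the directory layout below each toolkit root is a guess at typical Linux and Windows installs.

```rust
use std::env;
use std::path::{Path, PathBuf};

/// File name of the CUDA device runtime static library on this platform.
fn cudadevrt_file_name() -> &'static str {
    if cfg!(windows) { "cudadevrt.lib" } else { "libcudadevrt.a" }
}

/// Candidate library directories beneath one toolkit root (assumed layouts).
fn lib_dirs(root: &Path) -> Vec<PathBuf> {
    vec![
        root.join("lib64"),
        root.join("lib").join("x64"),
        root.join("lib"),
        root.join("targets").join("x86_64-linux").join("lib"),
    ]
}

/// Returns the first existing device runtime library under the given roots.
fn find_cudadevrt_in(roots: &[PathBuf]) -> Option<PathBuf> {
    let name = cudadevrt_file_name();
    roots
        .iter()
        .flat_map(|root| lib_dirs(root))
        .map(|dir| dir.join(name))
        .find(|candidate| candidate.is_file())
}

/// Toolkit roots taken from common environment variables, followed by a
/// conventional default install location.
fn default_roots() -> Vec<PathBuf> {
    let mut roots: Vec<PathBuf> = ["CUDA_PATH", "CUDA_HOME", "CUDA_ROOT"]
        .iter()
        .filter_map(|var| env::var_os(var))
        .map(PathBuf::from)
        .collect();
    roots.push(PathBuf::from("/usr/local/cuda"));
    roots
}

fn main() {
    match find_cudadevrt_in(&default_roots()) {
        Some(path) => println!("found device runtime: {}", path.display()),
        None => println!("device runtime library not found"),
    }
}
```

The resulting path would then be handed to whatever linker step adds the library alongside the PTX. Keeping the roots as a parameter makes the lookup easy to override or test.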

From there, I was able to successfully execute the PTX from the sample C++ app of mine. The generated Rust PTX has an invalid memory access taking place, and it looks like it is coming from how the buffer is being populated. This is still a step forward, as the code gen is much easier to fix. I at least know what I'm dealing with, instead of some opaque "JIT compilation failed" error.
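For context on why populating the buffer is easy to get wrong: the CUDA programming guide requires each parameter in a device-side launch parameter buffer to start at an offset that is a multiple of that parameter's size. The sketch below only illustrates that layout rule on the host. It is not the macro's actual codegen, it does not claim to be the cause of the invalid access above, and the names `align_up` and `param_offsets` are hypothetical. It assumes every parameter has a non-zero size.

```rust
/// Rounds `offset` up to the next multiple of `align` (which must be non-zero).
fn align_up(offset: usize, align: usize) -> usize {
    (offset + align - 1) / align * align
}

/// Computes the byte offset of each parameter and the total buffer size,
/// given each parameter's size in bytes (all sizes assumed non-zero).
fn param_offsets(sizes: &[usize]) -> (Vec<usize>, usize) {
    let mut offsets = Vec::with_capacity(sizes.len());
    let mut cursor = 0;
    for &size in sizes {
        // Each parameter starts at the next multiple of its own size.
        let start = align_up(cursor, size);
        offsets.push(start);
        cursor = start + size;
    }
    (offsets, cursor)
}

fn main() {
    // Hypothetical child kernel signature: (u32, *mut f32, u8, u64).
    let (offsets, total) = param_offsets(&[4, 8, 1, 8]);
    println!("offsets = {:?}, total = {}", offsets, total);
    // prints "offsets = [0, 8, 16, 24], total = 32"
}
```

Writing a value at an offset that skips this padding (for example, placing the pointer at byte 4 in the signature above) would produce a misaligned or out-of-bounds read in the child kernel.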

thedodd commented 1 year ago

Yea, that did it. Code gen is far from optimal for loading the param buffer. But it works, and I am able to successfully use dynamic parallelism from the Rust-generated PTX end to end. Output and behavior are as expected.

Macro codegen for populating the buffer can be optimized further, as the generated PTX is not optimal. I'll focus on that later.