GPU kernel debugging documentation

dbeckwith commented 2 years ago

Super excited to try out this project, I've just been reading through the Guide so far. I couldn't find any pages on debugging GPU kernels at runtime (there's a page on debugging the codegen but not the live kernel itself). I think it would be great if debugging was mentioned somewhere, at least to say if it isn't well-supported yet or maybe link to some external resources on CUDA debugging.

RDambrosio016 commented 2 years ago

CUDA debug info has been... weird, really weird. This is what I have figured out so far:

NVVM IR has clear docs on what it wants for debug info. I haven't implemented those specific requirements yet, but I have implemented debug info generation in the LLVM IR.
Emitting full debug info (beyond just line tables) without meeting those requirements yields a segfault during libnvvm compilation.
Emitting line tables and passing -generate-line-info to libnvvm also yields a segfault.
Emitting line tables without passing -generate-line-info works (it produces debug info in the PTX), but it causes libnvvm to not optimize the program (what???)
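
To make the combinations above easier to compare, here is a minimal sketch of how the libnvvm option list could be assembled for each case. The `DebugInfo` enum and the `libnvvm_options` function are hypothetical names invented for illustration and are not taken from the project. The only things assumed to be real are the option strings `-arch=`, `-g`, and `-generate-line-info`, which libnvvm documents for `nvvmCompileProgram`.

```rust
/// How much debug info the codegen put into the NVVM IR.
/// (Hypothetical type, used only for this illustration.)
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum DebugInfo {
    None,
    LineTables,
    Full,
}

/// Builds the option strings that would be handed to `nvvmCompileProgram`.
///
/// `pass_debug_flag` decides whether the libnvvm flag matching the emitted
/// debug info is added. The case reported as working in this thread is
/// `DebugInfo::LineTables` with `pass_debug_flag == false`.
pub fn libnvvm_options(debug: DebugInfo, pass_debug_flag: bool, arch: &str) -> Vec<String> {
    let mut opts = vec![format!("-arch={}", arch)];
    if pass_debug_flag {
        match debug {
            // Full debug info corresponds to libnvvm's `-g`.
            DebugInfo::Full => opts.push("-g".to_string()),
            // Line tables only correspond to `-generate-line-info`.
            DebugInfo::LineTables => opts.push("-generate-line-info".to_string()),
            DebugInfo::None => {}
        }
    }
    opts
}
```

For example, `libnvvm_options(DebugInfo::LineTables, false, "compute_61")` yields only the `-arch` option, which matches the "line tables in the IR, no flag" case described above. The architecture string here is just a placeholder.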

Moreover, cuda-gdb does not work on Windows. I tried using CUDA's VS debugger, but it seems to be CUDA C++ only; I found no way of running it on an arbitrary executable (someone might be able to figure this out).

So overall, it's not well supported currently, since debug info is kind of low on the priority list. One thing I need to finish first is DCE (dead code elimination), because debug info generates a glorious amount of PTX code, and it gets really big really quickly.

RDambrosio016 commented 2 years ago

Update: I was able to fix the segfaults during debug info generation. Now the only issue is that, for some reason, kernels are much slower with debug info enabled even though the code is optimized, 600ms vs 30ms for example. For some reason CUDA uses 200 registers with debug info but 96 without, which is probably what makes it so slow. I will open a forum post or ask someone at NVIDIA about this.
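
As a rough back-of-the-envelope sketch of why the register count might matter: more registers per thread means fewer warps can be resident on a streaming multiprocessor (SM) at once, which leaves less room to hide latency. The limits below (64K 32-bit registers per SM, at most 64 resident warps) are assumptions that vary by architecture, and the calculation ignores register allocation granularity and every other occupancy limit, so treat the numbers as illustrative only. Reading the 200 and 96 figures as registers per thread is also an assumption. Lower occupancy may be just one contributor to the slowdown.

```rust
// Assumed hardware limits; real values depend on the GPU architecture.
const REGISTERS_PER_SM: u32 = 65_536; // assumption: 64K 32-bit registers per SM
const THREADS_PER_WARP: u32 = 32;
const MAX_WARPS_PER_SM: u32 = 64; // assumption: some architectures allow fewer

/// Upper bound on resident warps per SM when registers are the only limit.
/// Simplified: ignores allocation granularity, shared memory, and block size.
pub fn resident_warps(regs_per_thread: u32) -> u32 {
    if regs_per_thread == 0 {
        return MAX_WARPS_PER_SM;
    }
    let regs_per_warp = regs_per_thread * THREADS_PER_WARP;
    (REGISTERS_PER_SM / regs_per_warp).min(MAX_WARPS_PER_SM)
}
```

Under these assumptions, `resident_warps(200)` gives 10 warps and `resident_warps(96)` gives 21, so the debug build could keep roughly half as many warps in flight per SM.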

JustinMBrown commented 2 years ago

Just wanted to say thank you for working on the debugger! Whenever I learn anything new related to programming, the debugger is like my safety blanket lol. Because no matter how magical/crazy a piece of code looks, I know that as long as I can step through the code and see the values of my variables change in real-time, I can figure anything out.

thedodd commented 2 years ago

Yea, I currently have a kernel which is exhibiting some serious warp stalling, and I'm not really sure where this is in the actual kernel source.

Any tips/tricks folks might recommend to try to pin this down pre-debugger support?

UPDATE: Disregard, I found the issue by way of deduction. Some foolish memory access patterns.
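
The thread does not say which access pattern was at fault. As a generic illustration of how one can reason about this without a debugger, the sketch below counts how many memory segments a single warp touches for a given stride. The 128-byte segment size, the function name, and the parameters are assumptions made for this example. It also assumes each element is aligned and does not straddle a segment boundary.

```rust
use std::collections::BTreeSet;

const WARP_SIZE: u64 = 32;
const SEGMENT_BYTES: u64 = 128; // assumption: memory is fetched in 128-byte segments

/// Counts the distinct segments touched when every lane of a warp loads one
/// element at `base + lane * stride_elems * elem_bytes`.
/// Fewer segments means the access is closer to fully coalesced.
pub fn segments_touched(base: u64, stride_elems: u64, elem_bytes: u64) -> usize {
    (0..WARP_SIZE)
        .map(|lane| (base + lane * stride_elems * elem_bytes) / SEGMENT_BYTES)
        .collect::<BTreeSet<u64>>()
        .len()
}
```

With 4-byte elements, a stride of 1 from an aligned base touches a single segment, while a stride of 32 touches 32 segments, one per lane. Plugging a kernel's index expression into a model like this is one cheap way to spot a pattern that is likely to stall warps on memory.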
