Closed m1710545173 closed 2 years ago
Hello and thank you for your interest. It really depends on what you mean by debugging.
setup.py
better as the logs are a bit more concise.
allclose, and then do the same for the backward pass.
If you're getting into optimization, you may find PyTorch's profiler useful for measuring latency, and you'd probably need NVIDIA Nsight to profile in more detail.
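As a rough illustration of the allclose check on the forward and backward pass, here is a minimal sketch. It assumes you have a slow but trusted PyTorch reference for the op you are changing. The names `check_against_reference`, `reference_scores` and `custom_scores` are hypothetical and are not part of this repository; `custom_scores` only stands in for a call into your modified CUDA extension.

```python
import torch


def check_against_reference(custom_fn, reference_fn, inputs, atol=1e-6, rtol=1e-5):
    """Compare a custom op with a trusted reference on forward and backward."""
    # Independent leaf copies so each implementation accumulates its own gradients.
    custom_in = [x.detach().clone().requires_grad_(True) for x in inputs]
    ref_in = [x.detach().clone().requires_grad_(True) for x in inputs]

    out_custom = custom_fn(*custom_in)
    out_ref = reference_fn(*ref_in)
    forward_ok = torch.allclose(out_custom, out_ref, atol=atol, rtol=rtol)

    # Feed the same upstream gradient to both so the input gradients are comparable.
    grad_out = torch.randn_like(out_ref)
    out_custom.backward(grad_out)
    out_ref.backward(grad_out)
    backward_ok = all(
        torch.allclose(a.grad, b.grad, atol=atol, rtol=rtol)
        for a, b in zip(custom_in, ref_in)
    )
    return forward_ok, backward_ok


# Hypothetical stand-ins: in practice custom_fn would call the CUDA extension
# and reference_fn would be a plain PyTorch version of the same computation.
def reference_scores(q, k):
    return q @ k.transpose(-2, -1)


def custom_scores(q, k):
    return torch.einsum("bid,bjd->bij", q, k)


torch.manual_seed(0)
q = torch.randn(2, 5, 8, dtype=torch.float64)
k = torch.randn(2, 5, 8, dtype=torch.float64)
forward_ok, backward_ok = check_against_reference(custom_scores, reference_scores, [q, k])
print(forward_ok, backward_ok)
```

Running the check on small inputs first keeps a mismatch easy to inspect by hand. For float32 or half-precision kernels the tolerances would likely need to be loosened.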
I hope you find these useful, but if you need more details, please let us know.
Thank you very much for your suggestions! I will give them a try.
Closing this due to inactivity. If you still have questions feel free to open it back up.
Hello author, I want to change nattenqkrpb_cuda_forward_kernel to get the behavior I need, but I don't know much about CUDA programming and I don't know how to debug a CUDA kernel. I am using Visual Studio and libtorch on Windows 10. Although I can debug some parts of the .cu file, I can't debug the CUDA kernel itself. What tools and methods do you use to debug CUDA code? Please give me some suggestions!