SHI-Labs / Neighborhood-Attention-Transformer

Neighborhood Attention Transformer, arXiv 2022 / CVPR 2023. Dilated Neighborhood Attention Transformer, arXiv 2022
MIT License

How to debug a CUDA kernel? #42

Closed m1710545173 closed 2 years ago

m1710545173 commented 2 years ago

Hello author, I want to modify nattenqkrpb_cuda_forward_kernel to get the behavior I need, but I don't know much about CUDA programming and I don't know how to debug a CUDA kernel. I am using Visual Studio and libtorch on Windows 10. I can debug the host-side parts of the .cu file, but I can't step into the CUDA kernel itself. What tools and methods do you use to debug CUDA code? Any suggestions would be appreciated!

alihassanijr commented 2 years ago

Hello and thank you for your interest. It really depends on what you mean by debugging.

If you're getting into optimization, you may find PyTorch's profiler useful for measuring latency, and you'd probably need NVIDIA Nsight for more detailed, kernel-level profiling.

I hope you find these useful, but if you need more details, please let us know.
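
As a rough illustration of the PyTorch profiler suggestion above, here is a minimal sketch of measuring the latency of an op with `torch.profiler`. It is not taken from this repository: `profile_op` and `standin_attention` are hypothetical names, and the stand-in function is only a placeholder for whatever op (for example, a modified neighborhood attention kernel) you actually want to time. The sketch falls back to CPU when no GPU is available.

```python
import torch
from torch.profiler import profile, ProfilerActivity


def profile_op(fn, *args, warmup=3, iters=10):
    """Run fn(*args) under torch.profiler and return the profiler object.

    Hypothetical helper for illustration; not part of this repository.
    """
    use_cuda = torch.cuda.is_available()
    activities = [ProfilerActivity.CPU]
    if use_cuda:
        activities.append(ProfilerActivity.CUDA)

    # Warm-up runs so one-time costs (lazy init, caching) are not measured.
    for _ in range(warmup):
        fn(*args)
    if use_cuda:
        torch.cuda.synchronize()

    with profile(activities=activities, record_shapes=True) as prof:
        for _ in range(iters):
            fn(*args)
        if use_cuda:
            # CUDA kernels launch asynchronously; wait for them to finish
            # before the profiler context closes.
            torch.cuda.synchronize()
    return prof


def standin_attention(q, k):
    # Placeholder op: plain dense attention weights. Replace this with the
    # op you actually want to profile.
    return torch.softmax(q @ k.transpose(-2, -1), dim=-1)


device = "cuda" if torch.cuda.is_available() else "cpu"
q = torch.randn(2, 4, 64, 32, device=device)
k = torch.randn(2, 4, 64, 32, device=device)

prof = profile_op(standin_attention, q, k)
# Per-op summary, averaged over all calls recorded inside the context.
print(prof.key_averages().table(sort_by="self_cpu_time_total", row_limit=10))
```

The printed table lists each recorded op with its call count and timings. When a GPU is present, the table also includes device-time columns; the exact column and sort-key names for those vary between PyTorch versions, so check the documentation for the version you have installed.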

m1710545173 commented 2 years ago

Thank you very much for your suggestions! I will have a try.

alihassanijr commented 2 years ago

Closing this due to inactivity. If you still have questions, feel free to open it back up.