NVIDIA / nsight-vscode-edition

A Visual Studio Code extension for building and debugging CUDA applications.

Debugging not working. Failed to suspend device for CUDA device. #24

Closed sg879 closed 1 year ago

sg879 commented 1 year ago

I am trying to set up the debugging capabilities with the matrixMul CUDA sample.

Environment: WSL; GPU: NVIDIA GeForce RTX 3080 Laptop; CUDA: 12.0; NVIDIA Driver: 528.02

I can build and run the CUDA program without an issue. However, when I attempt to debug the program, I get the following error:

Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[Detaching after fork from child process 1883]
[New Thread 0x7fffef75e000 (LWP 1886)]
Error: Failed to suspend device for CUDA device 0, error=CUDBG_ERROR_INTERNAL(0xa).

salykova commented 1 year ago

Environment: WSL; GPU: RTX 3070 Laptop; CUDA runtime: 12; CUDA Toolkit: 11.7; NVIDIA driver: 528.02

Same problem. Did you find a solution?

salykova commented 1 year ago

Hi @sg879, I've just tried the 527.41 driver + CUDA runtime 12 + CUDA driver 11.7 and it works perfectly. Try this version of the driver. Also, make sure your CUDA runtime version is >= 11.8.

sg879 commented 1 year ago

Hi @salykovaa,

Thanks for the tip! I'm glad you managed to fix your installation. I tried to roll back the NVIDIA driver to 527.56, but it didn't fix the problem for me :(

I have opened a thread on the NVIDIA CUDA-GDB forum, here. Hopefully, someone posts a solution soon.

Out of interest, which installation method did you use for CUDA in WSL? I used the network repo installation and am thinking of trying out a different method to see if I succeed that way.

salykova commented 1 year ago

@sg879 I used the deb (local) method and chose the WSL-Ubuntu version. I also tried CUDA driver 11.7 and it works too. It seems the main requirement is that cuda-gdb must be >= 11.8.
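A rough way to check the ">= 11.8" condition from a script is sketched below. The helper name version_ge is hypothetical, and it assumes GNU sort with the -V (version sort) option, which Ubuntu ships. The version string itself would have to be read off whatever cuda-gdb --version prints on your machine; its exact output format is not assumed here.

```shell
# version_ge is a hypothetical helper, not part of the CUDA toolkit.
# It succeeds when version $1 is greater than or equal to version $2.
# Assumption: GNU sort with -V (version sort) is available.
version_ge() {
    # Sort the two versions; if the smaller one is $2, then $1 >= $2.
    lowest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)
    [ "$lowest" = "$2" ]
}

# Example with literal versions mentioned in this thread.
if version_ge 12.0 11.8; then
    echo "12.0 meets the 11.8 minimum"
fi
if ! version_ge 11.7 11.8; then
    echo "11.7 is below the 11.8 minimum"
fi
```

Version sort is used instead of a plain string comparison so that something like 11.10 would still compare as newer than 11.8.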

P.S. Before the WSL installation, I downloaded CUDA Toolkit 12 on Windows via https://developer.nvidia.com/cuda-downloads. This automatically updates (in my case, downgrades) the NVIDIA driver.

P.P.S. Make sure you have updated the path variables in .bashrc:

export PATH=/usr/local/cuda-12.0/bin${PATH:+:${PATH}}
export LD_LIBRARY_PATH=/usr/local/cuda-12.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
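A note on the ${VAR:+:${VAR}} part of the two lines above: it expands to a colon followed by the old value only when the variable is already set and non-empty, so an empty variable does not leave a stray trailing colon (an empty entry is typically treated as the current directory). A minimal sketch of that behavior, using a hypothetical helper prepend_dir that is not part of any CUDA tooling:

```shell
# prepend_dir is a hypothetical helper for illustration only.
# It prints directory $1 prepended to the colon-separated list $2,
# using the same ${VAR:+:${VAR}} expansion as the export lines above.
prepend_dir() {
    new_dir=$1
    old_list=$2
    # ${old_list:+:${old_list}} is empty when old_list is unset or empty,
    # otherwise it is ":" followed by the old value.
    printf '%s\n' "${new_dir}${old_list:+:${old_list}}"
}

# Empty old value: the result has no trailing colon.
prepend_dir /usr/local/cuda-12.0/lib64 ""
# Non-empty old value: the two parts are joined by a single colon.
prepend_dir /usr/local/cuda-12.0/lib64 /usr/lib
```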

sg879 commented 1 year ago

@salykovaa

THANK YOU SO MUCH!

The key was either installing the CUDA toolkit on Windows beforehand (was that stated somewhere in the installation instructions? I couldn't find it) or adding the line export LD_LIBRARY_PATH=/usr/local/cuda-12.0/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}} to .bashrc. Likely the latter, though I didn't think it was necessary for the deb installation methods, since that's what the documentation says.

salykova commented 1 year ago

@sg879

Nice to hear that!

1) Regarding LD_LIBRARY_PATH: it is mentioned in the documentation under post-installation-actions.
2) Regarding installing the CUDA toolkit on Windows: no, that was actually my idea 😆

I assume the undefined LD_LIBRARY_PATH was the reason the debugger didn't work.
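For anyone who wants to confirm this on their own setup, here is a small sketch that checks whether the CUDA library directory is actually an entry in LD_LIBRARY_PATH. The helper name path_list_contains is hypothetical, and the directory is the one from the export line earlier in this thread; adjust it to your installed version.

```shell
# path_list_contains is a hypothetical helper for illustration only.
# It succeeds when directory $2 is an entry of the colon-separated list $1.
path_list_contains() {
    # Wrapping both sides in colons makes the match exact per entry,
    # so /opt/foo does not match /opt/foobar.
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

# ${LD_LIBRARY_PATH:-} falls back to an empty string when the variable is unset.
if path_list_contains "${LD_LIBRARY_PATH:-}" /usr/local/cuda-12.0/lib64; then
    echo "CUDA lib64 is on LD_LIBRARY_PATH"
else
    echo "CUDA lib64 is missing from LD_LIBRARY_PATH"
fi
```

Run it in the same shell that VS Code launches the debugger from, since a variable exported only in .bashrc may not be visible to sessions that do not read that file.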