Open angel-devicente opened 5 months ago
Hello,
I don't know how Slurm allocates the GPUs. Could you check whether the library libnvidia-ml.so is available?
That's the library used to get the GPU information; nvidia-smi either queries the driver directly or is statically linked to this library, and hence will work without it.
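One quick way to run that check from inside the Slurm session is to try loading the library by hand. This is only a minimal sketch: the soname `libnvidia-ml.so.1` is the usual one on Linux driver installs but is an assumption here, and `check_nvml` is a hypothetical helper name, not part of nvtop or NVML.

```python
import ctypes
import ctypes.util


def check_nvml(names=("libnvidia-ml.so.1", "libnvidia-ml.so")):
    """Try to load the NVML shared library under each candidate name.

    Returns (loaded_name, errors): loaded_name is the first name that
    loaded, or None; errors maps each failed name to the loader message.
    The candidate names are assumptions, adjust them for your system.
    """
    errors = {}
    for name in names:
        try:
            ctypes.CDLL(name)  # asks the dynamic loader to open the library
            return name, errors
        except OSError as exc:
            errors[name] = str(exc)
    return None, errors


if __name__ == "__main__":
    # find_library searches the standard locations; None means "not found".
    print("find_library:", ctypes.util.find_library("nvidia-ml"))
    loaded, errors = check_nvml()
    if loaded:
        print("loaded:", loaded)
    else:
        for name, message in errors.items():
            print(f"failed: {name}: {message}")
```

Run it in the same environment that nvtop runs in (same interactive session, same packaging or sandbox), since the loader's search path can differ between environments.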
It turned out that the problem seemed to come from the installed (Snap) version. The AppImage version works without issues.
Hi, I'm facing a problem with nvtop in a Slurm interactive session. I get an interactive session on a machine with two GPUs. Slurm controls access to them, and in this particular case I'm requesting just one of them. I have verified that I can use the GPU for computation, and nvidia-smi detects it (it shows only one GPU, because that is all Slurm is giving me access to), but as you can see below, nvtop says that there is no GPU to monitor. I have no idea what could be going on here. Any ideas on how to debug this issue, or things I could try?
Thanks,