CFSworks / nvml_fix

A workaround for an annoying bug in nVidia's NVML library. Allows nvidia-smi to work once more!

no effect on 440.82 #31

Closed lbroto closed 4 years ago

lbroto commented 4 years ago

Hi folks,

I just compiled and installed this on Debian sid with a working nvidia 440.82 driver.

The make install does not work: it seems to have replaced a link, which prevented nvidia-smi from running. I put libnvidia-ml.so.1 directly into /usr/lib/x86_64-linux-gnu/nvidia/current/ and that did the trick, but nothing changed in the nvidia-smi output. There are still a lot of N/A fields and the process list is not supported.

After stracing, the hacked library seems to be loaded.
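One way to double-check this from an strace log is to list the libnvidia-ml paths that were actually opened successfully. The following is a minimal sketch, not part of nvml_fix: the function name `loaded_nvml_paths` is hypothetical, and it assumes the usual strace line format, where a failed call ends in `= -1` followed by an error name.

```shell
# Reads an strace log on stdin and prints each libnvidia-ml path that was
# opened successfully. loaded_nvml_paths is a hypothetical helper name.
loaded_nvml_paths() {
    grep -E 'open(at)?\(.*libnvidia-ml\.so' |   # open()/openat() calls on the library
        grep -v -E '= -1 ' |                     # drop failed attempts (ENOENT etc.)
        sed -E 's/^[^"]*"([^"]*)".*/\1/' |       # keep only the quoted path
        sort -u
}

# Typical use, with a log produced by `strace -o nvidia-smi-strace.txt nvidia-smi`:
#   loaded_nvml_paths < nvidia-smi-strace.txt
```

If the printed path is the file that was replaced by hand, the patched library is the one nvidia-smi picked up.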

My GPU is a GeForce GT710.

Thanks,

Laurent

tofurky commented 4 years ago

hi @lbroto

the default install path /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 is tailored towards ubuntu/mint users. i suppose 'deb-based distros' in the readme should change to 'Ubuntu-based distros'. i think i incorrectly assumed that all .deb distros used the mentioned path.

taking a look at https://packages.debian.org/testing/amd64/libnvidia-ml1/filelist, it appears the correct location on debian testing (and buster, too) is indeed /usr/lib/x86_64-linux-gnu/nvidia/current/.
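Since the location differs between distros, it can help to ask the dynamic linker cache where it finds libnvidia-ml.so.1 before copying anything. This is a rough sketch under the assumption that the library is registered in the cache printed by `ldconfig -p`; the helper name `nvml_cache_path` is made up for illustration.

```shell
# Reads `ldconfig -p` output on stdin and prints the first path the linker
# cache records for libnvidia-ml.so.1. nvml_cache_path is a hypothetical name.
nvml_cache_path() {
    awk '$1 == "libnvidia-ml.so.1" { print $NF; exit }'
}

# Typical use:
#   path=$(ldconfig -p | nvml_cache_path)
#   ls -l "$path"         # shows whether the cached path is a symlink
#   readlink -f "$path"   # follows any symlink chain to the real file
```

If the cached path turns out to be a symlink, overwriting it would replace the link rather than the file it points to, so the resolved path is likely the safer place to look.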

an important thing to check is the output of the following command both WITH and WITHOUT nvml_fix installed:

matt@aquos:~$ nvidia-smi -a|grep 'Product Brand'
    Product Brand                   : Quadro

if nvml_fix is installed/working correctly (and this doesn't necessarily mean that power consumption etc is available - that is dependent on gpu/firmware), you should see Quadro. without nvml_fix installed, it should probably show GeForce.

if you're seeing Quadro then there isn't much more to be done i'm afraid :( but if you are still seeing GeForce or some error with nvml_fix installed, then it's probably not installed/working correctly.
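the check above can be wrapped in a small helper so the two runs are easy to compare. this is only a sketch: `brand_verdict` is a made-up name, it looks at the first Product Brand line only (so only the first GPU on a multi-GPU box), and the wording of the verdicts is mine.

```shell
# Reads `nvidia-smi -a` output on stdin and reports what the first
# Product Brand line suggests. brand_verdict is a hypothetical helper name.
brand_verdict() {
    brand=$(sed -n 's/^[[:space:]]*Product Brand[[:space:]]*:[[:space:]]*//p' | head -n 1)
    case "$brand" in
        Quadro)  echo "Quadro: nvml_fix appears to be active" ;;
        GeForce) echo "GeForce: nvml_fix does not appear to be active" ;;
        "")      echo "no Product Brand line found" ;;
        *)       echo "unexpected brand: $brand" ;;
    esac
}

# Typical use on a machine with the driver installed:
#   nvidia-smi -a | brand_verdict
```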

please attach debugging output as described below (with nvml_fix installed). you may need to install ltrace, it sounds like you have strace already:

nvidia-smi-ltrace.txt:

ltrace -s1024 -S -o nvidia-smi-ltrace.txt nvidia-smi

nvidia-smi-strace.txt:

strace -s1024 -o nvidia-smi-strace.txt nvidia-smi

nvidia-libs.txt:

ls -l /usr/lib*/{*nvidia*,libnv*} /usr/lib/*/{*nvidia*,libnv*} /usr/lib/*/nvidia/*/* > nvidia-libs.txt

also it would be nice to see what the output of nvidia-smi is without nvml_fix installed:

nvidia-smi.txt:

nvidia-smi -a > nvidia-smi.txt

lbroto commented 4 years ago

Hi @tofurky,

after running nvidia-smi with and without the patch, you're right, the Product Brand changed from GeForce to Quadro.

So I guess I will never see the processes using the GPU ... sad :-)

Thanks for your work,

Laurent

Lehmaning commented 1 year ago

This command shows the processes using the nvidia GPU: sudo fuser -fuv /dev/nvidia*. However, the output still lacks information about how much of the GPU's resources each process is using.
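A similar list can be built without fuser by walking /proc and looking for file descriptors that point at /dev/nvidia* device nodes. The sketch below is an illustration only (the function name `nvidia_pids` is hypothetical), and like fuser it says nothing about how much of the GPU each process uses.

```shell
# Prints the PIDs of processes holding an open file descriptor on a
# /dev/nvidia* device node. The optional first argument is the proc root
# (defaults to /proc). nvidia_pids is a hypothetical helper name.
nvidia_pids() {
    root=${1:-/proc}
    for fd in "$root"/[0-9]*/fd/*; do
        # Unreadable or vanished descriptors are skipped.
        target=$(readlink "$fd" 2>/dev/null) || continue
        case "$target" in
            /dev/nvidia*)
                pid=${fd#"$root"/}    # e.g. 1234/fd/5
                echo "${pid%%/*}"     # keep only the PID component
                ;;
        esac
    done | sort -un
}

# Typical use (run as root to see other users' processes):
#   for p in $(nvidia_pids); do ps -o pid,user,comm -p "$p"; done
```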