CFSworks / nvml_fix

A workaround for an annoying bug in nVidia's NVML library. Allows nvidia-smi to work once more!

Error when running in ubuntu 22.04 #43

Closed rafariossaa closed 2 years ago

rafariossaa commented 2 years ago

Hi, I am trying to use nvml_fix on Ubuntu 22.04, but I get the following error:

$ sudo dpkg-divert --add --local --divert /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1.orig --rename /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
Adding 'local diversion of /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1 to /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1.orig'

$ sudo make install TARGET_VER=470.141.03 libdir=/usr/lib/x86_64-linux-gnu
gcc  -shared -fPIC -s empty.c -o libnvidia-ml.so.470.141.03 
gcc  -Wl,--no-as-needed -shared -fPIC -s -o libnvidia-ml.so.1 -DNVML_PATCH_470 -DNVML_PATCH_MINOR=141 -DNVML_VERSION=\"470.141.03\" libnvidia-ml.so.470.141.03 nvml_fix.c
/usr/bin/install -D -Dm755 libnvidia-ml.so.1 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1

$ nvidia-smi
Failed to initialize NVML: Unknown Error
Failed to properly shut down NVML: Function Not Found

Libraries created:

$  ls -l /usr/lib/x86_64-linux-gnu/libnvidia-ml.so*
lrwxrwxrwx 1 root root      17 jul  1 01:46 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so -> libnvidia-ml.so.1
-rwxr-xr-x 1 root root   14432 ago 30 16:29 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
lrwxrwxrwx 1 root root      26 jul  1 01:46 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1.orig -> libnvidia-ml.so.470.141.03
-rw-r--r-- 1 root root 1828056 jun 30 20:36 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.470.141.03

Could this be caused by the GCC version that Ubuntu 22.04 provides?

$  gcc --version
gcc (Ubuntu 11.2.0-19ubuntu1) 11.2.0
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

rafariossaa commented 2 years ago

Sorry, I think I had messed up my LD_LIBRARY_PATH. I no longer get this error, but nvidia-smi doesn't list any processes while Blender is rendering on the GPU.

Tue Aug 30 16:42:10 2022       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:01:00.0  On |                  N/A |
| N/A   50C    P0    N/A /  N/A |   1032MiB /  1993MiB |     99%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
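
For anyone hitting the same "Unknown Error": one way to check whether LD_LIBRARY_PATH is pulling in a different copy of the library is to list every directory on that path that contains libnvidia-ml.so.1. This is only a rough sketch; the function name find_shadowing_libs is made up for illustration and is not part of nvml_fix, and it assumes a POSIX sh or bash shell.

```shell
# Hypothetical helper (not part of nvml_fix): print every copy of a library
# found in a colon-separated search path such as LD_LIBRARY_PATH.
find_shadowing_libs() {
    search_path=$1
    lib_name=$2
    old_ifs=$IFS
    IFS=:
    # Unquoted on purpose so the path is split on ':'.
    for dir in $search_path; do
        # Empty entries are skipped in this sketch.
        [ -n "$dir" ] || continue
        if [ -e "$dir/$lib_name" ]; then
            printf '%s\n' "$dir/$lib_name"
        fi
    done
    IFS=$old_ifs
}

# Prints nothing when LD_LIBRARY_PATH is unset or holds no copy of the library.
find_shadowing_libs "${LD_LIBRARY_PATH:-}" libnvidia-ml.so.1
```

If this prints a path outside /usr/lib/x86_64-linux-gnu, that copy may be the one nvidia-smi ends up loading instead of the nvml_fix shim.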

tofurky commented 2 years ago

works fine on 22.04 for me.

what do you see here?

matt@aquos:~$ nvidia-smi -a|grep 'Product Brand'
    Product Brand                         : Quadro

if it says Quadro with nvml_fix installed, and GeForce without, then it's working as intended.

also, which card model? i think some cards just don't support reporting those metrics :/
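
the check above could be scripted roughly like this. it's just a sketch: product_brand and describe_brand are made-up helper names, and the Quadro/GeForce interpretation is only the rule of thumb described above, not something nvidia-smi reports itself.

```shell
# Hypothetical helper: read `nvidia-smi -a` output on stdin and print only
# the value of the "Product Brand" field.
product_brand() {
    sed -n 's/^[[:space:]]*Product Brand[[:space:]]*:[[:space:]]*//p'
}

# Hypothetical helper: interpret the brand string using the rule of thumb
# from this thread (Quadro with nvml_fix installed, GeForce without).
describe_brand() {
    case $1 in
        Quadro)  echo "nvml_fix appears to be active" ;;
        GeForce) echo "nvml_fix does not appear to be active" ;;
        *)       echo "unrecognised brand: $1" ;;
    esac
}

# Example usage on a machine with the driver installed:
#   describe_brand "$(nvidia-smi -a | product_brand)"
```

on a card that really is a Quadro this will say "active" either way, so it's only meaningful on GeForce hardware.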

tofurky commented 2 years ago

@rafariossaa could you please provide the output of the previously mentioned nvidia-smi -a|grep 'Product Brand', both with and without nvml_fix installed?