CFSworks / nvml_fix

A workaround for an annoying bug in nVidia's NVML library. Allows nvidia-smi to work once more!

No running processes found with 450.66 drivers #34

Closed by jpoggi 4 years ago

jpoggi commented 4 years ago

Hi,

First of all, thanks a lot for providing this workaround for an nvidia bug.

I used this workaround with an older driver and it worked perfectly.

I recently updated the NVIDIA driver to version 450.66 on an NVIDIA Quadro P400 that I use to transcode video with Plex Media Server (PMS).

I noticed that nvidia-smi reports 'No running processes found' while PMS is doing a hardware transcode.

So I'm wondering whether my graphics card is working as expected.

I decided to apply the workaround again, but it made no difference.

Before installation

admin@server:~$ uname -a
Linux carpenter 5.4.0-48-generic #52-Ubuntu SMP Thu Sep 10 10:58:49 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
admin@server:~$ cat /etc/os-release 
NAME="Ubuntu"
VERSION="20.04.1 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.1 LTS"
VERSION_ID="20.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=focal
UBUNTU_CODENAME=focal
root@server:~# ls -l /usr/lib/x86_64-linux-gnu/libnvidia-ml.so*
total 5592
lrwxrwxrwx 1 root root      17 sept. 25 00:20 libnvidia-ml.so -> libnvidia-ml.so.1*
lrwxrwxrwx 1 root root      22 sept. 25 00:24 libnvidia-ml.so.1 -> libnvidia-ml.so.450.66*
-rwxr-xr-x 1 root root 1905848 sept. 25 00:20 libnvidia-ml.so.450.66*
root@server:~$ nvidia-smi
Fri Sep 25 09:01:56 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.66       Driver Version: 450.66       CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro P400         Off  | 00000000:01:00.0 Off |                  N/A |
| 31%   44C    P0    N/A /  N/A |      0MiB /  2000MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Applying the workaround

admin@server:~/nvml_fix$ make TARGET_VER=450.66
gcc  -shared -fPIC -s empty.c -o libnvidia-ml.so.450.66 
gcc  -Wl,--no-as-needed -shared -fPIC -s -o libnvidia-ml.so.1 -DNVML_PATCH_450 -DNVML_PATCH_MINOR=66 -DNVML_VERSION=\"450.66\" libnvidia-ml.so.450.66 nvml_fix.c
admin@server:~/nvml_fix$ sudo rm -v /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
'/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1' deleted
admin@server:~/nvml_fix$ sudo make install TARGET_VER=450.66 libdir=/usr/lib/x86_64-linux-gnu
/usr/bin/install -D -Dm755 libnvidia-ml.so.1 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
admin@server:~/nvml_fix$ sudo ls -l /usr/lib/x86_64-linux-gnu/libnvidia-ml.so*
lrwxrwxrwx 1 root root      17 sept. 25 00:20 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so -> libnvidia-ml.so.1
-rwxr-xr-x 1 root root   14432 sept. 25 09:03 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
-rwxr-xr-x 1 root root 1905848 sept. 25 00:20 /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.450.66
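The listing above shows the key change: `libnvidia-ml.so.1` goes from a symlink to a small regular file. A minimal sketch for checking that from a script follows. `shim_status` is a hypothetical helper name, not something shipped with nvml_fix.

```shell
# Hypothetical helper (not part of nvml_fix): report whether
# libnvidia-ml.so.1 in the given directory is the driver's original
# symlink, a regular file (as after `make install`), or absent.
shim_status() {
    lib="$1/libnvidia-ml.so.1"
    if [ -L "$lib" ]; then
        # -L must be tested first: -f follows symlinks
        echo "symlink -> $(readlink "$lib")"
    elif [ -f "$lib" ]; then
        echo "regular file"
    else
        echo "missing"
    fi
}
```

Run it as `shim_status /usr/lib/x86_64-linux-gnu`. Going by the listings in this report, it should print the symlink target before the install and `regular file` afterwards.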

Here is some debug output you asked for in another issue, in case it helps you (and me):

ltrace -s1024 -S -o nvidia-smi-ltrace.out nvidia-smi

nvidia-smi-ltrace.out.zip

strace -s1024 -o nvidia-smi-strace.txt nvidia-smi

nvidia-smi-strace.txt.zip

ls -l /usr/lib*/{*nvidia*,libnv*} /usr/lib/*/{*nvidia*,libnv*} > nvidia-libs.txt

nvidia-libs.txt.zip

nvidia-smi -a > nvidia-smi.txt

nvidia-smi.txt.zip

When I start a hardware transcode in PMS, I still get 'No running processes found'.

Let me know if I can provide more information.
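For anyone scripting this check, here is a minimal sketch that reads `nvidia-smi` output on stdin and reports whether the Processes table lists anything. `smi_has_processes` is a hypothetical helper name, and the matching assumes the table layout shown in the output above, which may differ in other driver versions.

```shell
# Hypothetical helper: reads `nvidia-smi` output on stdin.
# Returns 0 if the Processes table lists at least one process,
# 1 if it says "No running processes found",
# 2 if no Processes table is present at all (e.g. nvidia-smi failed).
smi_has_processes() {
    out=$(cat)
    case "$out" in
        *"No running processes found"*) return 1 ;;
        *"Processes:"*) return 0 ;;
        *) return 2 ;;
    esac
}
```

Usage while a transcode is running: `nvidia-smi | smi_has_processes && echo "GPU has processes"`. As far as I know, `nvidia-smi dmon -s u` also prints encoder and decoder utilization columns, which is another way to see whether a transcode is reaching the card.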

jpoggi commented 4 years ago

Hi,

I'm just a f***ing noob lol

I had disabled the NVIDIA card in the BIOS by selecting the motherboard's integrated graphics....

Issue self-resolved!

tofurky commented 4 years ago

:D thanks for attaching the detailed info anyways.