Open elezar opened 2 weeks ago
In my ticket https://github.com/NVIDIA/nvidia-container-toolkit/issues/672, where the cloud environment has the NVIDIA driver installed in /var/lib/nvidia/, nvidia-container-cli -k -d /dev/tty info
also complained:
W0829 16:57:45.375509 1151 nvc_info.c:470] missing firmware path /usr/lib/firmware/nvidia/535.183.01/gsp*.bin
The firmware is actually located under NVIDIA_ROOT (/var/lib/nvidia):
gfrankliu-t4-ws ➜ ~ ls -l /var/lib/nvidia
total 427620
-rw-r--r-- 1 root root 341725273 Aug 29 18:47 NVIDIA-Linux-x86_64-535.183.01.run
drwxr-xr-x 2 root root 4096 Aug 29 18:47 bin
drwxr-xr-x 3 root root 4096 Aug 29 18:47 bin-workdir
drwxr-xr-x 2 root root 4096 Aug 10 14:54 drivers
drwxr-xr-x 3 root root 4096 Aug 29 18:47 drivers-workdir
drwxr-xr-x 3 root root 4096 Aug 10 14:54 firmware
-rw-r--r-- 1 root root 2970 Aug 29 18:47 gpu_driver_versions.bin
drwxr-xr-x 5 root root 4096 Aug 29 18:47 lib64
drwxr-xr-x 3 root root 4096 Aug 29 18:47 lib64-workdir
-rw-r--r-- 1 root root 96106018 Aug 29 18:47 nvidia-drivers-535.183.01.tgz
-rw-r--r-- 1 root root 2355 Aug 29 18:47 nvidia-installer.log
drwxr-xr-x 4 root root 4096 Aug 29 18:47 share
gfrankliu-t4-ws ➜ ~ ls -l /var/lib/nvidia/firmware/nvidia/535.183.01
total 60540
-rw-r--r-- 1 1000 250 38159904 May 12 19:08 gsp_ga10x.bin
-rw-r--r-- 1 1000 250 23820576 May 12 19:08 gsp_tu10x.bin
gfrankliu-t4-ws ➜ ~
Does nvidia-container-toolkit only support the case where the NVIDIA driver is installed in the default location?
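To make the mismatch above concrete, here is a minimal sketch of a firmware lookup that tries more than one base directory. It is not how nvidia-container-cli or libnvidia-container locate firmware; the function name `find_gsp_firmware` and the idea of passing a list of candidate directories are assumptions made only for illustration. The `nvidia/<version>/gsp*.bin` layout is taken from the warning and the listing above.

```python
import glob
import os


def find_gsp_firmware(version, firmware_dirs):
    """Return the gsp*.bin files from the first candidate directory that has any.

    `firmware_dirs` is a hypothetical, caller-supplied list of base directories,
    for example ["/usr/lib/firmware", "/var/lib/nvidia/firmware"].
    """
    for base in firmware_dirs:
        # Same layout as in the warning: <base>/nvidia/<version>/gsp*.bin
        pattern = os.path.join(base, "nvidia", version, "gsp*.bin")
        matches = sorted(glob.glob(pattern))
        if matches:
            return matches
    # Nothing found in any candidate directory.
    return []
```

With the layout shown in this report, calling it with `["/usr/lib/firmware", "/var/lib/nvidia/firmware"]` and version `535.183.01` would skip the first directory and return the two files listed under /var/lib/nvidia/firmware.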
When resolving firmware paths, we don't seem to resolve symlinks. This may cause issues on systems where /lib -> /usr/lib.
See https://github.com/canonical/lxd/pull/13562/files#r1701610711
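As a rough illustration of the symlink problem, the sketch below canonicalizes a firmware path before it is used or compared. It uses Python's `os.path.realpath` only to show the idea; the helper names `resolve_firmware_path` and `same_firmware_location` are hypothetical and do not correspond to anything in the toolkit.

```python
import os


def resolve_firmware_path(path):
    """Return the canonical form of `path`, with every symlink component resolved."""
    return os.path.realpath(path)


def same_firmware_location(path_a, path_b):
    """Report whether two paths refer to the same location once symlinks are resolved."""
    return resolve_firmware_path(path_a) == resolve_firmware_path(path_b)
```

On a system where /lib is a symlink to /usr/lib, `same_firmware_location("/lib/firmware", "/usr/lib/firmware")` would be true, whereas a plain string comparison of the two paths would treat them as different.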