Zeke-S opened this issue 4 years ago
The error line below seems to suggest something went wrong with the uninstall of your older driver when the new one was installed:
nvidia-container-cli: detection error: open failed: /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.418.39: no such file or directory
Can you scan /usr/lib/ for any files matching libnvidia*.418.39 (or any libnvidia files with a version not equal to your latest installed driver)? If there are any, manually delete them and try again.
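For example, a quick way to list any leftovers (a rough sketch; adjust the version string and the library directories to match your system, e.g. /usr/lib64 on RHEL):
$ find /usr/lib /usr/lib64 -name 'libnvidia*418.39*' 2>/dev/null   # files left behind by the old driver
$ find /usr/lib /usr/lib64 -name 'libnvidia*' 2>/dev/null | sort   # or list everything and compare against the version nvidia-smi reports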
Hi @klueska I'm having a similar error and was wondering if you could shed any light :)
I followed the instructions here for RHEL7 but am getting the following error:
$ sudo docker run --rm -e NVIDIA_VISIBLE_DEVICES=all nvidia/cuda:11.0-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "exec: \"nvidia-smi\": executable file not found in $PATH": unknown.
$ nvidia-smi
Thu Nov 19 15:11:46 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro P400 Off | 00000000:73:00.0 Off | N/A |
| 34% 31C P8 N/A / N/A | 134MiB / 1991MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 3468 G /usr/bin/X 64MiB |
| 0 N/A N/A 5920 G /usr/bin/gnome-shell 66MiB |
+-----------------------------------------------------------------------------+
Not sure if the below is relevant but:
$ sudo nvidia-container-cli -k -d /dev/tty info
-- WARNING, the following logs are for debugging purposes only --
I1119 20:21:31.936660 4568 nvc.c:282] initializing library context (version=1.3.0, build=16315ebdf4b9728e899f615e208b50c41d7a5d15)
I1119 20:21:31.936830 4568 nvc.c:256] using root /
I1119 20:21:31.936835 4568 nvc.c:257] using ldcache /etc/ld.so.cache
I1119 20:21:31.936840 4568 nvc.c:258] using unprivileged user 65534:65534
I1119 20:21:31.936910 4568 nvc.c:299] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I1119 20:21:31.937012 4568 nvc.c:301] dxcore initialization failed, continuing assuming a non-WSL environment
I1119 20:21:31.942752 4569 nvc.c:192] loading kernel module nvidia
I1119 20:21:31.943652 4569 nvc.c:204] loading kernel module nvidia_uvm
I1119 20:21:31.944111 4569 nvc.c:212] loading kernel module nvidia_modeset
I1119 20:21:31.944695 4570 driver.c:101] starting driver service
I1119 20:21:31.947382 4568 nvc_info.c:680] requesting driver information with ''
I1119 20:21:32.000531 4568 nvc_info.c:169] selecting /usr/lib64/libnvoptix.so.450.51.06
I1119 20:21:32.033466 4568 nvc_info.c:169] selecting /usr/lib64/libnvidia-tls.so.450.51.06
I1119 20:21:32.041682 4568 nvc_info.c:169] selecting /usr/lib64/libnvidia-rtcore.so.450.51.06
I1119 20:21:32.094310 4568 nvc_info.c:169] selecting /usr/lib64/libnvidia-ptxjitcompiler.so.450.51.06
I1119 20:21:32.128604 4568 nvc_info.c:169] selecting /usr/lib64/libnvidia-opticalflow.so.450.51.06
I1119 20:21:32.140956 4568 nvc_info.c:169] selecting /usr/lib64/libnvidia-opencl.so.450.51.06
I1119 20:21:32.185624 4568 nvc_info.c:169] selecting /usr/lib64/libnvidia-ngx.so.450.51.06
I1119 20:21:32.185771 4568 nvc_info.c:169] selecting /usr/lib64/libnvidia-ml.so.450.51.06
I1119 20:21:32.217839 4568 nvc_info.c:169] selecting /usr/lib64/libnvidia-ifr.so.450.51.06
I1119 20:21:32.230459 4568 nvc_info.c:169] selecting /usr/lib64/libnvidia-glvkspirv.so.450.51.06
I1119 20:21:32.230608 4568 nvc_info.c:169] selecting /usr/lib64/libnvidia-glsi.so.450.51.06
I1119 20:21:32.274841 4568 nvc_info.c:169] selecting /usr/lib64/libnvidia-glcore.so.450.51.06
I1119 20:21:32.308296 4568 nvc_info.c:169] selecting /usr/lib64/libnvidia-fbc.so.450.51.06
I1119 20:21:32.311729 4568 nvc_info.c:169] selecting /usr/lib64/libnvidia-encode.so.450.51.06
I1119 20:21:32.311861 4568 nvc_info.c:169] selecting /usr/lib64/libnvidia-eglcore.so.450.51.06
I1119 20:21:32.323190 4568 nvc_info.c:169] selecting /usr/lib64/libnvidia-compiler.so.450.51.06
I1119 20:21:32.361718 4568 nvc_info.c:169] selecting /usr/lib64/libnvidia-cfg.so.450.51.06
I1119 20:21:32.375582 4568 nvc_info.c:169] selecting /usr/lib64/libnvidia-cbl.so.450.51.06
I1119 20:21:32.376903 4568 nvc_info.c:169] selecting /usr/lib64/libnvidia-allocator.so.450.51.06
I1119 20:21:32.389483 4568 nvc_info.c:169] selecting /usr/lib64/libnvcuvid.so.450.51.06
I1119 20:21:32.390085 4568 nvc_info.c:169] selecting /usr/lib64/libcuda.so.450.51.06
I1119 20:21:32.428453 4568 nvc_info.c:169] selecting /usr/lib64/libGLX_nvidia.so.450.51.06
I1119 20:21:32.436373 4568 nvc_info.c:169] selecting /usr/lib64/libGLESv2_nvidia.so.450.51.06
I1119 20:21:32.436896 4568 nvc_info.c:169] selecting /usr/lib64/libGLESv1_CM_nvidia.so.450.51.06
I1119 20:21:32.436965 4568 nvc_info.c:169] selecting /usr/lib64/libEGL_nvidia.so.450.51.06
W1119 20:21:32.436996 4568 nvc_info.c:350] missing library libnvidia-fatbinaryloader.so
W1119 20:21:32.437002 4568 nvc_info.c:350] missing library libvdpau_nvidia.so
W1119 20:21:32.437006 4568 nvc_info.c:354] missing compat32 library libnvidia-ml.so
W1119 20:21:32.437011 4568 nvc_info.c:354] missing compat32 library libnvidia-cfg.so
W1119 20:21:32.437015 4568 nvc_info.c:354] missing compat32 library libcuda.so
W1119 20:21:32.437019 4568 nvc_info.c:354] missing compat32 library libnvidia-opencl.so
W1119 20:21:32.437023 4568 nvc_info.c:354] missing compat32 library libnvidia-ptxjitcompiler.so
W1119 20:21:32.437028 4568 nvc_info.c:354] missing compat32 library libnvidia-fatbinaryloader.so
W1119 20:21:32.437032 4568 nvc_info.c:354] missing compat32 library libnvidia-allocator.so
W1119 20:21:32.437037 4568 nvc_info.c:354] missing compat32 library libnvidia-compiler.so
W1119 20:21:32.437041 4568 nvc_info.c:354] missing compat32 library libnvidia-ngx.so
W1119 20:21:32.437045 4568 nvc_info.c:354] missing compat32 library libvdpau_nvidia.so
W1119 20:21:32.437050 4568 nvc_info.c:354] missing compat32 library libnvidia-encode.so
W1119 20:21:32.437054 4568 nvc_info.c:354] missing compat32 library libnvidia-opticalflow.so
W1119 20:21:32.437059 4568 nvc_info.c:354] missing compat32 library libnvcuvid.so
W1119 20:21:32.437063 4568 nvc_info.c:354] missing compat32 library libnvidia-eglcore.so
W1119 20:21:32.437068 4568 nvc_info.c:354] missing compat32 library libnvidia-glcore.so
W1119 20:21:32.437072 4568 nvc_info.c:354] missing compat32 library libnvidia-tls.so
W1119 20:21:32.437076 4568 nvc_info.c:354] missing compat32 library libnvidia-glsi.so
W1119 20:21:32.437081 4568 nvc_info.c:354] missing compat32 library libnvidia-fbc.so
W1119 20:21:32.437085 4568 nvc_info.c:354] missing compat32 library libnvidia-ifr.so
W1119 20:21:32.437089 4568 nvc_info.c:354] missing compat32 library libnvidia-rtcore.so
W1119 20:21:32.437094 4568 nvc_info.c:354] missing compat32 library libnvoptix.so
W1119 20:21:32.437098 4568 nvc_info.c:354] missing compat32 library libGLX_nvidia.so
W1119 20:21:32.437102 4568 nvc_info.c:354] missing compat32 library libEGL_nvidia.so
W1119 20:21:32.437106 4568 nvc_info.c:354] missing compat32 library libGLESv2_nvidia.so
W1119 20:21:32.437110 4568 nvc_info.c:354] missing compat32 library libGLESv1_CM_nvidia.so
W1119 20:21:32.437114 4568 nvc_info.c:354] missing compat32 library libnvidia-glvkspirv.so
W1119 20:21:32.437119 4568 nvc_info.c:354] missing compat32 library libnvidia-cbl.so
I1119 20:21:32.437396 4568 nvc_info.c:276] selecting /usr/bin/nvidia-smi
I1119 20:21:32.437423 4568 nvc_info.c:276] selecting /usr/bin/nvidia-debugdump
I1119 20:21:32.437443 4568 nvc_info.c:276] selecting /usr/bin/nvidia-persistenced
I1119 20:21:32.437461 4568 nvc_info.c:276] selecting /usr/bin/nvidia-cuda-mps-control
I1119 20:21:32.437480 4568 nvc_info.c:276] selecting /usr/bin/nvidia-cuda-mps-server
I1119 20:21:32.437504 4568 nvc_info.c:438] listing device /dev/nvidiactl
I1119 20:21:32.437508 4568 nvc_info.c:438] listing device /dev/nvidia-uvm
I1119 20:21:32.437513 4568 nvc_info.c:438] listing device /dev/nvidia-uvm-tools
I1119 20:21:32.437517 4568 nvc_info.c:438] listing device /dev/nvidia-modeset
W1119 20:21:32.437979 4568 nvc_info.c:321] missing ipc /var/run/nvidia-persistenced/socket
W1119 20:21:32.438162 4568 nvc_info.c:321] missing ipc /tmp/nvidia-mps
I1119 20:21:32.438167 4568 nvc_info.c:745] requesting device information with ''
I1119 20:21:32.444184 4568 nvc_info.c:628] listing device /dev/nvidia0 (GPU-14f528da-b23c-9235-dfec-15248c7c5e97 at 00000000:73:00.0)
NVRM version: 450.51.06
CUDA version: 11.0
Device Index: 0
Device Minor: 0
Model: Quadro P400
Brand: Quadro
GPU UUID: GPU-14f528da-b23c-9235-dfec-15248c7c5e97
Bus Location: 00000000:73:00.0
Architecture: 6.1
I1119 20:21:32.444229 4568 nvc.c:337] shutting down library context
I1119 20:21:32.444698 4570 driver.c:156] terminating driver service
I1119 20:21:32.445175 4568 driver.c:196] driver service terminated successfully
It looks like a fair number of people have hit this issue in the past, but most of the support was for people using nvidia-docker rather than the NVIDIA Container Toolkit. I guess the nvidia-smi command can't be found on the PATH inside the container?
Would be very grateful for any help!
@bwinsto2 Your error is actually quite different.
The original error is:
nvidia-container-cli: detection error: open failed: /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.418.39: no such file or directory
This error points to a failed driver uninstall, followed by a new install.
Your error is:
starting container process caused "exec: \"nvidia-smi\": executable file not found in $PATH": unknown.
Your error points to docker not picking up and using the nvidia-container-runtime for some reason.
@klueska thanks, you're right. Would you recommend I open a new support thread?
Just trying to put the pieces together: you said in May that only nvidia-container-toolkit should be necessary, since it obviates the need for nvidia-container-runtime. As I'm new to the world of combining Docker and GPUs, I'm unsure how to troubleshoot from here.
With the way you ran the command you would still need nvidia-docker (or at least nvidia-container-runtime with a manual edit to /etc/docker/daemon.json to point docker at the nvidia-container-runtime).
Docker integration with the nvidia-container-toolkit only works if you use the --gpus flag to docker instead of setting the NVIDIA_VISIBLE_DEVICES environment variable.
So the equivalent command to what you provided above would be:
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
or
sudo docker run --rm --gpus "device=all" nvidia/cuda:11.0-base nvidia-smi
To get your original command to work you would need to update /etc/docker/daemon.json with the following settings:
{
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
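Note that after editing /etc/docker/daemon.json you typically need to restart the docker daemon for the change to take effect; on a systemd-based system that would be something like:
sudo systemctl restart docker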
With this in place your original command should work:
sudo docker run --rm -e NVIDIA_VISIBLE_DEVICES=all nvidia/cuda:11.0-base nvidia-smi
And if you didn't want to set the default runtime to nvidia in this config, you could leave it out and run your command as:
sudo docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all nvidia/cuda:11.0-base nvidia-smi
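A quick way to sanity-check that docker actually registered the nvidia runtime (output varies by docker version) is:
docker info | grep -i runtime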
The error line below seems to suggest something went wrong with the uninstall of your older driver when the new one was installed:
nvidia-container-cli: detection error: open failed: /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.418.39: no such file or directory
Can you scan /usr/lib/ for any files matching libnvidia*.418.39 (or any libnvidia files with a version not equal to your latest installed driver)? If there are any, manually delete them and try again.
Hi, I get the same error. I found and deleted all libnvidia*.460.* files in /usr/lib that do not match my driver version (440), but the error is still there. Do you have any idea about that?
The error output:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: detection error: open failed: /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.460.56: no such file or directory\\\\n\\\"\"": unknown.
After upgrading the Nvidia driver to 465, which is higher than the version in the error, the error miraculously disappeared!
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
Thank you, this worked!
I had the same issue; running "sudo apt-get --fix-broken install" fixed it for me.
The error line below seems to suggest something went wrong with the uninstall of your older driver when the new one was installed:
nvidia-container-cli: detection error: open failed: /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.418.39: no such file or directory
Can you scan /usr/lib/ for any files matching libnvidia*.418.39 (or any libnvidia files with a version not equal to your latest installed driver)? If there are any, manually delete them and try again.
Hello, I have the same error and I tried the method mentioned above, but it still does not work. It seems there are no libnvidia files with version 470:
ywl@ywl-System-Product-Name:/usr/lib/x86_64-linux-gnu$ find /usr/lib/x86_64-linux-gnu/ -name 'libnvidia*' -print
/usr/lib/x86_64-linux-gnu/libnvidia-api.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.535.183.01
/usr/lib/x86_64-linux-gnu/libnvidia-vulkan-producer.so.535.183.01
/usr/lib/x86_64-linux-gnu/libnvidia-encode.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-egl-wayland.so.1.1.11
/usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-gtk3.so.535.183.01
/usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.535.183.01
/usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.535.183.01
/usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so
/usr/lib/x86_64-linux-gnu/libnvidia-encode.so.535.183.01
/usr/lib/x86_64-linux-gnu/libnvidia-wayland-client.so.535.183.01
/usr/lib/x86_64-linux-gnu/libnvidia-container-go.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.4
/usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.535.183.01
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-pkcs11-openssl3.so.535.183.01
/usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so
/usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.183.01
/usr/lib/x86_64-linux-gnu/libnvidia-ml.so
/usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.535.183.01
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-container.so.1.13.5
/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.535.183.01
/usr/lib/x86_64-linux-gnu/libnvidia-vulkan-producer.so
/usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.465.31
/usr/lib/x86_64-linux-gnu/libnvidia-container-go.so.1.13.5
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so
/usr/lib/x86_64-linux-gnu/libnvidia-allocator.so
/usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.535.183.01
/usr/lib/x86_64-linux-gnu/libnvidia-cfg.so
/usr/lib/x86_64-linux-gnu/libnvidia-fbc.so
/usr/lib/x86_64-linux-gnu/libnvidia-egl-gbm.so.1.1.0
/usr/lib/x86_64-linux-gnu/libnvidia-gtk2.so.535.183.01
/usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.535.183.01
/usr/lib/x86_64-linux-gnu/libnvidia-tls.so.535.183.01
/usr/lib/x86_64-linux-gnu/libnvidia-pkcs11.so.535.183.01
/usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.535.183.01
/usr/lib/x86_64-linux-gnu/libnvidia-container.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.535.183.01
/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1
/usr/lib/x86_64-linux-gnu/libnvidia-encode.so
/usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.535.183.01
/usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.535.183.01
/usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.535.183.01
/usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.1
my nvidia-driver version is:
ywl@ywl-System-Product-Name:/etc/nvidia-container-runtime$ nvidia-smi
Wed Jun 19 16:00:18 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.183.01 Driver Version: 535.183.01 CUDA Version: 12.2 |
|-----------------------------------------+----------------------+----------------------+
| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|=========================================+======================+======================|
| 0 NVIDIA GeForce RTX 3090 Off | 00000000:01:00.0 On | N/A |
| 50% 36C P8 41W / 350W | 855MiB / 24576MiB | 20% Default |
| | | N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=======================================================================================|
| 0 N/A N/A 1698 G /usr/lib/xorg/Xorg 56MiB |
| 0 N/A N/A 1856 G /usr/bin/gnome-shell 135MiB |
| 0 N/A N/A 3054 G /usr/lib/xorg/Xorg 287MiB |
| 0 N/A N/A 3270 G /usr/bin/gnome-shell 49MiB |
| 0 N/A N/A 4572 G ...7778866,17946271164743959776,131072 220MiB |
| 0 N/A N/A 6964 G /proc/self/exe 77MiB |
+---------------------------------------------------------------------------------------+
Hi guys, after updating the nvidia driver to 450.80.02 I ran the commands below:
And I got this error:
$ nvidia-smi
$ sudo nvidia-container-cli -k -d /dev/tty info
Can someone help me?