NVIDIA / nvidia-container-toolkit

Build and run containers leveraging NVIDIA GPUs
Apache License 2.0

centos container_linux.go:345: starting container process caused "exec: \"nvidia-smi\": executable file not found in $PATH": unknown. #298

Open SubMarineas opened 4 years ago

SubMarineas commented 4 years ago

I have another problem. It occurred when I started the official image:

image

I ran systemctl status docker.service and did not see any problem:

● docker.service - Docker Application Container Engine
   Loaded: loaded (/usr/lib/systemd/system/docker.service; enabled; vendor preset: disabled)
  Drop-In: /etc/systemd/system/docker.service.d
           └─override.conf
   Active: active (running) since Tue 2020-09-08 11:09:10 CST; 7min ago
     Docs: https://docs.docker.com
 Main PID: 30459 (dockerd)
    Tasks: 40
   Memory: 66.7M
   CGroup: /system.slice/docker.service
           ├─30459 /usr/bin/dockerd --host=fd:// --add-runtime=nvidia=/usr/bin/nvidia-container-runtime
           ├─30610 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 3306 -container-ip 172.18.0.2 -container-port 3306
           ├─30672 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 15672 -container-ip 172.18.0.4 -container-port 15672
           └─30688 /usr/bin/docker-proxy -proto tcp -host-ip 0.0.0.0 -host-port 5672 -container-ip 172.18.0.4 -container-port 5672

Sep 08 11:09:09 10-9-111-182 dockerd[30459]: time="2020-09-08T11:09:09.703146487+08:00" level=info msg="Loading containers: start."
Sep 08 11:09:09 10-9-111-182 dockerd[30459]: time="2020-09-08T11:09:09.817109186+08:00" level=info msg="Default bridge (docker0) is assigned with an IP address 172.17.0.0/16. Daemon option --

The nvidia-docker version is:

Docker version 19.03.12, build 48a66213fe

docker volume:

[root@10-9-111-182 docker]# docker volume list
DRIVER              VOLUME NAME
local               f32bc4d3933b47c923b0e3e86222e2476e7131566950daad756790bc4129626d
nvidia-docker       nvidia_driver_450.51.06

The startup search path is:

[root@10-9-111-182 docker]# nvidia-docker run --rm nvidia/cuda:10.0-devel "echo $PATH"
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: \"echo /usr/local/ffmpeg/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/home/anaconda3/bin:/usr/local/cuda/bin:/root/bin\": stat echo /usr/local/ffmpeg/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/home/anaconda3/bin:/usr/local/cuda/bin:/root/bin: no such file or directory": unknown.
ERRO[0000] error waiting for container: context canceled
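
A note on the last command, as a sketch rather than a diagnosis of the original nvidia-smi problem: this particular error comes from the quoting. The double quotes turn echo $PATH into a single argument, and the host shell expands $PATH before Docker ever runs, so the runtime is asked to exec a program literally named "echo /usr/local/ffmpeg/bin:...". The snippet below uses a hypothetical helper, count_args, to show the difference without needing Docker. The docker line in the final comment is an assumed equivalent, not something taken from this thread.

```shell
#!/bin/sh
# count_args is a hypothetical helper (not part of Docker or the NVIDIA
# tooling): it prints how many separate arguments it received.
count_args() {
    echo "$#"
}

# Quoted as one word: the container runtime would see a single argument and
# try to exec a program literally named "echo <host PATH>".
count_args "echo $PATH"            # prints 1

# Passed through a shell instead: three arguments, and the single quotes keep
# the host shell from expanding $PATH early.
count_args sh -c 'echo $PATH'      # prints 3

# Assumed Docker equivalent of the second form (needs Docker and the image):
#   nvidia-docker run --rm nvidia/cuda:10.0-devel sh -c 'echo $PATH'
```

With the second form, the PATH printed would be the one inside the container, which is what matters when checking whether nvidia-smi can be found there.
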

SubMarineas commented 4 years ago

My host information is in:

https://github.com/NVIDIA/nvidia-docker/issues/1381

klueska commented 4 years ago

If all you installed was nvidia-container-toolkit (and not nvidia-docker2) then you need to use the --gpus option and not -e NVIDIA_VISIBLE_DEVICES to request a set of GPUs, i.e.:

docker run --gpus=all ...

If you install nvidia-docker2 then you can also use -e NVIDIA_VISIBLE_DEVICES, but you need to tell it to use the nvidia runtime, i.e.:

docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all ...
or 
nvidia-docker run -e NVIDIA_VISIBLE_DEVICES=all ...
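
The choice between the two forms above can be sketched as a small check. This is a minimal sketch under assumptions: pick_gpu_flags is a hypothetical helper, and it expects the JSON printed by docker info --format '{{json .Runtimes}}', whose exact shape may differ between Docker versions.

```shell
#!/bin/sh
# pick_gpu_flags is a hypothetical helper: given the JSON printed by
#   docker info --format '{{json .Runtimes}}'
# it prints the docker run flags suggested above for that setup.
pick_gpu_flags() {
    case "$1" in
        *'"nvidia"'*)
            # An "nvidia" runtime is registered (as nvidia-docker2 does).
            echo "--runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all" ;;
        *)
            # Only nvidia-container-toolkit is assumed to be present.
            echo "--gpus=all" ;;
    esac
}

# Literal strings standing in for real docker info output:
pick_gpu_flags '{"nvidia":{"path":"/usr/bin/nvidia-container-runtime"},"runc":{"path":"runc"}}'
pick_gpu_flags '{"runc":{"path":"runc"}}'

# With Docker available (not run here), the assumed usage would be:
#   docker run $(pick_gpu_flags "$(docker info --format '{{json .Runtimes}}')") ...
```
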

SubMarineas commented 4 years ago

image

I tried it, but it still didn’t work. I have nvidia-docker2 installed.

klueska commented 4 years ago

That usually points to an installation error of the driver on the host. Are you able to run nvidia-smi directly on your host without problems?

Can you also show me the output of the following on the host:

nvidia-container-cli -k -d /dev/tty info

SubMarineas commented 4 years ago

image

It looks like an initialization error?

dxcore initialization failed, continuing assuming a non-WSL environment

klueska commented 4 years ago

That is fine so long as you are not running under Windows. Are you running under WSL2 on Windows?

klueska commented 4 years ago

Can you show me the rest of the output from:

nvidia-container-cli -k -d /dev/tty info

SubMarineas commented 4 years ago

No, I am on CentOS 7.

[root@10-9-111-182 new_videoana]# nvidia-container-cli -k -d /dev/tty info

-- WARNING, the following logs are for debugging purposes only --

I0908 08:22:39.703533 15854 nvc.c:282] initializing library context (version=1.3.0, build=af0220ff5c503d9ac6a1b5a491918229edbb37a4)
I0908 08:22:39.703568 15854 nvc.c:256] using root /
I0908 08:22:39.703571 15854 nvc.c:257] using ldcache /etc/ld.so.cache
I0908 08:22:39.703574 15854 nvc.c:258] using unprivileged user 65534:65534
I0908 08:22:39.703586 15854 nvc.c:299] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0908 08:22:39.703720 15854 nvc.c:301] dxcore initialization failed, continuing assuming a non-WSL environment
I0908 08:22:39.704860 15855 nvc.c:192] loading kernel module nvidia
I0908 08:22:39.705010 15855 nvc.c:204] loading kernel module nvidia_uvm
I0908 08:22:39.705074 15855 nvc.c:212] loading kernel module nvidia_modeset
I0908 08:22:39.705378 15856 driver.c:101] starting driver service
I0908 08:22:40.146237 15854 nvc_info.c:680] requesting driver information with ''
I0908 08:22:40.147373 15854 nvc_info.c:169] selecting /usr/lib64/vdpau/libvdpau_nvidia.so.450.51.06
I0908 08:22:40.147485 15854 nvc_info.c:169] selecting /usr/lib64/libnvoptix.so.450.51.06
I0908 08:22:40.147523 15854 nvc_info.c:169] selecting /usr/lib64/libnvidia-tls.so.450.51.06
I0908 08:22:40.147547 15854 nvc_info.c:169] selecting /usr/lib64/libnvidia-rtcore.so.450.51.06
I0908 08:22:40.147572 15854 nvc_info.c:169] selecting /usr/lib64/libnvidia-ptxjitcompiler.so.450.51.06
I0908 08:22:40.147605 15854 nvc_info.c:169] selecting /usr/lib64/libnvidia-opticalflow.so.450.51.06
I0908 08:22:40.147634 15854 nvc_info.c:169] selecting /usr/lib64/libnvidia-opencl.so.450.51.06
I0908 08:22:40.147656 15854 nvc_info.c:169] selecting /usr/lib64/libnvidia-ngx.so.450.51.06
I0908 08:22:40.147682 15854 nvc_info.c:169] selecting /usr/lib64/libnvidia-ml.so.450.51.06
I0908 08:22:40.147711 15854 nvc_info.c:169] selecting /usr/lib64/libnvidia-ifr.so.450.51.06
I0908 08:22:40.147749 15854 nvc_info.c:169] selecting /usr/lib64/libnvidia-glvkspirv.so.450.51.06
I0908 08:22:40.147774 15854 nvc_info.c:169] selecting /usr/lib64/libnvidia-glsi.so.450.51.06
I0908 08:22:40.147797 15854 nvc_info.c:169] selecting /usr/lib64/libnvidia-glcore.so.450.51.06
I0908 08:22:40.147822 15854 nvc_info.c:169] selecting /usr/lib64/libnvidia-fbc.so.450.51.06
I0908 08:22:40.147853 15854 nvc_info.c:169] selecting /usr/lib64/libnvidia-encode.so.450.51.06
I0908 08:22:40.147885 15854 nvc_info.c:169] selecting /usr/lib64/libnvidia-eglcore.so.450.51.06
I0908 08:22:40.147909 15854 nvc_info.c:169] selecting /usr/lib64/libnvidia-compiler.so.450.51.06
I0908 08:22:40.147933 15854 nvc_info.c:169] selecting /usr/lib64/libnvidia-cfg.so.450.51.06
I0908 08:22:40.147963 15854 nvc_info.c:169] selecting /usr/lib64/libnvidia-cbl.so.450.51.06
I0908 08:22:40.147987 15854 nvc_info.c:169] selecting /usr/lib64/libnvidia-allocator.so.450.51.06
I0908 08:22:40.148020 15854 nvc_info.c:169] selecting /usr/lib64/libnvcuvid.so.450.51.06
I0908 08:22:40.148119 15854 nvc_info.c:169] selecting /usr/lib64/libcuda.so.450.51.06
I0908 08:22:40.148191 15854 nvc_info.c:169] selecting /usr/lib64/libGLX_nvidia.so.450.51.06
I0908 08:22:40.148217 15854 nvc_info.c:169] selecting /usr/lib64/libGLESv2_nvidia.so.450.51.06
I0908 08:22:40.148244 15854 nvc_info.c:169] selecting /usr/lib64/libGLESv1_CM_nvidia.so.450.51.06
I0908 08:22:40.148270 15854 nvc_info.c:169] selecting /usr/lib64/libEGL_nvidia.so.450.51.06
I0908 08:22:40.148301 15854 nvc_info.c:169] selecting /usr/lib/vdpau/libvdpau_nvidia.so.450.51.06
I0908 08:22:40.148336 15854 nvc_info.c:169] selecting /usr/lib/libnvidia-tls.so.450.51.06
I0908 08:22:40.148358 15854 nvc_info.c:169] selecting /usr/lib/libnvidia-ptxjitcompiler.so.450.51.06
I0908 08:22:40.148391 15854 nvc_info.c:169] selecting /usr/lib/libnvidia-opticalflow.so.450.51.06
I0908 08:22:40.148421 15854 nvc_info.c:169] selecting /usr/lib/libnvidia-opencl.so.450.51.06
I0908 08:22:40.148446 15854 nvc_info.c:169] selecting /usr/lib/libnvidia-ml.so.450.51.06
I0908 08:22:40.148477 15854 nvc_info.c:169] selecting /usr/lib/libnvidia-ifr.so.450.51.06
I0908 08:22:40.148510 15854 nvc_info.c:169] selecting /usr/lib/libnvidia-glvkspirv.so.450.51.06
I0908 08:22:40.148532 15854 nvc_info.c:169] selecting /usr/lib/libnvidia-glsi.so.450.51.06
I0908 08:22:40.148555 15854 nvc_info.c:169] selecting /usr/lib/libnvidia-glcore.so.450.51.06
I0908 08:22:40.148579 15854 nvc_info.c:169] selecting /usr/lib/libnvidia-fbc.so.450.51.06
I0908 08:22:40.148608 15854 nvc_info.c:169] selecting /usr/lib/libnvidia-encode.so.450.51.06
I0908 08:22:40.148638 15854 nvc_info.c:169] selecting /usr/lib/libnvidia-eglcore.so.450.51.06
I0908 08:22:40.148660 15854 nvc_info.c:169] selecting /usr/lib/libnvidia-compiler.so.450.51.06
I0908 08:22:40.148684 15854 nvc_info.c:169] selecting /usr/lib/libnvidia-allocator.so.450.51.06
I0908 08:22:40.148714 15854 nvc_info.c:169] selecting /usr/lib/libnvcuvid.so.450.51.06
I0908 08:22:40.148754 15854 nvc_info.c:169] selecting /usr/lib/libcuda.so.450.51.06
I0908 08:22:40.148785 15854 nvc_info.c:169] selecting /usr/lib/libGLX_nvidia.so.450.51.06
I0908 08:22:40.148809 15854 nvc_info.c:169] selecting /usr/lib/libGLESv2_nvidia.so.450.51.06
I0908 08:22:40.148833 15854 nvc_info.c:169] selecting /usr/lib/libGLESv1_CM_nvidia.so.450.51.06
I0908 08:22:40.148856 15854 nvc_info.c:169] selecting /usr/lib/libEGL_nvidia.so.450.51.06
W0908 08:22:40.148872 15854 nvc_info.c:350] missing library libnvidia-fatbinaryloader.so
W0908 08:22:40.148879 15854 nvc_info.c:354] missing compat32 library libnvidia-cfg.so
W0908 08:22:40.148884 15854 nvc_info.c:354] missing compat32 library libnvidia-fatbinaryloader.so
W0908 08:22:40.148890 15854 nvc_info.c:354] missing compat32 library libnvidia-ngx.so
W0908 08:22:40.148893 15854 nvc_info.c:354] missing compat32 library libnvidia-rtcore.so
W0908 08:22:40.148899 15854 nvc_info.c:354] missing compat32 library libnvoptix.so
W0908 08:22:40.148901 15854 nvc_info.c:354] missing compat32 library libnvidia-cbl.so
I0908 08:22:40.149069 15854 nvc_info.c:276] selecting /usr/bin/nvidia-smi
I0908 08:22:40.149084 15854 nvc_info.c:276] selecting /usr/bin/nvidia-debugdump
I0908 08:22:40.149096 15854 nvc_info.c:276] selecting /usr/bin/nvidia-persistenced
I0908 08:22:40.149111 15854 nvc_info.c:276] selecting /usr/bin/nvidia-cuda-mps-control
I0908 08:22:40.149125 15854 nvc_info.c:276] selecting /usr/bin/nvidia-cuda-mps-server
I0908 08:22:40.149144 15854 nvc_info.c:438] listing device /dev/nvidiactl
I0908 08:22:40.149150 15854 nvc_info.c:438] listing device /dev/nvidia-uvm
I0908 08:22:40.149153 15854 nvc_info.c:438] listing device /dev/nvidia-uvm-tools
I0908 08:22:40.149156 15854 nvc_info.c:438] listing device /dev/nvidia-modeset
W0908 08:22:40.149207 15854 nvc_info.c:321] missing ipc /var/run/nvidia-persistenced/socket
W0908 08:22:40.149216 15854 nvc_info.c:321] missing ipc /tmp/nvidia-mps
I0908 08:22:40.149222 15854 nvc_info.c:745] requesting device information with ''
I0908 08:22:40.155131 15854 nvc_info.c:628] listing device /dev/nvidia0 (GPU-8546d1d2-7f12-2014-2498-6738e7ac1d2b at 00000000:00:03.0)
NVRM version:   450.51.06
CUDA version:   11.0

Device Index:   0
Device Minor:   0
Model:          Tesla T4
Brand:          Tesla
GPU UUID:       GPU-8546d1d2-7f12-2014-2498-6738e7ac1d2b
Bus Location:   00000000:00:03.0
Architecture:   7.5
I0908 08:22:40.155167 15854 nvc.c:337] shutting down library context
I0908 08:22:40.223031 15856 driver.c:156] terminating driver service
I0908 08:22:40.223527 15854 driver.c:196] driver service terminated successfully

klueska commented 4 years ago

Hmm, that is strange. If nvidia-container-cli -k -d /dev/tty info is able to run successfully and print out the GPU info, then I would expect docker run --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=all ... to run without errors as well.

SubMarineas commented 4 years ago

image

bkodirov commented 4 years ago

Running into the same issue. Host OS: CentOS 7. Debug info:

I1028 06:07:18.505252 26698 nvc.c:282] initializing library context (version=1.3.0, build=16315ebdf4b9728e899f615e208b50c41d7a5d15)
I1028 06:07:18.505572 26698 nvc.c:256] using root /
I1028 06:07:18.505639 26698 nvc.c:257] using ldcache /etc/ld.so.cache
I1028 06:07:18.505679 26698 nvc.c:258] using unprivileged user 65534:65534
I1028 06:07:18.505760 26698 nvc.c:299] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I1028 06:07:18.506096 26698 nvc.c:301] dxcore initialization failed, continuing assuming a non-WSL environment
I1028 06:07:18.507221 26699 nvc.c:192] loading kernel module nvidia
I1028 06:07:18.507490 26699 nvc.c:204] loading kernel module nvidia_uvm
I1028 06:07:18.507559 26699 nvc.c:212] loading kernel module nvidia_modeset
I1028 06:07:18.507792 26700 driver.c:101] starting driver service
I1028 06:07:18.509314 26698 nvc_info.c:680] requesting driver information with ''
I1028 06:07:18.510351 26698 nvc_info.c:169] selecting /usr/lib64/libnvoptix.so.418.67
I1028 06:07:18.510396 26698 nvc_info.c:169] selecting /usr/lib64/libnvidia-tls.so.418.67
I1028 06:07:18.510440 26698 nvc_info.c:169] selecting /usr/lib64/libnvidia-rtcore.so.418.67
I1028 06:07:18.510478 26698 nvc_info.c:169] selecting /usr/lib64/libnvidia-ptxjitcompiler.so.418.67
I1028 06:07:18.510510 26698 nvc_info.c:169] selecting /usr/lib64/libnvidia-opticalflow.so.418.67
I1028 06:07:18.510534 26698 nvc_info.c:169] selecting /usr/lib64/libnvidia-opencl.so.418.67
I1028 06:07:18.510557 26698 nvc_info.c:169] selecting /usr/lib64/libnvidia-ml.so.418.67
I1028 06:07:18.510587 26698 nvc_info.c:169] selecting /usr/lib64/libnvidia-ifr.so.418.67
I1028 06:07:18.510618 26698 nvc_info.c:169] selecting /usr/lib64/libnvidia-glvkspirv.so.418.67
I1028 06:07:18.510654 26698 nvc_info.c:169] selecting /usr/lib64/libnvidia-glsi.so.418.67
I1028 06:07:18.510679 26698 nvc_info.c:169] selecting /usr/lib64/libnvidia-glcore.so.418.67
I1028 06:07:18.510704 26698 nvc_info.c:169] selecting /usr/lib64/libnvidia-fbc.so.418.67
I1028 06:07:18.510737 26698 nvc_info.c:169] selecting /usr/lib64/libnvidia-fatbinaryloader.so.418.67
I1028 06:07:18.510761 26698 nvc_info.c:169] selecting /usr/lib64/libnvidia-encode.so.418.67
I1028 06:07:18.510794 26698 nvc_info.c:169] selecting /usr/lib64/libnvidia-eglcore.so.418.67
I1028 06:07:18.510819 26698 nvc_info.c:169] selecting /usr/lib64/libnvidia-compiler.so.418.67
I1028 06:07:18.510844 26698 nvc_info.c:169] selecting /usr/lib64/libnvidia-cfg.so.418.67
I1028 06:07:18.510877 26698 nvc_info.c:169] selecting /usr/lib64/libnvidia-cbl.so.418.67
I1028 06:07:18.510904 26698 nvc_info.c:169] selecting /usr/lib64/libnvcuvid.so.418.67
I1028 06:07:18.511026 26698 nvc_info.c:169] selecting /usr/lib64/libcuda.so.418.67
I1028 06:07:18.511089 26698 nvc_info.c:169] selecting /usr/lib64/libGLX_nvidia.so.418.67
I1028 06:07:18.511115 26698 nvc_info.c:169] selecting /usr/lib64/libGLESv2_nvidia.so.418.67
I1028 06:07:18.511140 26698 nvc_info.c:169] selecting /usr/lib64/libGLESv1_CM_nvidia.so.418.67
I1028 06:07:18.511166 26698 nvc_info.c:169] selecting /usr/lib64/libEGL_nvidia.so.418.67
W1028 06:07:18.511184 26698 nvc_info.c:350] missing library libnvidia-allocator.so
W1028 06:07:18.511192 26698 nvc_info.c:350] missing library libnvidia-ngx.so
W1028 06:07:18.511199 26698 nvc_info.c:350] missing library libvdpau_nvidia.so
W1028 06:07:18.511207 26698 nvc_info.c:354] missing compat32 library libnvidia-ml.so
W1028 06:07:18.511214 26698 nvc_info.c:354] missing compat32 library libnvidia-cfg.so
W1028 06:07:18.511222 26698 nvc_info.c:354] missing compat32 library libcuda.so
W1028 06:07:18.511229 26698 nvc_info.c:354] missing compat32 library libnvidia-opencl.so
W1028 06:07:18.511236 26698 nvc_info.c:354] missing compat32 library libnvidia-ptxjitcompiler.so
W1028 06:07:18.511243 26698 nvc_info.c:354] missing compat32 library libnvidia-fatbinaryloader.so
W1028 06:07:18.511250 26698 nvc_info.c:354] missing compat32 library libnvidia-allocator.so
W1028 06:07:18.511257 26698 nvc_info.c:354] missing compat32 library libnvidia-compiler.so
W1028 06:07:18.511265 26698 nvc_info.c:354] missing compat32 library libnvidia-ngx.so
W1028 06:07:18.511272 26698 nvc_info.c:354] missing compat32 library libvdpau_nvidia.so
W1028 06:07:18.511279 26698 nvc_info.c:354] missing compat32 library libnvidia-encode.so
W1028 06:07:18.511286 26698 nvc_info.c:354] missing compat32 library libnvidia-opticalflow.so
W1028 06:07:18.511293 26698 nvc_info.c:354] missing compat32 library libnvcuvid.so
W1028 06:07:18.511300 26698 nvc_info.c:354] missing compat32 library libnvidia-eglcore.so
W1028 06:07:18.511307 26698 nvc_info.c:354] missing compat32 library libnvidia-glcore.so
W1028 06:07:18.511315 26698 nvc_info.c:354] missing compat32 library libnvidia-tls.so
W1028 06:07:18.511322 26698 nvc_info.c:354] missing compat32 library libnvidia-glsi.so
W1028 06:07:18.511329 26698 nvc_info.c:354] missing compat32 library libnvidia-fbc.so
W1028 06:07:18.511336 26698 nvc_info.c:354] missing compat32 library libnvidia-ifr.so
W1028 06:07:18.511343 26698 nvc_info.c:354] missing compat32 library libnvidia-rtcore.so
W1028 06:07:18.511350 26698 nvc_info.c:354] missing compat32 library libnvoptix.so
W1028 06:07:18.511357 26698 nvc_info.c:354] missing compat32 library libGLX_nvidia.so
W1028 06:07:18.511364 26698 nvc_info.c:354] missing compat32 library libEGL_nvidia.so
W1028 06:07:18.511371 26698 nvc_info.c:354] missing compat32 library libGLESv2_nvidia.so
W1028 06:07:18.511379 26698 nvc_info.c:354] missing compat32 library libGLESv1_CM_nvidia.so
W1028 06:07:18.511386 26698 nvc_info.c:354] missing compat32 library libnvidia-glvkspirv.so
W1028 06:07:18.511393 26698 nvc_info.c:354] missing compat32 library libnvidia-cbl.so
I1028 06:07:18.511621 26698 nvc_info.c:276] selecting /usr/bin/nvidia-smi
I1028 06:07:18.511639 26698 nvc_info.c:276] selecting /usr/bin/nvidia-debugdump
I1028 06:07:18.511657 26698 nvc_info.c:276] selecting /usr/bin/nvidia-persistenced
I1028 06:07:18.511674 26698 nvc_info.c:276] selecting /usr/bin/nvidia-cuda-mps-control
I1028 06:07:18.511691 26698 nvc_info.c:276] selecting /usr/bin/nvidia-cuda-mps-server
I1028 06:07:18.511712 26698 nvc_info.c:438] listing device /dev/nvidiactl
I1028 06:07:18.511720 26698 nvc_info.c:438] listing device /dev/nvidia-uvm
I1028 06:07:18.511727 26698 nvc_info.c:438] listing device /dev/nvidia-uvm-tools
I1028 06:07:18.511735 26698 nvc_info.c:438] listing device /dev/nvidia-modeset
I1028 06:07:18.511758 26698 nvc_info.c:317] listing ipc /run/nvidia-persistenced/socket
W1028 06:07:18.511772 26698 nvc_info.c:321] missing ipc /tmp/nvidia-mps
I1028 06:07:18.511780 26698 nvc_info.c:745] requesting device information with ''
I1028 06:07:18.518201 26698 nvc_info.c:628] listing device /dev/nvidia0 (GPU-31032fea-9942-9380-d43d-37baa2dc633c at 00000000:01:00.0)
I1028 06:07:18.524671 26698 nvc_info.c:628] listing device /dev/nvidia1 (GPU-2fceb187-a4d8-3d81-6dc1-5ca28f34c5a3 at 00000000:02:00.0)
NVRM version:   418.67
CUDA version:   10.1

Device Index:   0
Device Minor:   0
Model:          GeForce RTX 2080
Brand:          GeForce
GPU UUID:       GPU-31032fea-9942-9380-d43d-37baa2dc633c
Bus Location:   00000000:01:00.0
Architecture:   7.5

Device Index:   1
Device Minor:   1
Model:          GeForce RTX 2080
Brand:          GeForce
GPU UUID:       GPU-2fceb187-a4d8-3d81-6dc1-5ca28f34c5a3
Bus Location:   00000000:02:00.0
Architecture:   7.5
I1028 06:07:18.524773 26698 nvc.c:337] shutting down library context
I1028 06:07:18.525110 26700 driver.c:156] terminating driver service
I1028 06:07:18.525383 26698 driver.c:196] driver service terminated successfully

This works perfectly, and I can see the output from nvidia-smi:

docker run --gpus all nvidia/cuda:9.0-base nvidia-smi

But the CUDA 11.0 image fails:

docker run --rm -e NVIDIA_VISIBLE_DEVICES=all nvidia/cuda:11.0-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: \"nvidia-smi\": executable file not found in $PATH": unknown.
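
One difference between the two commands, going by klueska's advice earlier in the thread: the working one passes --gpus all, while the failing one passes only -e NVIDIA_VISIBLE_DEVICES=all, with neither --gpus nor --runtime=nvidia. Unless nvidia is configured as Docker's default runtime (an assumption this thread does not confirm either way), that container would start under the plain default runtime and nvidia-smi would not be injected. A minimal sketch of that check, using a hypothetical helper requests_nvidia that only recognises the flag spellings shown in this thread:

```shell
#!/bin/sh
# requests_nvidia is a hypothetical helper: it succeeds if the given docker run
# arguments ask for GPU support via --gpus or --runtime=nvidia.
# It does not handle the space-separated spelling "--runtime nvidia".
requests_nvidia() {
    for arg in "$@"; do
        case "$arg" in
            --gpus|--gpus=*|--runtime=nvidia) return 0 ;;
        esac
    done
    return 1
}

# The command that worked in this comment:
requests_nvidia run --gpus all nvidia/cuda:9.0-base nvidia-smi \
    && echo "9.0-base command: GPU support requested"

# The command that failed: only the environment variable is set.
requests_nvidia run --rm -e NVIDIA_VISIBLE_DEVICES=all nvidia/cuda:11.0-base nvidia-smi \
    || echo "11.0-base command: no --gpus or --runtime=nvidia"
```

If this reading is right, rerunning the 11.0-base command with --gpus all added would be the comparable test. The result is not known from this thread.
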

KeithTt commented 1 year ago

Same problem here.

CentOS 7, containerd 1.6.6

nerdctl run --runtime=nvidia -d -p 8500:8500 -p 8700:8501 \
>         -e NVIDIA_VISIBLE_DEVICES=0 \
>         --name=tfserving_00 \
>         -v /opt/share/tfserving/models:/models harbor.matrixback.com/tfserving/serving:latest-gpu \
>         --model_config_file=/models/configs/models.config \
>         --batching_parameters_file=/models/configs/batch.config \
>         --model_config_file_poll_wait_seconds=15 \
>         --allow_version_labels_for_unavailable_models=true \
>         --enable_batching=true
FATA[0000] failed to create shim task: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v2.task/default/febd2c725e07426ebce536f3c11f4ec9741b6a2f1f8c2123096e4ab0300fc3db/log.json: no such file or directory): exec: "nvidia": executable file not found in $PATH: unknown
nvidia-container-cli -k -d /dev/tty info

-- WARNING, the following logs are for debugging purposes only --

I0618 14:02:28.238780 1957046 nvc.c:376] initializing library context (version=1.13.1, build=6f4aea0fca16aaff01bab2567adb34ec30847a0e)
I0618 14:02:28.238956 1957046 nvc.c:350] using root /
I0618 14:02:28.238971 1957046 nvc.c:351] using ldcache /etc/ld.so.cache
I0618 14:02:28.238984 1957046 nvc.c:352] using unprivileged user 65534:65534
I0618 14:02:28.239022 1957046 nvc.c:393] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0618 14:02:28.239166 1957046 nvc.c:395] dxcore initialization failed, continuing assuming a non-WSL environment
I0618 14:02:28.248407 1957047 nvc.c:278] loading kernel module nvidia
I0618 14:02:28.249460 1957047 nvc.c:282] running mknod for /dev/nvidiactl
I0618 14:02:28.249564 1957047 nvc.c:286] running mknod for /dev/nvidia0
I0618 14:02:28.249633 1957047 nvc.c:286] running mknod for /dev/nvidia1
I0618 14:02:28.249691 1957047 nvc.c:286] running mknod for /dev/nvidia2
I0618 14:02:28.249746 1957047 nvc.c:286] running mknod for /dev/nvidia3
I0618 14:02:28.249802 1957047 nvc.c:286] running mknod for /dev/nvidia4
I0618 14:02:28.249857 1957047 nvc.c:286] running mknod for /dev/nvidia5
I0618 14:02:28.249913 1957047 nvc.c:286] running mknod for /dev/nvidia6
I0618 14:02:28.249970 1957047 nvc.c:286] running mknod for /dev/nvidia7
I0618 14:02:28.250025 1957047 nvc.c:290] running mknod for all nvcaps in /dev/nvidia-caps
I0618 14:02:28.262020 1957047 nvc.c:218] running mknod for /dev/nvidia-caps/nvidia-cap1 from /proc/driver/nvidia/capabilities/mig/config
I0618 14:02:28.262256 1957047 nvc.c:218] running mknod for /dev/nvidia-caps/nvidia-cap2 from /proc/driver/nvidia/capabilities/mig/monitor
I0618 14:02:28.276238 1957047 nvc.c:296] loading kernel module nvidia_uvm
I0618 14:02:28.276445 1957047 nvc.c:300] running mknod for /dev/nvidia-uvm
I0618 14:02:28.276538 1957047 nvc.c:305] loading kernel module nvidia_modeset
I0618 14:02:28.277062 1957047 nvc.c:309] running mknod for /dev/nvidia-modeset
I0618 14:02:28.277656 1957048 rpc.c:71] starting driver rpc service
I0618 14:02:28.288728 1957049 rpc.c:71] starting nvcgo rpc service
I0618 14:02:28.290922 1957046 nvc_info.c:796] requesting driver information with ''
I0618 14:02:28.293526 1957046 nvc_info.c:174] selecting /usr/lib64/vdpau/libvdpau_nvidia.so.520.56.06
I0618 14:02:28.293810 1957046 nvc_info.c:174] selecting /usr/lib64/libnvoptix.so.520.56.06
I0618 14:02:28.293955 1957046 nvc_info.c:174] selecting /usr/lib64/libnvidia-tls.so.520.56.06
I0618 14:02:28.294039 1957046 nvc_info.c:174] selecting /usr/lib64/libnvidia-rtcore.so.520.56.06
I0618 14:02:28.294120 1957046 nvc_info.c:174] selecting /usr/lib64/libnvidia-ptxjitcompiler.so.520.56.06
I0618 14:02:28.294233 1957046 nvc_info.c:174] selecting /usr/lib64/libnvidia-opticalflow.so.520.56.06
I0618 14:02:28.294345 1957046 nvc_info.c:174] selecting /usr/lib64/libnvidia-opencl.so.520.56.06
I0618 14:02:28.294438 1957046 nvc_info.c:174] selecting /usr/lib64/libnvidia-nvvm.so.520.56.06
I0618 14:02:28.294558 1957046 nvc_info.c:174] selecting /usr/lib64/libnvidia-ngx.so.520.56.06
I0618 14:02:28.294631 1957046 nvc_info.c:174] selecting /usr/lib64/libnvidia-ml.so.520.56.06
I0618 14:02:28.294744 1957046 nvc_info.c:174] selecting /usr/lib64/libnvidia-glvkspirv.so.520.56.06
I0618 14:02:28.294817 1957046 nvc_info.c:174] selecting /usr/lib64/libnvidia-glsi.so.520.56.06
I0618 14:02:28.294884 1957046 nvc_info.c:174] selecting /usr/lib64/libnvidia-glcore.so.520.56.06
I0618 14:02:28.294959 1957046 nvc_info.c:174] selecting /usr/lib64/libnvidia-fbc.so.520.56.06
I0618 14:02:28.295064 1957046 nvc_info.c:174] selecting /usr/lib64/libnvidia-encode.so.520.56.06
I0618 14:02:28.295169 1957046 nvc_info.c:174] selecting /usr/lib64/libnvidia-eglcore.so.520.56.06
I0618 14:02:28.295252 1957046 nvc_info.c:174] selecting /usr/lib64/libnvidia-compiler.so.520.56.06
I0618 14:02:28.295326 1957046 nvc_info.c:174] selecting /usr/lib64/libnvidia-cfg.so.520.56.06
I0618 14:02:28.295441 1957046 nvc_info.c:174] selecting /usr/lib64/libnvidia-allocator.so.520.56.06
I0618 14:02:28.295550 1957046 nvc_info.c:174] selecting /usr/lib64/libnvcuvid.so.520.56.06
I0618 14:02:28.296015 1957046 nvc_info.c:174] selecting /usr/lib64/libcudadebugger.so.520.56.06
I0618 14:02:28.296091 1957046 nvc_info.c:174] selecting /usr/lib64/libcuda.so.520.56.06
I0618 14:02:28.296270 1957046 nvc_info.c:174] selecting /usr/lib64/libGLX_nvidia.so.520.56.06
I0618 14:02:28.296340 1957046 nvc_info.c:174] selecting /usr/lib64/libGLESv2_nvidia.so.520.56.06
I0618 14:02:28.296447 1957046 nvc_info.c:174] selecting /usr/lib64/libGLESv1_CM_nvidia.so.520.56.06
I0618 14:02:28.296541 1957046 nvc_info.c:174] selecting /usr/lib64/libEGL_nvidia.so.520.56.06
I0618 14:02:28.296679 1957046 nvc_info.c:174] selecting /usr/lib/vdpau/libvdpau_nvidia.so.520.56.06
I0618 14:02:28.296802 1957046 nvc_info.c:174] selecting /usr/lib/libnvidia-tls.so.520.56.06
I0618 14:02:28.296929 1957046 nvc_info.c:174] selecting /usr/lib/libnvidia-ptxjitcompiler.so.520.56.06
I0618 14:02:28.297068 1957046 nvc_info.c:174] selecting /usr/lib/libnvidia-opticalflow.so.520.56.06
I0618 14:02:28.297201 1957046 nvc_info.c:174] selecting /usr/lib/libnvidia-opencl.so.520.56.06
I0618 14:02:28.297305 1957046 nvc_info.c:174] selecting /usr/lib/libnvidia-nvvm.so.520.56.06
I0618 14:02:28.297460 1957046 nvc_info.c:174] selecting /usr/lib/libnvidia-ml.so.520.56.06
I0618 14:02:28.297595 1957046 nvc_info.c:174] selecting /usr/lib/libnvidia-glvkspirv.so.520.56.06
I0618 14:02:28.297692 1957046 nvc_info.c:174] selecting /usr/lib/libnvidia-glsi.so.520.56.06
I0618 14:02:28.297786 1957046 nvc_info.c:174] selecting /usr/lib/libnvidia-glcore.so.520.56.06
I0618 14:02:28.297890 1957046 nvc_info.c:174] selecting /usr/lib/libnvidia-fbc.so.520.56.06
I0618 14:02:28.298021 1957046 nvc_info.c:174] selecting /usr/lib/libnvidia-encode.so.520.56.06
I0618 14:02:28.298148 1957046 nvc_info.c:174] selecting /usr/lib/libnvidia-eglcore.so.520.56.06
I0618 14:02:28.298250 1957046 nvc_info.c:174] selecting /usr/lib/libnvidia-compiler.so.520.56.06
I0618 14:02:28.298352 1957046 nvc_info.c:174] selecting /usr/lib/libnvidia-allocator.so.520.56.06
I0618 14:02:28.298493 1957046 nvc_info.c:174] selecting /usr/lib/libnvcuvid.so.520.56.06
I0618 14:02:28.298639 1957046 nvc_info.c:174] selecting /usr/lib/libcuda.so.520.56.06
I0618 14:02:28.298786 1957046 nvc_info.c:174] selecting /usr/lib/libGLX_nvidia.so.520.56.06
I0618 14:02:28.298889 1957046 nvc_info.c:174] selecting /usr/lib/libGLESv2_nvidia.so.520.56.06
I0618 14:02:28.298988 1957046 nvc_info.c:174] selecting /usr/lib/libGLESv1_CM_nvidia.so.520.56.06
I0618 14:02:28.299090 1957046 nvc_info.c:174] selecting /usr/lib/libEGL_nvidia.so.520.56.06
W0618 14:02:28.299150 1957046 nvc_info.c:400] missing library libnvidia-nscq.so
W0618 14:02:28.299193 1957046 nvc_info.c:400] missing library libnvidia-fatbinaryloader.so
W0618 14:02:28.299229 1957046 nvc_info.c:400] missing library libnvidia-pkcs11.so
W0618 14:02:28.299264 1957046 nvc_info.c:400] missing library libnvidia-ifr.so
W0618 14:02:28.299294 1957046 nvc_info.c:400] missing library libnvidia-cbl.so
W0618 14:02:28.299332 1957046 nvc_info.c:404] missing compat32 library libnvidia-cfg.so
W0618 14:02:28.299372 1957046 nvc_info.c:404] missing compat32 library libnvidia-nscq.so
W0618 14:02:28.299408 1957046 nvc_info.c:404] missing compat32 library libcudadebugger.so
W0618 14:02:28.299441 1957046 nvc_info.c:404] missing compat32 library libnvidia-fatbinaryloader.so
W0618 14:02:28.299478 1957046 nvc_info.c:404] missing compat32 library libnvidia-pkcs11.so
W0618 14:02:28.299516 1957046 nvc_info.c:404] missing compat32 library libnvidia-ngx.so
W0618 14:02:28.299549 1957046 nvc_info.c:404] missing compat32 library libnvidia-ifr.so
W0618 14:02:28.299587 1957046 nvc_info.c:404] missing compat32 library libnvidia-rtcore.so
W0618 14:02:28.299619 1957046 nvc_info.c:404] missing compat32 library libnvoptix.so
W0618 14:02:28.299654 1957046 nvc_info.c:404] missing compat32 library libnvidia-cbl.so
I0618 14:02:28.300539 1957046 nvc_info.c:300] selecting /usr/bin/nvidia-smi
I0618 14:02:28.300617 1957046 nvc_info.c:300] selecting /usr/bin/nvidia-debugdump
I0618 14:02:28.300691 1957046 nvc_info.c:300] selecting /usr/bin/nvidia-persistenced
I0618 14:02:28.300794 1957046 nvc_info.c:300] selecting /usr/bin/nvidia-cuda-mps-control
I0618 14:02:28.300868 1957046 nvc_info.c:300] selecting /usr/bin/nvidia-cuda-mps-server
W0618 14:02:28.300988 1957046 nvc_info.c:426] missing binary nv-fabricmanager
I0618 14:02:28.301136 1957046 nvc_info.c:486] listing firmware path /lib/firmware/nvidia/520.56.06/gsp.bin
I0618 14:02:28.301230 1957046 nvc_info.c:559] listing device /dev/nvidiactl
I0618 14:02:28.301268 1957046 nvc_info.c:559] listing device /dev/nvidia-uvm
I0618 14:02:28.301298 1957046 nvc_info.c:559] listing device /dev/nvidia-uvm-tools
I0618 14:02:28.301352 1957046 nvc_info.c:559] listing device /dev/nvidia-modeset
I0618 14:02:28.301510 1957046 nvc_info.c:344] listing ipc path /run/nvidia-persistenced/socket
W0618 14:02:28.301585 1957046 nvc_info.c:350] missing ipc path /var/run/nvidia-fabricmanager/socket
W0618 14:02:28.301645 1957046 nvc_info.c:350] missing ipc path /tmp/nvidia-mps
I0618 14:02:28.301682 1957046 nvc_info.c:852] requesting device information with ''
I0618 14:02:28.308576 1957046 nvc_info.c:743] listing device /dev/nvidia0 (GPU-19fcc4ed-dc77-3460-2a87-9f94ac601d7e at 00000000:1a:00.0)
I0618 14:02:28.315317 1957046 nvc_info.c:743] listing device /dev/nvidia1 (GPU-45386311-9683-67b4-7853-2e52f9ad2ae0 at 00000000:1b:00.0)
I0618 14:02:28.321962 1957046 nvc_info.c:743] listing device /dev/nvidia2 (GPU-63c63513-ebe4-987f-240f-98b5ab249321 at 00000000:3d:00.0)
I0618 14:02:28.328784 1957046 nvc_info.c:743] listing device /dev/nvidia3 (GPU-75a5a2c3-cfe3-ce5e-4449-6a6fa59452fb at 00000000:3e:00.0)
I0618 14:02:28.335760 1957046 nvc_info.c:743] listing device /dev/nvidia4 (GPU-d08d0c1e-e2a4-bc51-e55d-1427a31e9aad at 00000000:88:00.0)
I0618 14:02:28.342808 1957046 nvc_info.c:743] listing device /dev/nvidia5 (GPU-da497328-e781-6fbe-206a-a2ab50bdf032 at 00000000:89:00.0)
I0618 14:02:28.349934 1957046 nvc_info.c:743] listing device /dev/nvidia6 (GPU-c47c2363-fe22-d14f-8012-ebe06eaf8d04 at 00000000:b1:00.0)
I0618 14:02:28.357161 1957046 nvc_info.c:743] listing device /dev/nvidia7 (GPU-03a79e44-9d82-d658-f6fd-392738c5a79d at 00000000:b2:00.0)
NVRM version:   520.56.06
CUDA version:   11.8

Device Index:   0
Device Minor:   0
Model:          NVIDIA GeForce RTX 3080
Brand:          GeForce
GPU UUID:       GPU-19fcc4ed-dc77-3460-2a87-9f94ac601d7e
Bus Location:   00000000:1a:00.0
Architecture:   8.6

Device Index:   1
Device Minor:   1
Model:          NVIDIA GeForce RTX 3080
Brand:          GeForce
GPU UUID:       GPU-45386311-9683-67b4-7853-2e52f9ad2ae0
Bus Location:   00000000:1b:00.0
Architecture:   8.6

Device Index:   2
Device Minor:   2
Model:          NVIDIA GeForce RTX 3080
Brand:          GeForce
GPU UUID:       GPU-63c63513-ebe4-987f-240f-98b5ab249321
Bus Location:   00000000:3d:00.0
Architecture:   8.6

Device Index:   3
Device Minor:   3
Model:          NVIDIA GeForce RTX 3080
Brand:          GeForce
GPU UUID:       GPU-75a5a2c3-cfe3-ce5e-4449-6a6fa59452fb
Bus Location:   00000000:3e:00.0
Architecture:   8.6

Device Index:   4
Device Minor:   4
Model:          NVIDIA GeForce RTX 3080
Brand:          GeForce
GPU UUID:       GPU-d08d0c1e-e2a4-bc51-e55d-1427a31e9aad
Bus Location:   00000000:88:00.0
Architecture:   8.6

Device Index:   5
Device Minor:   5
Model:          NVIDIA GeForce RTX 3080
Brand:          GeForce
GPU UUID:       GPU-da497328-e781-6fbe-206a-a2ab50bdf032
Bus Location:   00000000:89:00.0
Architecture:   8.6

Device Index:   6
Device Minor:   6
Model:          NVIDIA GeForce RTX 3080
Brand:          GeForce
GPU UUID:       GPU-c47c2363-fe22-d14f-8012-ebe06eaf8d04
Bus Location:   00000000:b1:00.0
Architecture:   8.6

Device Index:   7
Device Minor:   7
Model:          NVIDIA GeForce RTX 3080
Brand:          GeForce
GPU UUID:       GPU-03a79e44-9d82-d658-f6fd-392738c5a79d
Bus Location:   00000000:b2:00.0
Architecture:   8.6
I0618 14:02:28.358057 1957046 nvc.c:434] shutting down library context
I0618 14:02:28.358117 1957049 rpc.c:95] terminating nvcgo rpc service
I0618 14:02:28.359020 1957046 rpc.c:135] nvcgo rpc service terminated successfully
I0618 14:02:28.362141 1957048 rpc.c:95] terminating driver rpc service
I0618 14:02:28.362417 1957046 rpc.c:135] driver rpc service terminated successfully
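
A possible reading of the nerdctl error above, offered as an assumption rather than a confirmed diagnosis: the message exec: "nvidia": executable file not found in $PATH suggests that the value given to --runtime is being looked up as the name of a runtime binary, whereas nvidia is only an alias that Docker maps to /usr/bin/nvidia-container-runtime in its own configuration. The sketch below uses a hypothetical helper, find_runtime, to check which names actually resolve on the host.

```shell
#!/bin/sh
# find_runtime is a hypothetical helper: it reports whether a runtime name
# resolves to an executable on PATH, which is what the error above complains about.
find_runtime() {
    if path=$(command -v "$1" 2>/dev/null); then
        echo "$1 -> $path"
    else
        echo "$1 -> not found in PATH"
        return 1
    fi
}

find_runtime nvidia || true                      # the name passed to --runtime above
find_runtime nvidia-container-runtime || true    # the binary Docker's "nvidia" alias points to

# If only the second name resolves, an assumed fix (untested here) would be:
#   nerdctl run --runtime=nvidia-container-runtime ...
# or requesting GPUs with nerdctl's --gpus flag instead.
```
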