NVIDIA / nvidia-docker

Build and run Docker containers leveraging NVIDIA GPUs
Apache License 2.0

/usr/bin/nvidia-smi: No such file or directory #1753

Closed Adonis-Song closed 1 year ago

Adonis-Song commented 1 year ago

1. Issue or feature description

When I run nvidia-smi inside the container to get GPU info, I get the error "/usr/bin/nvidia-smi: No such file or directory".

2. Steps to reproduce the issue

I am using Zabbix Agent 2 to monitor GPU info. The docker command is:

sudo docker run -v /etc/localtime:/etc/localtime \
        --name zabbix-agent2 \
        -e ZBX_HOSTNAME="zabbix-agent2" \
        -e ZBX_SERVER_HOST="*" \
        -p 10050:10050 \
        -e ZBX_SERVER_PORT=10051 \
        --gpus all \
        --privileged \
        --restart unless-stopped \
        -d zabbix/zabbix-agent2:alpine-6.0-latest

Then I exec into the container and run nvidia-smi.

3. Information to attach (optional if deemed irrelevant)

I0508 06:47:09.339989 38408 nvc.c:376] initializing library context (version=1.13.1, build=6f4aea0fca16aaff01bab2567adb34ec30847a0e)

I0508 06:47:09.340044 38408 nvc.c:350] using root /
I0508 06:47:09.340056 38408 nvc.c:351] using ldcache /etc/ld.so.cache
I0508 06:47:09.340066 38408 nvc.c:352] using unprivileged user 1000:1000
I0508 06:47:09.340098 38408 nvc.c:393] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0508 06:47:09.340299 38408 nvc.c:395] dxcore initialization failed, continuing assuming a non-WSL environment
W0508 06:47:09.342696 38412 nvc.c:273] failed to set inheritable capabilities
W0508 06:47:09.342777 38412 nvc.c:274] skipping kernel modules load due to failure
I0508 06:47:09.343210 38413 rpc.c:71] starting driver rpc service
I0508 06:47:09.352881 38414 rpc.c:71] starting nvcgo rpc service
I0508 06:47:09.354471 38408 nvc_info.c:796] requesting driver information with ''
I0508 06:47:09.356864 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.530.41.03
I0508 06:47:09.357003 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.530.41.03
I0508 06:47:09.357112 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.530.41.03
I0508 06:47:09.357214 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.530.41.03
I0508 06:47:09.357338 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.530.41.03
I0508 06:47:09.357463 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.530.41.03
I0508 06:47:09.357543 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-nvvm.so.530.41.03
I0508 06:47:09.357666 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.530.41.03
I0508 06:47:09.357776 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.530.41.03
I0508 06:47:09.357916 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.530.41.03
I0508 06:47:09.358014 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.530.41.03
I0508 06:47:09.358158 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.530.41.03
I0508 06:47:09.358249 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.530.41.03
I0508 06:47:09.358349 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.530.41.03
I0508 06:47:09.358470 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.530.41.03
I0508 06:47:09.358579 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.530.41.03
I0508 06:47:09.358689 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.530.41.03
I0508 06:47:09.358820 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.530.41.03
I0508 06:47:09.358973 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.530.41.03
I0508 06:47:09.359652 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libcudadebugger.so.530.41.03
I0508 06:47:09.359770 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.530.41.03
I0508 06:47:09.360192 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.530.41.03
I0508 06:47:09.360292 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.530.41.03
I0508 06:47:09.360389 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.530.41.03
I0508 06:47:09.360499 38408 nvc_info.c:174] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.530.41.03
I0508 06:47:09.360655 38408 nvc_info.c:174] selecting /usr/lib/i386-linux-gnu/libnvidia-tls.so.530.41.03
I0508 06:47:09.360776 38408 nvc_info.c:174] selecting /usr/lib/i386-linux-gnu/libnvidia-ptxjitcompiler.so.530.41.03
I0508 06:47:09.360927 38408 nvc_info.c:174] selecting /usr/lib/i386-linux-gnu/libnvidia-opticalflow.so.530.41.03
I0508 06:47:09.361079 38408 nvc_info.c:174] selecting /usr/lib/i386-linux-gnu/libnvidia-opencl.so.530.41.03
I0508 06:47:09.361181 38408 nvc_info.c:174] selecting /usr/lib/i386-linux-gnu/libnvidia-nvvm.so.530.41.03
I0508 06:47:09.361347 38408 nvc_info.c:174] selecting /usr/lib/i386-linux-gnu/libnvidia-ml.so.530.41.03
I0508 06:47:09.361496 38408 nvc_info.c:174] selecting /usr/lib/i386-linux-gnu/libnvidia-glvkspirv.so.530.41.03
I0508 06:47:09.361592 38408 nvc_info.c:174] selecting /usr/lib/i386-linux-gnu/libnvidia-glsi.so.530.41.03
I0508 06:47:09.361692 38408 nvc_info.c:174] selecting /usr/lib/i386-linux-gnu/libnvidia-glcore.so.530.41.03
I0508 06:47:09.361791 38408 nvc_info.c:174] selecting /usr/lib/i386-linux-gnu/libnvidia-fbc.so.530.41.03
I0508 06:47:09.361951 38408 nvc_info.c:174] selecting /usr/lib/i386-linux-gnu/libnvidia-encode.so.530.41.03
I0508 06:47:09.362085 38408 nvc_info.c:174] selecting /usr/lib/i386-linux-gnu/libnvidia-eglcore.so.530.41.03
I0508 06:47:09.362208 38408 nvc_info.c:174] selecting /usr/lib/i386-linux-gnu/libnvidia-compiler.so.530.41.03
I0508 06:47:09.362311 38408 nvc_info.c:174] selecting /usr/lib/i386-linux-gnu/libnvcuvid.so.530.41.03
I0508 06:47:09.362494 38408 nvc_info.c:174] selecting /usr/lib/i386-linux-gnu/libcuda.so.530.41.03
I0508 06:47:09.362664 38408 nvc_info.c:174] selecting /usr/lib/i386-linux-gnu/libGLX_nvidia.so.530.41.03
I0508 06:47:09.362760 38408 nvc_info.c:174] selecting /usr/lib/i386-linux-gnu/libGLESv2_nvidia.so.530.41.03
I0508 06:47:09.362859 38408 nvc_info.c:174] selecting /usr/lib/i386-linux-gnu/libGLESv1_CM_nvidia.so.530.41.03
I0508 06:47:09.362981 38408 nvc_info.c:174] selecting /usr/lib/i386-linux-gnu/libEGL_nvidia.so.530.41.03
W0508 06:47:09.363081 38408 nvc_info.c:400] missing library libnvidia-nscq.so
W0508 06:47:09.363102 38408 nvc_info.c:400] missing library libnvidia-fatbinaryloader.so
W0508 06:47:09.363119 38408 nvc_info.c:400] missing library libnvidia-pkcs11.so
W0508 06:47:09.363148 38408 nvc_info.c:400] missing library libvdpau_nvidia.so
W0508 06:47:09.363165 38408 nvc_info.c:400] missing library libnvidia-ifr.so
W0508 06:47:09.363180 38408 nvc_info.c:400] missing library libnvidia-cbl.so
W0508 06:47:09.363196 38408 nvc_info.c:404] missing compat32 library libnvidia-cfg.so
W0508 06:47:09.363211 38408 nvc_info.c:404] missing compat32 library libnvidia-nscq.so
W0508 06:47:09.363228 38408 nvc_info.c:404] missing compat32 library libcudadebugger.so
W0508 06:47:09.363244 38408 nvc_info.c:404] missing compat32 library libnvidia-fatbinaryloader.so
W0508 06:47:09.363259 38408 nvc_info.c:404] missing compat32 library libnvidia-allocator.so
W0508 06:47:09.363275 38408 nvc_info.c:404] missing compat32 library libnvidia-pkcs11.so
W0508 06:47:09.363290 38408 nvc_info.c:404] missing compat32 library libnvidia-ngx.so
W0508 06:47:09.363306 38408 nvc_info.c:404] missing compat32 library libvdpau_nvidia.so
W0508 06:47:09.363322 38408 nvc_info.c:404] missing compat32 library libnvidia-ifr.so
W0508 06:47:09.363337 38408 nvc_info.c:404] missing compat32 library libnvidia-rtcore.so
W0508 06:47:09.363353 38408 nvc_info.c:404] missing compat32 library libnvoptix.so
W0508 06:47:09.363369 38408 nvc_info.c:404] missing compat32 library libnvidia-cbl.so
I0508 06:47:09.364364 38408 nvc_info.c:300] selecting /usr/bin/nvidia-smi
I0508 06:47:09.364418 38408 nvc_info.c:300] selecting /usr/bin/nvidia-debugdump
I0508 06:47:09.364460 38408 nvc_info.c:300] selecting /usr/bin/nvidia-persistenced
I0508 06:47:09.364525 38408 nvc_info.c:300] selecting /usr/bin/nvidia-cuda-mps-control
I0508 06:47:09.364567 38408 nvc_info.c:300] selecting /usr/bin/nvidia-cuda-mps-server
W0508 06:47:09.364696 38408 nvc_info.c:426] missing binary nv-fabricmanager
I0508 06:47:09.364787 38408 nvc_info.c:486] listing firmware path /lib/firmware/nvidia/530.41.03/gsp_ga10x.bin
I0508 06:47:09.364807 38408 nvc_info.c:486] listing firmware path /lib/firmware/nvidia/530.41.03/gsp_tu10x.bin
I0508 06:47:09.364867 38408 nvc_info.c:559] listing device /dev/nvidiactl
I0508 06:47:09.364884 38408 nvc_info.c:559] listing device /dev/nvidia-uvm
I0508 06:47:09.364900 38408 nvc_info.c:559] listing device /dev/nvidia-uvm-tools
I0508 06:47:09.364915 38408 nvc_info.c:559] listing device /dev/nvidia-modeset
I0508 06:47:09.364976 38408 nvc_info.c:344] listing ipc path /run/nvidia-persistenced/socket
W0508 06:47:09.365029 38408 nvc_info.c:350] missing ipc path /var/run/nvidia-fabricmanager/socket
W0508 06:47:09.365068 38408 nvc_info.c:350] missing ipc path /tmp/nvidia-mps
I0508 06:47:09.365087 38408 nvc_info.c:852] requesting device information with ''
I0508 06:47:09.372086 38408 nvc_info.c:743] listing device /dev/nvidia0 (GPU-e2c56be3-43ce-f0ae-e02b-e0c6d1977e26 at 00000000:01:00.0)
NVRM version:   530.41.03
CUDA version:   12.1

Device Index:   0
Device Minor:   0
Model:          NVIDIA GeForce GTX 960
Brand:          GeForce
GPU UUID:       GPU-e2c56be3-43ce-f0ae-e02b-e0c6d1977e26
Bus Location:   00000000:01:00.0
Architecture:   5.2
I0508 06:47:09.372163 38408 nvc.c:434] shutting down library context
I0508 06:47:09.372213 38414 rpc.c:95] terminating nvcgo rpc service
I0508 06:47:09.372884 38408 rpc.c:135] nvcgo rpc service terminated successfully
I0508 06:47:09.376586 38413 rpc.c:95] terminating driver rpc service
I0508 06:47:09.376870 38408 rpc.c:135] driver rpc service terminated successfully
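
The log above is long, so a small filter can help when reading it. The sketch below is only an illustration: the function names are hypothetical, and it assumes the format seen in this log, where each line starts with a level letter (I or W) and binaries are reported as "selecting /usr/bin/<name>". Applied to this log, it would show that /usr/bin/nvidia-smi was found on the host, which suggests the failure happens later, inside the container.

```shell
# Print only the warning lines (level prefix "W") from a log read on stdin.
nvc_warnings() {
    awk '/^W[0-9]+ /'
}

# Print the paths of the binaries the log reports as selected.
# Assumes they are logged as "selecting /usr/bin/<name>", as in the log above.
nvc_selected_binaries() {
    awk '/\] selecting \/usr\/bin\// { print $NF }'
}
```

With the log saved to a file (nvc.log is a placeholder name), `nvc_selected_binaries < nvc.log` lists the selected binaries and `nvc_warnings < nvc.log` lists the warnings.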

Attached GPUs : 1
GPU 00000000:01:00.0
    Product Name : NVIDIA GeForce GTX 960
    Product Brand : GeForce
    Product Architecture : Maxwell
    Display Mode : Enabled
    Display Active : Disabled
    Persistence Mode : Disabled
    MIG Mode
        Current : N/A
        Pending : N/A
    Accounting Mode : Disabled
    Accounting Mode Buffer Size : 4000
    Driver Model
        Current : N/A
        Pending : N/A
    Serial Number : N/A
    GPU UUID : GPU-e2c56be3-43ce-f0ae-e02b-e0c6d1977e26
    Minor Number : 0
    VBIOS Version : 84.06.26.00.22
    MultiGPU Board : No
    Board ID : 0x100
    Board Part Number : N/A
    GPU Part Number : 1401-300-A1
    FRU Part Number : N/A
    Module ID : 1
    Inforom Version
        Image Version : N/A
        OEM Object : N/A
        ECC Object : N/A
        Power Management Object : N/A
    GPU Operation Mode
        Current : N/A
        Pending : N/A
    GSP Firmware Version : N/A
    GPU Virtualization Mode
        Virtualization Mode : None
        Host VGPU Mode : N/A
    GPU Reset Status
        Reset Required : N/A
        Drain and Reset Recommended : N/A
    IBMNPU
        Relaxed Ordering Mode : N/A
    PCI
        Bus : 0x01
        Device : 0x00
        Domain : 0x0000
        Device Id : 0x140110DE
        Bus Id : 00000000:01:00.0
        Sub System Id : 0x00007377
        GPU Link Info
            PCIe Generation
                Max : 2
                Current : 1
                Device Current : 1
                Device Max : 3
                Host Max : 3
            Link Width
                Max : 16x
                Current : 8x
        Bridge Chip
            Type : N/A
            Firmware : N/A
        Replays Since Reset : 270
        Replay Number Rollovers : 0
        Tx Throughput : 0 KB/s
        Rx Throughput : 0 KB/s
        Atomic Caps Inbound : N/A
        Atomic Caps Outbound : N/A
    Fan Speed : 29 %
    Performance State : P8
    Clocks Throttle Reasons
        Idle : Active
        Applications Clocks Setting : Not Active
        SW Power Cap : Not Active
        HW Slowdown : Not Active
            HW Thermal Slowdown : N/A
            HW Power Brake Slowdown : N/A
        Sync Boost : Not Active
        SW Thermal Slowdown : Not Active
        Display Clock Setting : Not Active
    FB Memory Usage
        Total : 4096 MiB
        Reserved : 59 MiB
        Used : 46 MiB
        Free : 3989 MiB
    BAR1 Memory Usage
        Total : 256 MiB
        Used : 5 MiB
        Free : 251 MiB
    Compute Mode : Default
    Utilization
        Gpu : 0 %
        Memory : 0 %
        Encoder : 0 %
        Decoder : 0 %
    Encoder Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    FBC Stats
        Active Sessions : 0
        Average FPS : 0
        Average Latency : 0
    ECC Mode
        Current : N/A
        Pending : N/A
    ECC Errors
        Volatile
            Single Bit
                Device Memory : N/A
                Register File : N/A
                L1 Cache : N/A
                L2 Cache : N/A
                Texture Memory : N/A
                Texture Shared : N/A
                CBU : N/A
                Total : N/A
            Double Bit
                Device Memory : N/A
                Register File : N/A
                L1 Cache : N/A
                L2 Cache : N/A
                Texture Memory : N/A
                Texture Shared : N/A
                CBU : N/A
                Total : N/A
        Aggregate
            Single Bit
                Device Memory : N/A
                Register File : N/A
                L1 Cache : N/A
                L2 Cache : N/A
                Texture Memory : N/A
                Texture Shared : N/A
                CBU : N/A
                Total : N/A
            Double Bit
                Device Memory : N/A
                Register File : N/A
                L1 Cache : N/A
                L2 Cache : N/A
                Texture Memory : N/A
                Texture Shared : N/A
                CBU : N/A
                Total : N/A
    Retired Pages
        Single Bit ECC : N/A
        Double Bit ECC : N/A
        Pending Page Blacklist : N/A
    Remapped Rows : N/A
    Temperature
        GPU Current Temp : 32 C
        GPU Shutdown Temp : 101 C
        GPU Slowdown Temp : 96 C
        GPU Max Operating Temp : N/A
        GPU Target Temperature : 80 C
        Memory Current Temp : N/A
        Memory Max Operating Temp : N/A
    Power Readings
        Power Management : Supported
        Power Draw : 14.56 W
        Power Limit : 140.00 W
        Default Power Limit : 140.00 W
        Enforced Power Limit : 140.00 W
        Min Power Limit : 60.00 W
        Max Power Limit : 154.00 W
    Clocks
        Graphics : 135 MHz
        SM : 135 MHz
        Memory : 405 MHz
        Video : 405 MHz
    Applications Clocks
        Graphics : 1139 MHz
        Memory : 3505 MHz
    Default Applications Clocks
        Graphics : 1139 MHz
        Memory : 3505 MHz
    Deferred Clocks
        Memory : N/A
    Max Clocks
        Graphics : 1405 MHz
        SM : 1405 MHz
        Memory : 3505 MHz
        Video : 1152 MHz
    Max Customer Boost Clocks
        Graphics : N/A
    Clock Policy
        Auto Boost : N/A
        Auto Boost Default : N/A
    Voltage
        Graphics : N/A
    Fabric
        State : N/A
        Status : N/A
    Processes
        GPU instance ID : N/A
        Compute instance ID : N/A
        Process ID : 1339
            Type : G
            Name : /usr/lib/xorg/Xorg
            Used GPU Memory : 43 MiB

 - [ ] Docker version from `docker version`

Client: Docker Engine - Community
 Version:           20.10.2
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        2291f61
 Built:             Mon Dec 28 16:17:32 2020
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

 - [ ] NVIDIA packages version from `dpkg -l '*nvidia*'` _or_ `rpm -qa '*nvidia*'`

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                    Version          Architecture     Description
+++-=======================-================-================-===================================================
un  libgldispatch0-nvidia (no description available)
ii  libnvidia-cfg1-530:amd6 530.41.03-0ubunt amd64  NVIDIA binary OpenGL/GLX configuration library
un  libnvidia-cfg1-any (no description available)
un  libnvidia-common (no description available)
ii  libnvidia-common-530    530.41.03-0ubunt all    Shared files used by the NVIDIA libraries
un  libnvidia-compute (no description available)
ii  libnvidia-compute-530:a 530.41.03-0ubunt amd64  NVIDIA libcompute package
ii  libnvidia-compute-530:i 530.41.03-0ubunt i386   NVIDIA libcompute package
ii  libnvidia-container-too 1.13.1-1         amd64  NVIDIA container runtime library (command-line tool
ii  libnvidia-container1:am 1.13.1-1         amd64  NVIDIA container runtime library
un  libnvidia-decode (no description available)
ii  libnvidia-decode-530:am 530.41.03-0ubunt amd64  NVIDIA Video Decoding runtime libraries
ii  libnvidia-decode-530:i3 530.41.03-0ubunt i386   NVIDIA Video Decoding runtime libraries
un  libnvidia-encode (no description available)
ii  libnvidia-encode-530:am 530.41.03-0ubunt amd64  NVENC Video Encoding runtime library
ii  libnvidia-encode-530:i3 530.41.03-0ubunt i386   NVENC Video Encoding runtime library
un  libnvidia-extra (no description available)
ii  libnvidia-extra-530:amd 530.41.03-0ubunt amd64  Extra libraries for the NVIDIA driver
un  libnvidia-fbc1 (no description available)
ii  libnvidia-fbc1-530:amd6 530.41.03-0ubunt amd64  NVIDIA OpenGL-based Framebuffer Capture runtime lib
ii  libnvidia-fbc1-530:i386 530.41.03-0ubunt i386   NVIDIA OpenGL-based Framebuffer Capture runtime lib
un  libnvidia-gl (no description available)
ii  libnvidia-gl-530:amd64  530.41.03-0ubunt amd64  NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulk
ii  libnvidia-gl-530:i386   530.41.03-0ubunt i386   NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulk
un  libnvidia-ml1 (no description available)
un  nvidia-304 (no description available)
un  nvidia-340 (no description available)
un  nvidia-384 (no description available)
un  nvidia-390 (no description available)
un  nvidia-common (no description available)
un  nvidia-compute-utils (no description available)
ii  nvidia-compute-utils-53 530.41.03-0ubunt amd64  NVIDIA compute utilities
ii  nvidia-container-runtim 3.13.0-1         all    NVIDIA container runtime
un  nvidia-container-runtim (no description available)
ii  nvidia-container-toolki 1.13.1-1         amd64  NVIDIA Container toolkit
ii  nvidia-container-toolki 1.13.1-1         amd64  NVIDIA Container Toolkit Base
ii  nvidia-dkms-530         530.41.03-0ubunt amd64  NVIDIA DKMS package
un  nvidia-dkms-kernel (no description available)
ii  nvidia-driver-530       530.41.03-0ubunt amd64  NVIDIA driver metapackage
un  nvidia-driver-binary (no description available)
un  nvidia-kernel-common (no description available)
ii  nvidia-kernel-common-53 530.41.03-0ubunt amd64  Shared files used with the kernel module
un  nvidia-kernel-source (no description available)
ii  nvidia-kernel-source-53 530.41.03-0ubunt amd64  NVIDIA kernel source package
un  nvidia-legacy-340xx-vdp (no description available)
un  nvidia-opencl-icd (no description available)
un  nvidia-persistenced (no description available)
ii  nvidia-prime            0.8.16~0.18.04.1 all    Tools to enable NVIDIA's Prime
ii  nvidia-settings         470.57.01-0ubunt amd64  Tool for configuring the NVIDIA graphics driver
un  nvidia-settings-binary (no description available)
un  nvidia-smi (no description available)
un  nvidia-utils (no description available)
ii  nvidia-utils-530        530.41.03-0ubunt amd64  NVIDIA driver support binaries
un  nvidia-vdpau-driver (no description available)
ii  xserver-xorg-video-nvid 530.41.03-0ubunt amd64  NVIDIA binary Xorg driver

elezar commented 1 year ago

Does running nvidia-smi in a different container work as expected?

sudo docker run --rm -ti --gpus=all ubuntu nvidia-smi -L
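
The same check can be repeated across several images to see whether the problem is specific to one of them. This is a rough sketch, not something taken from the toolkit documentation: gpu_smoke_test and the DOCKER_CMD override are hypothetical names, and it uses `--entrypoint nvidia-smi` (instead of passing nvidia-smi as the command) on the assumption that an image's own entrypoint might otherwise interfere.

```shell
# Run `nvidia-smi -L` in each given image and report whether it succeeded.
# DOCKER_CMD is a hypothetical override; it defaults to "sudo docker".
gpu_smoke_test() {
    docker_cmd="${DOCKER_CMD:-sudo docker}"
    for image in "$@"; do
        # $docker_cmd is left unquoted on purpose so "sudo docker" splits into two words.
        if $docker_cmd run --rm --gpus=all --entrypoint nvidia-smi "$image" -L >/dev/null 2>&1; then
            echo "$image: ok"
        else
            echo "$image: failed"
        fi
    done
}
```

For example, `gpu_smoke_test ubuntu zabbix/zabbix-agent2:alpine-6.0-latest` would print one line per image.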

Adonis-Song commented 1 year ago

The version of ldd in the Zabbix agent image is older than the one on the host, so this is not a bug.
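
The closing comment points at a C library mismatch between the container image and the host. One way to check this, sketched below, is to compare what `ldd --version` reports in each place. The helper name libc_flavor is hypothetical. As background, Alpine-based images typically ship musl rather than glibc, and a binary whose dynamic loader is not present in the container can fail with "No such file or directory" even though the file itself exists; whether that is exactly what happened here is an assumption.

```shell
# Classify a C library from the text printed by `ldd --version`.
# libc_flavor is a hypothetical helper name, not part of any NVIDIA or Zabbix tool.
libc_flavor() {
    case "$1" in
        *musl*) echo "musl" ;;
        *GLIBC*|*"GNU libc"*) echo "glibc" ;;
        *) echo "unknown" ;;
    esac
}
```

Running `libc_flavor "$(ldd --version 2>&1 | head -n 1)"` on the host and again inside the container shows whether the two use the same C library.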