Open jzhang82119 opened 3 years ago
docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: detection error: nvml error: unknown error: unknown.
nvidia-docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: detection error: nvml error: unknown error: unknown.
nvidia-container-cli -k -d /dev/tty info
-- WARNING, the following logs are for debugging purposes only --
I0528 01:01:28.078540 17674 nvc.c:372] initializing library context (version=1.4.0, build=704a698b7a0ceec07a48e56c37365c741718c2df)
I0528 01:01:28.078613 17674 nvc.c:346] using root /
I0528 01:01:28.078621 17674 nvc.c:347] using ldcache /etc/ld.so.cache
I0528 01:01:28.078626 17674 nvc.c:348] using unprivileged user 65534:65534
I0528 01:01:28.078662 17674 nvc.c:389] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0528 01:01:28.078780 17674 nvc.c:391] dxcore initialization failed, continuing assuming a non-WSL environment
I0528 01:01:28.082736 17675 nvc.c:274] loading kernel module nvidia
I0528 01:01:28.082970 17675 nvc.c:278] running mknod for /dev/nvidiactl
I0528 01:01:28.083014 17675 nvc.c:282] running mknod for /dev/nvidia0
I0528 01:01:28.083043 17675 nvc.c:282] running mknod for /dev/nvidia1
I0528 01:01:28.083066 17675 nvc.c:282] running mknod for /dev/nvidia2
I0528 01:01:28.083088 17675 nvc.c:282] running mknod for /dev/nvidia3
I0528 01:01:28.083108 17675 nvc.c:286] running mknod for all nvcaps in /dev/nvidia-caps
I0528 01:01:28.085774 17675 nvc.c:214] running mknod for /dev/nvidia-caps/nvidia-cap1 from /proc/driver/nvidia/capabilities/mig/config
I0528 01:01:28.085923 17675 nvc.c:214] running mknod for /dev/nvidia-caps/nvidia-cap2 from /proc/driver/nvidia/capabilities/mig/monitor
I0528 01:01:28.089810 17675 nvc.c:292] loading kernel module nvidia_uvm
I0528 01:01:28.089841 17675 nvc.c:296] running mknod for /dev/nvidia-uvm
I0528 01:01:28.089938 17675 nvc.c:301] loading kernel module nvidia_modeset
I0528 01:01:28.090008 17675 nvc.c:305] running mknod for /dev/nvidia-modeset
I0528 01:01:28.090360 17676 driver.c:101] starting driver service
I0528 01:01:47.378772 17674 nvc_info.c:676] requesting driver information with ''
I0528 01:01:47.380902 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/vdpau/libvdpau_nvidia.so.455.23.05
I0528 01:01:47.381134 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.455.23.05
I0528 01:01:47.381226 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.455.23.05
I0528 01:01:47.381269 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.455.23.05
I0528 01:01:47.381314 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.455.23.05
I0528 01:01:47.381374 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.455.23.05
I0528 01:01:47.381432 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.455.23.05
I0528 01:01:47.381467 17674 nvc_info.c:171] skipping /usr/lib/x86_64-linux-gnu/libnvidia-nscq-dcgm.so.450.51.06
I0528 01:01:47.381506 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ngx.so.455.23.05
I0528 01:01:47.381540 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.455.23.05
I0528 01:01:47.381588 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.455.23.05
I0528 01:01:47.381637 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.455.23.05
I0528 01:01:47.381672 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.455.23.05
I0528 01:01:47.381706 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.455.23.05
I0528 01:01:47.381740 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.455.23.05
I0528 01:01:47.381791 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.455.23.05
I0528 01:01:47.381842 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.455.23.05
I0528 01:01:47.381877 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.455.23.05
I0528 01:01:47.381908 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.455.23.05
I0528 01:01:47.381952 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.455.23.05
I0528 01:01:47.381991 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvidia-allocator.so.455.23.05
I0528 01:01:47.382049 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.455.23.05
I0528 01:01:47.382433 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.455.23.05
I0528 01:01:47.382638 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.455.23.05
I0528 01:01:47.382677 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.455.23.05
I0528 01:01:47.382712 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.455.23.05
I0528 01:01:47.382750 17674 nvc_info.c:169] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.455.23.05
W0528 01:01:47.382779 17674 nvc_info.c:350] missing library libnvidia-nscq.so
W0528 01:01:47.382787 17674 nvc_info.c:350] missing library libnvidia-fatbinaryloader.so
W0528 01:01:47.382802 17674 nvc_info.c:354] missing compat32 library libnvidia-ml.so
W0528 01:01:47.382810 17674 nvc_info.c:354] missing compat32 library libnvidia-cfg.so
W0528 01:01:47.382817 17674 nvc_info.c:354] missing compat32 library libnvidia-nscq.so
W0528 01:01:47.382824 17674 nvc_info.c:354] missing compat32 library libcuda.so
W0528 01:01:47.382831 17674 nvc_info.c:354] missing compat32 library libnvidia-opencl.so
W0528 01:01:47.382839 17674 nvc_info.c:354] missing compat32 library libnvidia-ptxjitcompiler.so
W0528 01:01:47.382850 17674 nvc_info.c:354] missing compat32 library libnvidia-fatbinaryloader.so
W0528 01:01:47.382859 17674 nvc_info.c:354] missing compat32 library libnvidia-allocator.so
W0528 01:01:47.382866 17674 nvc_info.c:354] missing compat32 library libnvidia-compiler.so
W0528 01:01:47.382873 17674 nvc_info.c:354] missing compat32 library libnvidia-ngx.so
W0528 01:01:47.382880 17674 nvc_info.c:354] missing compat32 library libvdpau_nvidia.so
W0528 01:01:47.382888 17674 nvc_info.c:354] missing compat32 library libnvidia-encode.so
W0528 01:01:47.382894 17674 nvc_info.c:354] missing compat32 library libnvidia-opticalflow.so
W0528 01:01:47.382902 17674 nvc_info.c:354] missing compat32 library libnvcuvid.so
W0528 01:01:47.382909 17674 nvc_info.c:354] missing compat32 library libnvidia-eglcore.so
W0528 01:01:47.382916 17674 nvc_info.c:354] missing compat32 library libnvidia-glcore.so
W0528 01:01:47.382923 17674 nvc_info.c:354] missing compat32 library libnvidia-tls.so
W0528 01:01:47.382931 17674 nvc_info.c:354] missing compat32 library libnvidia-glsi.so
W0528 01:01:47.382940 17674 nvc_info.c:354] missing compat32 library libnvidia-fbc.so
W0528 01:01:47.382947 17674 nvc_info.c:354] missing compat32 library libnvidia-ifr.so
W0528 01:01:47.382956 17674 nvc_info.c:354] missing compat32 library libnvidia-rtcore.so
W0528 01:01:47.382965 17674 nvc_info.c:354] missing compat32 library libnvoptix.so
W0528 01:01:47.382974 17674 nvc_info.c:354] missing compat32 library libGLX_nvidia.so
W0528 01:01:47.382982 17674 nvc_info.c:354] missing compat32 library libEGL_nvidia.so
W0528 01:01:47.382989 17674 nvc_info.c:354] missing compat32 library libGLESv2_nvidia.so
W0528 01:01:47.383004 17674 nvc_info.c:354] missing compat32 library libGLESv1_CM_nvidia.so
W0528 01:01:47.383015 17674 nvc_info.c:354] missing compat32 library libnvidia-glvkspirv.so
W0528 01:01:47.383027 17674 nvc_info.c:354] missing compat32 library libnvidia-cbl.so
I0528 01:01:47.383290 17674 nvc_info.c:276] selecting /usr/bin/nvidia-smi
I0528 01:01:47.383311 17674 nvc_info.c:276] selecting /usr/bin/nvidia-debugdump
I0528 01:01:47.383331 17674 nvc_info.c:276] selecting /usr/bin/nvidia-persistenced
I0528 01:01:47.383364 17674 nvc_info.c:276] selecting /usr/bin/nvidia-cuda-mps-control
I0528 01:01:47.383383 17674 nvc_info.c:276] selecting /usr/bin/nvidia-cuda-mps-server
W0528 01:01:47.383421 17674 nvc_info.c:376] missing binary nv-fabricmanager
I0528 01:01:47.383449 17674 nvc_info.c:438] listing device /dev/nvidiactl
I0528 01:01:47.383456 17674 nvc_info.c:438] listing device /dev/nvidia-uvm
I0528 01:01:47.383464 17674 nvc_info.c:438] listing device /dev/nvidia-uvm-tools
I0528 01:01:47.383470 17674 nvc_info.c:438] listing device /dev/nvidia-modeset
W0528 01:01:47.383503 17674 nvc_info.c:321] missing ipc /var/run/nvidia-persistenced/socket
W0528 01:01:47.383525 17674 nvc_info.c:321] missing ipc /var/run/nvidia-fabricmanager/socket
W0528 01:01:47.383543 17674 nvc_info.c:321] missing ipc /tmp/nvidia-mps
I0528 01:01:47.383551 17674 nvc_info.c:733] requesting device information with ''
nvidia-container-cli: detection error: nvml error: unknown error
I0528 01:01:47.389646 17674 nvc.c:423] shutting down library context
I0528 01:01:48.385932 17676 driver.c:163] terminating driver service
I0528 01:01:48.386385 17674 driver.c:203] driver service terminated successfully
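Most of this debug output is routine info entries and compat32 warnings, so the few lines that matter are easy to miss. A minimal sketch for trimming it, assuming the terminal output has been saved one entry per line to a file (the name `nvc-debug.txt` and the helper `nvc_signal` are hypothetical, not part of any NVIDIA tool):

```shell
# Hypothetical helper: drop info entries (lines starting with I<MMDD>)
# and the "missing compat32 library" warnings, and keep everything else.
# Reads the files given as arguments, or stdin when none are given.
nvc_signal() {
  grep -Ev '^I[0-9]{4} |missing compat32 library' "$@" || true
}

# Assumed file name; only runs if such a file exists.
if [ -r nvc-debug.txt ]; then
  nvc_signal nvc-debug.txt
fi
```

On the log above, this would leave the remaining warnings (missing libraries, binaries, and ipc sockets) and the `nvidia-container-cli: detection error` line itself.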
Kernel version 5.4.0-73-generic
docker version
Client:
Version: 20.10.2
API version: 1.41
Go version: go1.13.8
Git commit: 20.10.2-0ubuntu1~18.04.2
Built: Tue Mar 30 21:24:16 2021
OS/Arch: linux/amd64
Context: default
Experimental: true
Server:
Engine:
Version: 20.10.2
API version: 1.41 (minimum version 1.12)
Go version: go1.13.8
Git commit: 20.10.2-0ubuntu1~18.04.2
Built: Mon Mar 29 19:27:41 2021
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.3.3-0ubuntu1~18.04.4
GitCommit:
runc:
Version: spec: 1.0.2-dev
GitCommit:
docker-init:
Version: 0.19.0
GitCommit:
I have reinstalled docker and nvidia-docker, but that does not fix the error.
Still getting the same error.
nvidia-docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: detection error: nvml error: unknown error: unknown.
I'm experiencing the exact same error here as well running the same versions, hope this gets sorted soon!
still no update. sad.
It also happens on other versions of the NVIDIA driver: 460, 462, and 470.
Reinstalling the NVIDIA driver and Docker does not resolve this issue.
@jzhang82119 @cantenna sorry for the late response. Could you also include the output of nvidia-smi on the host? Is persistence mode enabled on the devices?
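For anyone wanting to answer the persistence mode question quickly, here is a rough sketch. It relies on the `nvidia-smi --query-gpu` CSV output, where `persistence_mode` reports `Enabled` or `Disabled`; the helper name `gpus_without_persistence` is made up for illustration. Whether persistence mode is related to the NVML error in this issue has not been established in this thread.

```shell
# Hypothetical helper: read "index, persistence_mode" CSV on stdin and
# print the index of every GPU whose persistence mode is not Enabled.
gpus_without_persistence() {
  awk -F', *' 'NR > 1 && $2 != "Enabled" { print $1 }'
}

# Only query the driver when nvidia-smi is actually installed.
if command -v nvidia-smi >/dev/null 2>&1; then
  nvidia-smi --query-gpu=index,persistence_mode --format=csv \
    | gpus_without_persistence
fi

# To turn persistence mode on for all GPUs (requires root):
#   sudo nvidia-smi -pm 1
```

An empty result means every GPU reported `Enabled`. The debug log earlier in the thread reports `missing ipc /var/run/nvidia-persistenced/socket`, so it may also be worth checking whether the `nvidia-persistenced` daemon is running.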
@jzhang82119 you mentioned that it was working before. Was there some system update that you executed before you started seeing this behaviour?
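One way to check for a recent system update on Ubuntu is to look through the dpkg log for driver, Docker, or kernel package changes. The sketch below assumes the default log location `/var/log/dpkg.log` and the usual `date time action package old-version new-version` line layout; the helper name and the package-name prefixes it matches are my own choice and may need adjusting.

```shell
# Hypothetical helper: keep install/upgrade/remove entries for NVIDIA,
# Docker, and kernel image packages from dpkg log lines.
# Reads the files given as arguments, or stdin when none are given.
gpu_package_changes() {
  awk '$3 ~ /^(install|upgrade|remove)$/ &&
       $4 ~ /^(nvidia|libnvidia|docker|linux-image)/' "$@"
}

# Assumed default dpkg log path on Ubuntu; skipped if not readable.
if [ -r /var/log/dpkg.log ]; then
  gpu_package_changes /var/log/dpkg.log
fi
```

If an entry shows up shortly before the containers stopped working, including it in the report would help narrow things down.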
@elezar I have a similar issue, described here with logs.
Today my nvidia-docker commands stopped working. I don't know what the problem is.
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: detection error: nvml error: unknown error: unknown.
NVIDIA-SMI 455.23.05 Driver Version: 455.23.05 CUDA Version: 11.1
Kernel 5.4.0-73-generic Ubuntu 18.04.5 LTS