valentindbdg opened this issue 2 years ago
@valentindbdg If `nvidia-smi` doesn't work on the host, this indicates that the driver isn't installed correctly. Could you try reinstalling the driver and confirming that `nvidia-smi` works on the host system before launching a container?
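A minimal sketch of that host-side check as a POSIX shell function. The helper name `host_driver_ok` is hypothetical (it is not part of nvidia-docker), and it assumes that `/proc/driver/nvidia/version` exists only while the NVIDIA kernel module is loaded. Both the marker file and the command can be overridden, which also lets the function be exercised on a machine without a GPU.

```shell
#!/bin/sh
# Hypothetical helper (not part of nvidia-docker): succeed only if the host
# driver looks usable, so GPU containers are launched after this check.
#   $1  file assumed to exist while the NVIDIA kernel module is loaded
#       (default: /proc/driver/nvidia/version)
#   $2  command that must be able to talk to the driver (default: nvidia-smi)
host_driver_ok() {
    proc_file=${1:-/proc/driver/nvidia/version}
    smi_cmd=${2:-nvidia-smi}

    # 1. Kernel side: is the NVIDIA module loaded at all?
    if [ ! -r "$proc_file" ]; then
        echo "NVIDIA kernel module not loaded ($proc_file missing)" >&2
        return 1
    fi

    # 2. User side: the tool must exist and exit successfully.
    if ! command -v "$smi_cmd" >/dev/null 2>&1; then
        echo "$smi_cmd not found in PATH" >&2
        return 1
    fi
    if ! "$smi_cmd" >/dev/null 2>&1; then
        echo "$smi_cmd failed; fix the host driver before using --gpus" >&2
        return 1
    fi
    return 0
}
```

For example: `host_driver_ok && sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi`, so the container is only started once the host side passes.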
@elezar Hello, I have a question: can I successfully install the NVIDIA driver on my laptop without a GPU? Some libraries need the driver, but I just want to run Python code with these libraries in CPU mode. So I wonder whether `nvidia-smi` will run successfully if my laptop doesn't have a physical GPU. Thanks~
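Whether the driver itself can be installed on a machine without a GPU is a separate question. For running in CPU mode, one option is to request GPUs only when the host driver is loaded and otherwise start the container without `--gpus`. A sketch under that assumption: `gpu_flags` is a hypothetical helper, and it treats the presence of `/proc/driver/nvidia/version` as the signal that the NVIDIA kernel module is loaded, rather than querying the hardware.

```shell
#!/bin/sh
# Hypothetical helper: print the docker flags needed for GPU access, or
# nothing when the NVIDIA kernel module does not appear to be loaded.
#   $1  marker file checked for readability
#       (default: /proc/driver/nvidia/version)
gpu_flags() {
    if [ -r "${1:-/proc/driver/nvidia/version}" ]; then
        printf '%s\n' '--gpus all'
    fi
}
```

Usage: `sudo docker run --rm $(gpu_flags) IMAGE COMMAND`, where `IMAGE` and `COMMAND` are placeholders. The substitution is left unquoted on purpose so it expands to two arguments on a GPU host and to nothing on a CPU-only one.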
Running `sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi` gives me:
docker: Error response from daemon: failed to create shim: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: nvml error: driver not loaded: unknown.
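The daemon error wraps the actual failure: everything after `stderr: ` is what the `nvidia-container-cli` hook printed, here `nvml error: driver not loaded`. A small sketch that pulls that part out of a captured error message; `hook_stderr` is a hypothetical name, and it assumes the message keeps the `stderr: ` marker shown in the error.

```shell
#!/bin/sh
# Hypothetical helper: read a Docker/OCI hook error on stdin and print only
# the text after the last "stderr: " marker (sed's ".*" is greedy).
# Prints nothing if the marker is absent.
hook_stderr() {
    sed -n 's/.*stderr: //p'
}
```

For example: `sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi 2>&1 | hook_stderr`.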
2. Steps to reproduce the issue
3. Information to attach (optional if deemed irrelevant)
[ ] Some nvidia-container information:
nvidia-container-cli -k -d /dev/tty info
I0418 18:12:58.648174 217303 nvc.c:376] initializing library context (version=1.9.0, build=5e135c17d6dbae861ec343e9a8d3a0d2af758a4f)
I0418 18:12:58.648245 217303 nvc.c:350] using root /
I0418 18:12:58.648257 217303 nvc.c:351] using ldcache /etc/ld.so.cache
I0418 18:12:58.648268 217303 nvc.c:352] using unprivileged user 1000:1000
I0418 18:12:58.648300 217303 nvc.c:393] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0418 18:12:58.648518 217303 nvc.c:395] dxcore initialization failed, continuing assuming a non-WSL environment
W0418 18:12:59.164853 217306 nvc.c:273] failed to set inheritable capabilities
W0418 18:12:59.164916 217306 nvc.c:274] skipping kernel modules load due to failure
I0418 18:12:59.165320 217307 rpc.c:71] starting driver rpc service
I0418 18:12:59.183603 217303 rpc.c:135] driver rpc service terminated with signal 15
nvidia-container-cli: initialization error: nvml error: driver not loaded
I0418 18:12:59.183703 217303 nvc.c:430] shutting down library context
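Most of this log is informational; the lines starting with `W` and the untagged error line are where the signal is. A sketch of a filter that keeps only those; `log_signal` is a hypothetical name, and it assumes the glog-style prefix seen here (a severity letter followed by the `MMDD` date).

```shell
#!/bin/sh
# Hypothetical filter for nvidia-container-cli debug output on stdin:
# drop informational lines ("I" + four-digit MMDD + space) and keep
# warnings, errors and untagged lines such as the final error message.
log_signal() {
    grep -v -E '^I[0-9]{4} '
}
```

For example, on a copy of the log saved to a file: `log_signal < nvc.log`.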
[ ] Kernel version from
uname -a
Linux valentin-P37V4 5.13.0-39-generic #44~20.04.1-Ubuntu SMP Thu Mar 24 16:43:35 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
[ ] Any relevant kernel output lines from
dmesg
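The `dmesg` section is left empty in this report. Messages from the NVIDIA kernel module are normally prefixed with `NVRM:`, so a filter like the one below narrows the kernel log to the relevant lines. This is a sketch: `nvidia_kmsg` is a hypothetical name, and also matching `nouveau` (the in-kernel open-source driver, which conflicts with the proprietary module) is an assumption about what is worth seeing.

```shell
#!/bin/sh
# Hypothetical filter: keep kernel log lines (read on stdin) that come from
# or mention the NVIDIA module, plus nouveau lines. Case-insensitive.
nvidia_kmsg() {
    grep -i -E 'nvrm|nvidia|nouveau'
}
```

For example: `sudo dmesg | nvidia_kmsg`.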
[ ] Driver information from
nvidia-smi -a
NVIDIA-SMI has failed because it couldn't communicate with the NVIDIA driver. Make sure that the latest NVIDIA driver is installed and running.
[ ] Docker version from
docker version
Client: Docker Engine - Community
 Version: 20.10.14
 API version: 1.41
 Go version: go1.16.15
 Git commit: a224086
 Built: Thu Mar 24 01:48:02 2022
 OS/Arch: linux/amd64
 Context: default
 Experimental: true

Server: Docker Engine - Community
 Engine:
  Version: 20.10.14
  API version: 1.41 (minimum version 1.12)
  Go version: go1.16.15
  Git commit: 87a90dc
  Built: Thu Mar 24 01:45:53 2022
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.5.11
  GitCommit: 3df54a852345ae127d1fa3092b95168e4a88e2f8
 runc:
  Version: 1.0.3
  GitCommit: v1.0.3-0-gf46b6ba
 docker-init:
  Version: 0.19.0
  GitCommit: de40ad0
[ ] NVIDIA packages version from
dpkg -l '*nvidia*' or rpm -qa '*nvidia*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture
+++-==================================-===========================-============
un libgldispatch0-nvidia
ii libnvidia-cfg1-470:amd64 470.103.01-0ubuntu0.20.04.1 amd64
un libnvidia-cfg1-any
un libnvidia-common
ii libnvidia-common-470 470.103.01-0ubuntu0.20.04.1 all
un libnvidia-compute
rc libnvidia-compute-418-server:amd64 418.226.00-0ubuntu0.20.04.2 amd64
ii libnvidia-compute-470:amd64 470.103.01-0ubuntu0.20.04.1 amd64
ii libnvidia-compute-470:i386 470.103.01-0ubuntu0.20.04.1 i386
ii libnvidia-container-tools 1.9.0-1 amd64
ii libnvidia-container1:amd64 1.9.0-1 amd64
un libnvidia-decode
ii libnvidia-decode-470:amd64 470.103.01-0ubuntu0.20.04.1 amd64
ii libnvidia-decode-470:i386 470.103.01-0ubuntu0.20.04.1 i386
un libnvidia-encode
ii libnvidia-encode-470:amd64 470.103.01-0ubuntu0.20.04.1 amd64
ii libnvidia-encode-470:i386 470.103.01-0ubuntu0.20.04.1 i386
un libnvidia-extra
ii libnvidia-extra-470:amd64 470.103.01-0ubuntu0.20.04.1 amd64
un libnvidia-fbc1
ii libnvidia-fbc1-470:amd64 470.103.01-0ubuntu0.20.04.1 amd64
ii libnvidia-fbc1-470:i386 470.103.01-0ubuntu0.20.04.1 i386
un libnvidia-gl
un libnvidia-gl-390
un libnvidia-gl-435
un libnvidia-gl-440
ii libnvidia-gl-470:amd64 470.103.01-0ubuntu0.20.04.1 amd64
ii libnvidia-gl-470:i386 470.103.01-0ubuntu0.20.04.1 i386
un libnvidia-ifr1
ii libnvidia-ifr1-470:amd64 470.103.01-0ubuntu0.20.04.1 amd64
ii libnvidia-ifr1-470:i386 470.103.01-0ubuntu0.20.04.1 i386
un libnvidia-ml1
un nvidia-384
un nvidia-390
un nvidia-common
un nvidia-compute-utils
rc nvidia-compute-utils-418-server 418.226.00-0ubuntu0.20.04.2 amd64
ii nvidia-compute-utils-470 470.103.01-0ubuntu0.20.04.1 amd64
un nvidia-container-runtime
un nvidia-container-runtime-hook
ii nvidia-container-toolkit 1.9.0-1 amd64
rc nvidia-dkms-418-server 418.226.00-0ubuntu0.20.04.2 amd64
ii nvidia-dkms-470 470.103.01-0ubuntu0.20.04.1 amd64
un nvidia-dkms-kernel
un nvidia-docker
ii nvidia-docker2 2.10.0-1 all
ii nvidia-driver-470 470.103.01-0ubuntu0.20.04.1 amd64
un nvidia-driver-binary
un nvidia-kernel-common
rc nvidia-kernel-common-418-server 418.226.00-0ubuntu0.20.04.2 amd64
ii nvidia-kernel-common-470 470.103.01-0ubuntu0.20.04.1 amd64
un nvidia-kernel-source
un nvidia-kernel-source-418-server
ii nvidia-kernel-source-470 470.103.01-0ubuntu0.20.04.1 amd64
un nvidia-legacy-304xx-vdpau-driver
un nvidia-legacy-340xx-vdpau-driver
un nvidia-libopencl1-dev
un nvidia-opencl-icd
un nvidia-persistenced
ii nvidia-prime 0.8.16~0.20.04.2 all
ii nvidia-settings 470.57.01-0ubuntu0.20.04.3 amd64
un nvidia-settings-binary
un nvidia-smi
un nvidia-utils
ii nvidia-utils-470 470.103.01-0ubuntu0.20.04.1 amd64
un nvidia-vdpau-driver
ii xserver-xorg-video-nvidia-470 470.103.01-0ubuntu0.20.04.1 amd64
[ ] NVIDIA container library version from
nvidia-container-cli -V
cli-version: 1.9.0
lib-version: 1.9.0
build date: 2022-03-18T13:46+00:00
build revision: 5e135c17d6dbae861ec343e9a8d3a0d2af758a4f
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fplan9-extensions -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections
[ ] NVIDIA container library logs (see troubleshooting)
[ ] Docker command, image and tag used
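The `dpkg -l` listing in this report mixes the installed 470 packages (status `ii`) with leftovers of the 418-server branch in status `rc` (removed, configuration files still present). A sketch that summarizes such a listing by status; `nvidia_pkg_summary` is a hypothetical name, and it assumes the standard `dpkg -l` column order (status, name, version, architecture, description).

```shell
#!/bin/sh
# Hypothetical helper: read `dpkg -l` output on stdin and print installed
# (ii) and removed-but-configured (rc) packages with their versions.
# Header lines and "un" (not installed) entries are ignored.
nvidia_pkg_summary() {
    awk '
        $1 == "ii" { print "installed:", $2, $3 }
        $1 == "rc" { print "leftover:", $2, $3 }
    '
}
```

For example: `dpkg -l '*nvidia*' | nvidia_pkg_summary`.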