chunniunai220ml opened 4 years ago
I also reinstalled the NVIDIA driver using NVIDIA-Linux-x86_64-440.33.01.run and hit the same error.
Same problem here on:
Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
Release: 18.04
Codename: bionic
My Docker is version 19.03.6, build 369ce74a3c, and I installed the NVIDIA driver from here.
When I run
sudo nvidia-container-cli -k -d /dev/tty info
The output is
I0228 09:13:49.695833 1120 nvc.c:281] initializing library context (version=1.0.7, build=b71f87c04b8eca8a16bf60995506c35c937347d9)
I0228 09:13:49.695933 1120 nvc.c:255] using root /
I0228 09:13:49.695948 1120 nvc.c:256] using ldcache /etc/ld.so.cache
I0228 09:13:49.695958 1120 nvc.c:257] using unprivileged user 65534:65534
I0228 09:13:49.696847 1121 nvc.c:191] loading kernel module nvidia
E0228 09:13:50.186352 1121 nvc.c:193] could not load kernel module nvidia
I0228 09:13:50.186425 1121 nvc.c:203] loading kernel module nvidia_uvm
E0228 09:13:50.628481 1121 nvc.c:205] could not load kernel module nvidia_uvm
I0228 09:13:50.628508 1121 nvc.c:211] loading kernel module nvidia_modeset
E0228 09:13:51.064044 1121 nvc.c:213] could not load kernel module nvidia_modeset
I0228 09:13:51.064251 1129 driver.c:133] starting driver service
I0228 09:13:51.066557 1120 driver.c:233] driver service terminated with signal 15
nvidia-container-cli: initialization error: cuda error: unknown error
the output of my attempt to run
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
is as follows
docker: Error response from daemon: OCI runtime create failed: unable to retrieve OCI runtime error (open /run/containerd/io.containerd.runtime.v1.linux/moby/defdc438de52aef6ec0266539ea834320a9580f75bac6b71cfd2d2e3c999aae9/log.json: no such file or directory): fork/exec /usr/bin/nvidia-container-runtime: no such file or directory: unknown.
ERRO[0000] error waiting for container: context canceled
any idea?
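For reference, the "could not load kernel module" lines in the output above usually mean the NVIDIA kernel modules are not loaded on the host. A minimal sanity check, assuming a standard .run or apt driver install, would be something along these lines:

lsmod | grep nvidia      # which NVIDIA modules, if any, are currently loaded?
sudo modprobe nvidia     # try loading the core module; a failure here usually points at the driver install
sudo modprobe nvidia_uvm
nvidia-smi               # confirm the driver can actually talk to the GPUs
dmesg | tail             # module load failures are typically explained here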
@soheilade have you solved the problem?
Yeah, try reinstalling the NVIDIA driver from here, then run this Docker command to launch the CARLA server in a container:
docker run -p 2000-2002:2000-2002 --rm -d -it -e NVIDIA_VISIBLE_DEVICES=0 --runtime nvidia carlasim/carla:0.9.5 ./CarlaUE4.sh /Game/Maps/Town01
This points to an error with the driver. Can you install the CUDA samples on the host machine and try to run, for example, deviceQuery?
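For reference, a rough sketch of building and running deviceQuery; the path below assumes the samples shipped with a default CUDA toolkit install and may differ on your system:

cd /usr/local/cuda/samples/1_Utilities/deviceQuery   # default samples location for older toolkits
sudo make
./deviceQuery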
@RenaudWasTaken I think I have installed the driver successfully; I can use TensorFlow 1.14.0 on the host machine. I ran the following commands: 1. cat /proc/driver/nvidia/version, which shows:
NVRM version: NVIDIA UNIX x86_64 Kernel Module 440.33.01 Wed Nov 13 00:00:22 UTC 2019 GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.11)
2. sudo dpkg --list | grep nvidia-*, which shows:
iU libnvidia-container-tools 1.0.7-1 amd64 NVIDIA container runtime library (command-line tools)
iU libnvidia-container1:amd64 1.0.7-1 amd64 NVIDIA container runtime library
./deviceQuery Starting...
CUDA Device Query (Runtime API) version (CUDART static linking)
Detected 4 CUDA Capable device(s)
Device 0: "Tesla K80" CUDA Driver Version / Runtime Version 10.2 / 9.0 CUDA Capability Major/Minor version number: 3.7 Total amount of global memory: 11441 MBytes (11996954624 bytes) (13) Multiprocessors, (192) CUDA Cores/MP: 2496 CUDA Cores GPU Max Clock rate: 824 MHz (0.82 GHz) Memory Clock rate: 2505 Mhz Memory Bus Width: 384-bit L2 Cache Size: 1572864 bytes Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096) Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total number of registers available per block: 65536 Warp size: 32 Maximum number of threads per multiprocessor: 2048 Maximum number of threads per block: 1024 Max dimension size of a thread block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Concurrent copy and kernel execution: Yes with 2 copy engine(s) Run time limit on kernels: No Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes Alignment requirement for Surfaces: Yes Device has ECC support: Enabled Device supports Unified Addressing (UVA): Yes Supports Cooperative Kernel Launch: No Supports MultiDevice Co-op Kernel Launch: No Device PCI Domain ID / Bus ID / location ID: 0 / 5 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Device 1: "Tesla K80" CUDA Driver Version / Runtime Version 10.2 / 9.0 CUDA Capability Major/Minor version number: 3.7 Total amount of global memory: 11441 MBytes (11996954624 bytes) (13) Multiprocessors, (192) CUDA Cores/MP: 2496 CUDA Cores GPU Max Clock rate: 824 MHz (0.82 GHz) Memory Clock rate: 2505 Mhz Memory Bus Width: 384-bit L2 Cache Size: 1572864 bytes Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096) Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total number of registers available per block: 65536 Warp size: 32 Maximum number of threads per multiprocessor: 2048 Maximum number of threads per block: 1024 Max dimension size of a thread block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Concurrent copy and kernel execution: Yes with 2 copy engine(s) Run time limit on kernels: No Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes Alignment requirement for Surfaces: Yes Device has ECC support: Enabled Device supports Unified Addressing (UVA): Yes Supports Cooperative Kernel Launch: No Supports MultiDevice Co-op Kernel Launch: No Device PCI Domain ID / Bus ID / location ID: 0 / 6 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Device 2: "Tesla K80" CUDA Driver Version / Runtime Version 10.2 / 9.0 CUDA Capability Major/Minor version number: 3.7 Total amount of global memory: 11441 MBytes (11996954624 bytes) (13) Multiprocessors, (192) CUDA Cores/MP: 2496 CUDA Cores GPU Max Clock rate: 824 MHz (0.82 GHz) Memory Clock rate: 2505 Mhz Memory Bus Width: 384-bit L2 Cache Size: 1572864 bytes Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096) Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total number of registers available per block: 65536 Warp size: 32 Maximum number of threads per multiprocessor: 2048 Maximum number of threads per block: 1024 Max dimension size of a thread block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Concurrent copy and kernel execution: Yes with 2 copy engine(s) Run time limit on kernels: No Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes Alignment requirement for Surfaces: Yes Device has ECC support: Enabled Device supports Unified Addressing (UVA): Yes Supports Cooperative Kernel Launch: No Supports MultiDevice Co-op Kernel Launch: No Device PCI Domain ID / Bus ID / location ID: 0 / 133 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Device 3: "Tesla K80" CUDA Driver Version / Runtime Version 10.2 / 9.0 CUDA Capability Major/Minor version number: 3.7 Total amount of global memory: 11441 MBytes (11996954624 bytes) (13) Multiprocessors, (192) CUDA Cores/MP: 2496 CUDA Cores GPU Max Clock rate: 824 MHz (0.82 GHz) Memory Clock rate: 2505 Mhz Memory Bus Width: 384-bit L2 Cache Size: 1572864 bytes Maximum Texture Dimension Size (x,y,z) 1D=(65536), 2D=(65536, 65536), 3D=(4096, 4096, 4096) Maximum Layered 1D Texture Size, (num) layers 1D=(16384), 2048 layers Maximum Layered 2D Texture Size, (num) layers 2D=(16384, 16384), 2048 layers Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total number of registers available per block: 65536 Warp size: 32 Maximum number of threads per multiprocessor: 2048 Maximum number of threads per block: 1024 Max dimension size of a thread block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Concurrent copy and kernel execution: Yes with 2 copy engine(s) Run time limit on kernels: No Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes Alignment requirement for Surfaces: Yes Device has ECC support: Enabled Device supports Unified Addressing (UVA): Yes Supports Cooperative Kernel Launch: No Supports MultiDevice Co-op Kernel Launch: No Device PCI Domain ID / Bus ID / location ID: 0 / 134 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >
Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU1) : Yes Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU2) : No Peer access from Tesla K80 (GPU0) -> Tesla K80 (GPU3) : No Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU0) : Yes Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU2) : No Peer access from Tesla K80 (GPU1) -> Tesla K80 (GPU3) : No Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU0) : No Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU1) : No Peer access from Tesla K80 (GPU2) -> Tesla K80 (GPU3) : Yes Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU0) : No Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU1) : No Peer access from Tesla K80 (GPU3) -> Tesla K80 (GPU2) : Yes
deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 10.2, CUDA Runtime Version = 9.0, NumDevs = 4 Result = PASS
What's wrong with this information? I can't find anything.
OK, nothing wrong with CUDA. The other two things that might help are:
I don't think this is a critical error; what information should I look at?
@RenaudWasTaken The problem has not been solved for me; can you give me further help?
Same error occurs for me too, heh.
I am facing the same problem as well.
Distributor ID: Ubuntu
Description: Ubuntu 18.04.4 LTS
Release: 18.04
Codename: bionic
tx2-01:~$ uname -a
Linux jetson-tx2-01 4.9.140-tegra #1 SMP PREEMPT Mon Aug 12 21:29:52 PDT 2019 aarch64 aarch64 aarch64 GNU/Linux
tx2-01:~$ sudo nvidia-container-cli -k -d /dev/tty info [sudo] password for civilmaps:
-- WARNING, the following logs are for debugging purposes only --
I0609 06:28:32.004669 8657 nvc.c:281] initializing library context (version=1.1.1, build=e5d6156aba457559979597c8e3d22c5d8d0622db)
I0609 06:28:32.004901 8657 nvc.c:255] using root /
I0609 06:28:32.004930 8657 nvc.c:256] using ldcache /etc/ld.so.cache
I0609 06:28:32.004947 8657 nvc.c:257] using unprivileged user 65534:65534
W0609 06:28:32.005415 8657 nvc.c:171] failed to detect NVIDIA devices
I0609 06:28:32.005723 8658 nvc.c:191] loading kernel module nvidia
E0609 06:28:32.006013 8658 nvc.c:193] could not load kernel module nvidia
I0609 06:28:32.006037 8658 nvc.c:203] loading kernel module nvidia_uvm
E0609 06:28:32.006142 8658 nvc.c:205] could not load kernel module nvidia_uvm
I0609 06:28:32.006161 8658 nvc.c:211] loading kernel module nvidia_modeset
E0609 06:28:32.006259 8658 nvc.c:213] could not load kernel module nvidia_modeset
I0609 06:28:32.007119 8659 driver.c:101] starting driver service
E0609 06:28:32.009737 8659 driver.c:161] could not start driver service: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
I0609 06:28:32.010706 8657 driver.c:196] driver service terminated successfully
nvidia-container-cli: initialization error: driver error: failed to process request
tx2-01:~$ sudo docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown.
ERRO[0000] error waiting for container: context canceled
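For what it's worth, the "libnvidia-ml.so.1: cannot open shared object file" line means the container CLI cannot find NVML on the host. A quick, hedged way to check is shown below; on Jetson/L4T boards NVML is normally not present at all, which is consistent with the explanation further down in this thread:

ldconfig -p | grep libnvidia-ml        # is NVML visible to the dynamic linker at all?
ls /usr/lib/aarch64-linux-gnu/tegra/   # typical location of the L4T driver libraries (path may vary by release)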
@harendracmaps Have you solved your issue? I'm having the same exact error except I'm running it on an NVIDIA Xavier AGX
Running on the following specs:
nvidia@x02:~$ uname -a
Linux x02 4.9.140-tegra #1 SMP PREEMPT Mon Dec 9 22:52:02 PST 2019 aarch64 aarch64 aarch64 GNU/Linux
$ cat /etc/nv_tegra_release
# R32 (release), REVISION: 3.1, GCID: 18186506, BOARD: t186ref, EABI: aarch64, DATE: Tue Dec 10 07:03:07 UTC 2019
nvidia@x02:~$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Mon_Mar_11_22:13:24_CDT_2019
Cuda compilation tools, release 10.0, V10.0.326
nvidia@x02:~$ dpkg -l '*nvidia*'
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-================================-=====================-=====================-======================================================================
un libgldispatch0-nvidia <none> <none> (no description available)
ii libnvidia-container-tools 1.2.0-1 arm64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container0:arm64 0.9.0~beta.1 arm64 NVIDIA container runtime library
ii libnvidia-container1:arm64 1.2.0-1 arm64 NVIDIA container runtime library
un nvidia-304 <none> <none> (no description available)
un nvidia-340 <none> <none> (no description available)
un nvidia-384 <none> <none> (no description available)
un nvidia-common <none> <none> (no description available)
ii nvidia-container-csv-cuda 10.0.326-1 arm64 Jetpack CUDA CSV file
ii nvidia-container-csv-cudnn 7.6.3.28-1+cuda10.0 arm64 Jetpack CUDNN CSV file
ii nvidia-container-csv-tensorrt 6.0.1.10-1+cuda10.0 arm64 Jetpack TensorRT CSV file
ii nvidia-container-csv-visionworks 1.6.0.500n arm64 Jetpack VisionWorks CSV file
ii nvidia-container-runtime 3.1.0-1 arm64 NVIDIA container runtime
un nvidia-container-runtime-hook <none> <none> (no description available)
ii nvidia-container-toolkit 1.2.1-1 arm64 NVIDIA container runtime hook
un nvidia-cuda-dev <none> <none> (no description available)
un nvidia-docker <none> <none> (no description available)
ii nvidia-docker2 2.2.0-1 all nvidia-docker CLI wrapper
ii nvidia-l4t-3d-core 32.3.1-20191209230245 arm64 NVIDIA GL EGL Package
ii nvidia-l4t-apt-source 32.3.1-20191209230245 arm64 NVIDIA L4T apt source list debian package
ii nvidia-l4t-bootloader 32.3.1-20191209230245 arm64 NVIDIA Bootloader Package
ii nvidia-l4t-camera 32.3.1-20191209230245 arm64 NVIDIA Camera Package
ii nvidia-l4t-ccp-t186ref 32.3.1-20191209230245 arm64 NVIDIA Compatibility Checking Package
ii nvidia-l4t-configs 32.3.1-20191209230245 arm64 NVIDIA configs debian package
ii nvidia-l4t-core 32.3.1-20191209230245 arm64 NVIDIA Core Package
ii nvidia-l4t-cuda 32.3.1-20191209230245 arm64 NVIDIA CUDA Package
ii nvidia-l4t-firmware 32.3.1-20191209230245 arm64 NVIDIA Firmware Package
ii nvidia-l4t-graphics-demos 32.3.1-20191209230245 arm64 NVIDIA graphics demo applications
ii nvidia-l4t-gstreamer 32.3.1-20191209230245 arm64 NVIDIA GST Application files
ii nvidia-l4t-init 32.3.1-20191209230245 arm64 NVIDIA Init debian package
ii nvidia-l4t-initrd 32.3.1-20191209230245 arm64 NVIDIA initrd debian package
ii nvidia-l4t-jetson-io 32.3.1-20191209230245 arm64 NVIDIA Jetson.IO debian package
ii nvidia-l4t-jetson-multimedia-api 32.3.1-20191209230245 arm64 NVIDIA Jetson Multimedia API is a collection of lower-level APIs that
ii nvidia-l4t-kernel 4.9.140-tegra-32.3.1- arm64 NVIDIA Kernel Package
ii nvidia-l4t-kernel-dtbs 4.9.140-tegra-32.3.1- arm64 NVIDIA Kernel DTB Package
ii nvidia-l4t-kernel-headers 4.9.140-tegra-32.3.1- arm64 NVIDIA Linux Tegra Kernel Headers Package
ii nvidia-l4t-multimedia 32.3.1-20191209230245 arm64 NVIDIA Multimedia Package
ii nvidia-l4t-multimedia-utils 32.3.1-20191209230245 arm64 NVIDIA Multimedia Package
ii nvidia-l4t-oem-config 32.3.1-20191209230245 arm64 NVIDIA OEM-Config Package
ii nvidia-l4t-tools 32.3.1-20191209230245 arm64 NVIDIA Public Test Tools Package
ii nvidia-l4t-wayland 32.3.1-20191209230245 arm64 NVIDIA Wayland Package
ii nvidia-l4t-weston 32.3.1-20191209230245 arm64 NVIDIA Weston Package
ii nvidia-l4t-x11 32.3.1-20191209230245 arm64 NVIDIA X11 Package
ii nvidia-l4t-xusb-firmware 32.3.1-20191209230245 arm64 NVIDIA USB Firmware Package
un nvidia-libopencl1-dev <none> <none> (no description available)
un nvidia-prime <none> <none> (no description available)
if you get this error on a Jetson board:
could not start driver service: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
Then it means you've installed nvidia-container-toolkit from the official repos (https://nvidia.github.io/nvidia-docker).
nvidia-container-toolkit does not support Jetson right now, but there is a beta version in the jetpack repos that does.
Remove the nvidia-docker repo, then reinstall nvidia-container-runtime and nvidia-jetpack.
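A hedged sketch of what that looks like in practice on JetPack 4.x; the repository list file name below is the default used by the nvidia-docker install instructions and may differ on your machine:

sudo rm /etc/apt/sources.list.d/nvidia-docker.list          # drop the nvidia.github.io repo
sudo apt-get update
sudo apt-get install --reinstall nvidia-container-runtime   # pull the runtime from the L4T/JetPack repos
sudo apt-get install nvidia-jetpack
sudo systemctl restart docker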
Thank you!!
CUDA Version: 11.0 docker-ce: 19.03.7 nvidia-docker2-2.5.0-1
@mildsunrise It seems nvidia-docker supports Jetson, but I am still getting this error even with nvidia-docker2-2.5.0-1.
what makes you think nvidia-docker supports jetson? the FAQ still says you need the SDK manager (aka the jetson repos). you need the version of nvidia-docker2 that comes with the jetson repos, not the nvidia-docker one
@mildsunrise ah, you mean "jetpack" by "jetson repos", don't you? That might very well be my issue. I am using the stock kernel ConnectTech provides and presumed that, because it had L4T 32.4.4 installed, it also had JetPack 4.2.2 installed, but I think I need to reflash it because the manufacturer probably does just a minimal install for QA purposes.
https://github.com/NVIDIA/nvidia-docker/wiki/NVIDIA-Container-Runtime-on-Jetson
I am getting the same error with nvidia-container-toolkit/bionic,now 1.5.1-1 amd64 under Ubuntu Server 20.04 LTS, running headless. I installed the NVIDIA drivers via the .run file downloaded from the official NVIDIA page, and nvidia-smi is working, as is the hashcat benchmark. However, when I run docker run --rm --gpus all nvidia/cuda:11.1-base nvidia-smi with docker-ce/focal,now 5:20.10.7~3-0~ubuntu-focal amd64, I get the aforementioned error:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
I already tried newer drivers from the repositories up until version 470, but nothing worked.
Any ideas?
It seems that NVIDIA continues to ignore Linux support.
@AlexAshs sorry for the delay in getting back to you. Would you mind creating a new ticket and including the debug output from /var/log/nvidia-container-toolkit.log? This logging can be enabled by uncommenting the #debug= line in the nvidia-container-cli section of the /etc/nvidia-container-runtime/config.toml file.
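For reference, the relevant part of /etc/nvidia-container-runtime/config.toml typically looks roughly like the excerpt below (exact contents vary between toolkit versions); enabling the log is just a matter of removing the leading # from the debug line:

[nvidia-container-cli]
# change
#debug = "/var/log/nvidia-container-toolkit.log"
# to
debug = "/var/log/nvidia-container-toolkit.log"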
The reason I ask for a new issue is that this one has gotten quite long and seems to contain a mix of issues related to Jetson platforms and others that have been marked as fixed.
@elezar No worries, I was just trying things out; this is my first dedicated GPU, so nothing is in production just yet :D I have posted my issue here: Containers with gpus not starting up. I really don't post issues often, I prefer finding solutions first, so if there is something missing or the title sucks, just let me know so I can provide what is necessary to tackle this.
I have the exact same problem.
Configuration: Host: Windows 10 with WSL2, with CUDA installed.
Error: docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
Command: docker run --gpus all --cpus 2 --name test -it pytorch/pytorch
Hardware: NVidia GeForce GTX 1660 TI
Any solution to this problem?
Are you installing updates from the Windows Insider Dev Channel? It seems to be a requirement for this setup to work.
@AlexAshs I am not signed up for the Insider program. Could you tell me which update is necessary? I will install it manually.
@TheMarshalMole I found this guide, which should make things easier for you: https://www.forecr.io/blogs/installation/nvidia-docker-installation-for-ubuntu-in-wsl-2
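In case it helps, the guide's steps for Ubuntu under WSL 2 boil down to roughly the following (taken from the usual nvidia-docker repository setup; check the linked page for the exact, current commands):

distribution=$(. /etc/os-release; echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-docker2
sudo service docker restart   # WSL 2 distributions usually have no systemd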
I installed my GPU driver from Software & Updates >> Additional Drivers and it solved my problem.
My local environment is as follows:
a virtual machine (KVM + QEMU + libvirtd) running Arch Linux, accessing the host's RTX 3090 graphics card through PCI passthrough.
Programs in the virtual machine access the graphics card via Docker, resulting in the error mentioned in the title.
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: driver rpc error: failed to process request: unknown.
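Before digging into the container runtime, it is usually worth confirming inside the VM that the passed-through GPU and its driver work on their own; a minimal check:

nvidia-smi                                      # does the guest driver see the RTX 3090 at all?
sudo nvidia-container-cli -k -d /dev/tty info   # the CLI's debug output usually shows which step fails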
I just downloaded the Triton Inference Server repo and ran into the same error on a Mac, Ventura 13.4.1 (c).
Ran (as per instructions):
docker run --gpus=1 --rm --net=host -v ${PWD}/model_repository:/models nvcr.io/nvidia/tritonserver:23.06-py3 tritonserver --model-repository=/models
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy' nvidia-container-cli: initialization error: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory: unknown.
I have configured Docker 19.03.6 and nvidia-docker successfully. But when I test:
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
I get the error:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "process_linux.go:430: container init caused \"process_linux.go:413: running prestart hook 0 caused \\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\n\\"\"": unknown.
Then I checked nvidia-container-cli, and it seems there is no error:
sudo nvidia-container-cli -k -d /dev/tty info
-- WARNING, the following logs are for debugging purposes only --
I0226 06:26:25.224982 78809 nvc.c:281] initializing library context (version=1.0.2, build=ff40da533db929bf515aca59ba4c701a65a35e6b) I0226 06:26:25.225050 78809 nvc.c:255] using root / I0226 06:26:25.225061 78809 nvc.c:256] using ldcache /etc/ld.so.cache I0226 06:26:25.225071 78809 nvc.c:257] using unprivileged user 65534:65534 I0226 06:26:25.230611 78810 nvc.c:191] loading kernel module nvidia I0226 06:26:25.230931 78810 nvc.c:203] loading kernel module nvidia_uvm I0226 06:26:25.231053 78810 nvc.c:211] loading kernel module nvidia_modeset I0226 06:26:25.231436 78811 driver.c:133] starting driver service I0226 06:26:25.356687 78809 nvc_info.c:434] requesting driver information with '' I0226 06:26:25.356983 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/vdpau/libvdpau_nvidia.so.418.87.00 I0226 06:26:25.357280 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.418.87.00 I0226 06:26:25.357333 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.418.87.00 I0226 06:26:25.357441 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.418.87.00 I0226 06:26:25.357512 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.418.87.00 I0226 06:26:25.357559 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.418.87.00 I0226 06:26:25.357629 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.418.87.00 I0226 06:26:25.357711 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.418.87.00 I0226 06:26:25.357760 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.418.87.00 I0226 06:26:25.357806 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.418.87.00 I0226 06:26:25.357868 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.418.87.00 I0226 06:26:25.357928 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.418.87.00 I0226 06:26:25.358002 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.418.87.00 I0226 06:26:25.358053 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.418.87.00 I0226 06:26:25.358108 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.418.87.00 I0226 06:26:25.358179 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.418.87.00 I0226 06:26:25.358606 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.418.87.00 I0226 06:26:25.358847 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.418.87.00 I0226 06:26:25.358902 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.418.87.00 I0226 06:26:25.358951 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.418.87.00 I0226 06:26:25.359001 78809 nvc_info.c:148] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.418.87.00 W0226 06:26:25.359039 78809 nvc_info.c:303] missing compat32 library libnvidia-ml.so W0226 06:26:25.359047 78809 nvc_info.c:303] missing compat32 library libnvidia-cfg.so W0226 06:26:25.359056 78809 nvc_info.c:303] missing compat32 library libcuda.so W0226 06:26:25.359066 78809 nvc_info.c:303] missing compat32 library libnvidia-opencl.so W0226 06:26:25.359076 78809 nvc_info.c:303] missing compat32 library libnvidia-ptxjitcompiler.so W0226 06:26:25.359086 78809 nvc_info.c:303] missing compat32 library libnvidia-fatbinaryloader.so W0226 06:26:25.359097 
78809 nvc_info.c:303] missing compat32 library libnvidia-compiler.so W0226 06:26:25.359107 78809 nvc_info.c:303] missing compat32 library libvdpau_nvidia.so W0226 06:26:25.359117 78809 nvc_info.c:303] missing compat32 library libnvidia-encode.so W0226 06:26:25.359128 78809 nvc_info.c:303] missing compat32 library libnvidia-opticalflow.so W0226 06:26:25.359138 78809 nvc_info.c:303] missing compat32 library libnvcuvid.so W0226 06:26:25.359149 78809 nvc_info.c:303] missing compat32 library libnvidia-eglcore.so W0226 06:26:25.359159 78809 nvc_info.c:303] missing compat32 library libnvidia-glcore.so W0226 06:26:25.359169 78809 nvc_info.c:303] missing compat32 library libnvidia-tls.so W0226 06:26:25.359177 78809 nvc_info.c:303] missing compat32 library libnvidia-glsi.so W0226 06:26:25.359186 78809 nvc_info.c:303] missing compat32 library libnvidia-fbc.so W0226 06:26:25.359194 78809 nvc_info.c:303] missing compat32 library libnvidia-ifr.so W0226 06:26:25.359203 78809 nvc_info.c:303] missing compat32 library libGLX_nvidia.so W0226 06:26:25.359212 78809 nvc_info.c:303] missing compat32 library libEGL_nvidia.so W0226 06:26:25.359220 78809 nvc_info.c:303] missing compat32 library libGLESv2_nvidia.so W0226 06:26:25.359253 78809 nvc_info.c:303] missing compat32 library libGLESv1_CM_nvidia.so I0226 06:26:25.359527 78809 nvc_info.c:229] selecting /usr/bin/nvidia-smi I0226 06:26:25.359560 78809 nvc_info.c:229] selecting /usr/bin/nvidia-debugdump I0226 06:26:25.359585 78809 nvc_info.c:229] selecting /usr/bin/nvidia-persistenced I0226 06:26:25.359608 78809 nvc_info.c:229] selecting /usr/bin/nvidia-cuda-mps-control I0226 06:26:25.359632 78809 nvc_info.c:229] selecting /usr/bin/nvidia-cuda-mps-server I0226 06:26:25.359667 78809 nvc_info.c:366] listing device /dev/nvidiactl I0226 06:26:25.359676 78809 nvc_info.c:366] listing device /dev/nvidia-uvm I0226 06:26:25.359687 78809 nvc_info.c:366] listing device /dev/nvidia-uvm-tools I0226 06:26:25.359697 78809 nvc_info.c:366] listing device /dev/nvidia-modeset W0226 06:26:25.359731 78809 nvc_info.c:274] missing ipc /var/run/nvidia-persistenced/socket W0226 06:26:25.359753 78809 nvc_info.c:274] missing ipc /tmp/nvidia-mps I0226 06:26:25.359763 78809 nvc_info.c:490] requesting device information with '' I0226 06:26:25.366457 78809 nvc_info.c:520] listing device /dev/nvidia0 (GPU-03bb5927-ceaa-4166-ff1e-1d58a8cbf883 at 00000000:05:00.0) I0226 06:26:25.373129 78809 nvc_info.c:520] listing device /dev/nvidia1 (GPU-26602c4d-2069-84f3-3bc9-5d943fb3bdb4 at 00000000:06:00.0) I0226 06:26:25.380167 78809 nvc_info.c:520] listing device /dev/nvidia2 (GPU-0687efee-81a2-537e-d7fe-3a5694aceb29 at 00000000:85:00.0) I0226 06:26:25.387215 78809 nvc_info.c:520] listing device /dev/nvidia3 (GPU-4c95eb5b-8940-562c-742f-2078cb3a02eb at 00000000:86:00.0) NVRM version: 418.87.00 CUDA version: 10.1
Device Index: 0
Device Minor: 0
Model: Tesla K80
Brand: Tesla
GPU UUID: GPU-03bb5927-ceaa-4166-ff1e-1d58a8cbf883
Bus Location: 00000000:05:00.0
Architecture: 3.7

Device Index: 1
Device Minor: 1
Model: Tesla K80
Brand: Tesla
GPU UUID: GPU-26602c4d-2069-84f3-3bc9-5d943fb3bdb4
Bus Location: 00000000:06:00.0
Architecture: 3.7

Device Index: 2
Device Minor: 2
Model: Tesla K80
Brand: Tesla
GPU UUID: GPU-0687efee-81a2-537e-d7fe-3a5694aceb29
Bus Location: 00000000:85:00.0
Architecture: 3.7

Device Index: 3
Device Minor: 3
Model: Tesla K80
Brand: Tesla
GPU UUID: GPU-4c95eb5b-8940-562c-742f-2078cb3a02eb
Bus Location: 00000000:86:00.0
Architecture: 3.7

I0226 06:26:25.387330 78809 nvc.c:318] shutting down library context
I0226 06:26:25.388428 78811 driver.c:192] terminating driver service
I0226 06:26:25.440777 78809 driver.c:233] driver service terminated successfully
Is the NVIDIA driver version too low? In fact, 418.87.00 is what the official NVIDIA site recommends. Also, how can I update the driver via apt instead of manually with the driver .run file? I do not know how to make it work. Can anyone help me?
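For the apt question: on Ubuntu the driver can be managed through the package manager instead of the .run installer. A hedged sketch (package names vary by release, and the .run-installed driver should be removed first, e.g. with the installer's nvidia-uninstall tool):

sudo add-apt-repository ppa:graphics-drivers/ppa   # optional, for newer driver branches
sudo apt-get update
ubuntu-drivers devices                             # lists the driver packages apt knows about for this GPU
sudo apt-get install nvidia-driver-440             # example package name; use whatever the previous command recommends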