Closed: tytcc closed this issue 4 years ago.
hi @tytcc - what NVIDIA driver version are you running on your Linux system? You should have at least r410 to run CUDA 10.0 containers and r418 to run CUDA 10.1 containers.
Please provide the output of nvidia-smi
Thanks for your answer, @dualvtable. Now I know my NVIDIA driver version is too old.
@tytcc I also faced the same problem on an Ubuntu 16.04 machine. I have the latest driver 440.64.00 installed, and when I try to run the example
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
I get this error:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: cuda error: unknown error\\\\n\\\"\"": unknown. ERRO[0001] error waiting for container: context canceled
Getting the same output after installing nvidia-container-toolkit.
***@pop-os:~$ docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: detection error: driver error: failed to process request\\\\n\\\"\"": unknown.
ERRO[0000] error waiting for container: context canceled
Tried the steps mentioned in #1114 but still no luck.
nvidia-smi output:
NVIDIA-SMI 440.64 Driver Version: 440.64 CUDA Version: 10.2
0 Quadro M2000M Off
OS details:
***@pop-os:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Pop!_OS 18.04 LTS
Release: 18.04
Codename: bionic
I am seeing the same with driver version 440.82:
# docker run \
--rm \
--runtime=nvidia \
-e NVIDIA_VISIBLE_DEVICES=all \
-e NVIDIA_DRIVER_CAPABILITIES=all \
nvidia/cuda nvidia-smi
/run/torcx/bin/docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"process_linux.go:385: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli.real: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown.
# uname -a
Linux core1 4.19.106-coreos #1 SMP Wed Feb 26 21:43:18 -00 2020 x86_64 Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz GenuineIntel GNU/Linux
# dockerd --version
Docker version 18.06.3-ce, build d7080c1
# /opt/drivers/nvidia/bin/nvidia-smi
Fri Apr 17 12:54:34 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82 Driver Version: 440.82 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 GeForce RTX 2060 Off | 00000000:02:00.0 Off | N/A |
| 0% 45C P8 9W / 160W | 0MiB / 5932MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Getting the same error:
keo7@home-desktop:~$ uname -a
Linux home-desktop 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1+deb10u1 (2020-04-27) x86_64 GNU/Linux
keo7@home-desktop:~$ dockerd --version
Docker version 19.03.8, build afacb8b7f0
keo7@home-desktop:~$ nvidia-smi
Fri May 8 22:39:56 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82 Driver Version: 440.82 CUDA Version: N/A |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 TITAN V On | 00000000:26:00.0 On | N/A |
| 28% 43C P2 38W / 250W | 678MiB / 12066MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
@KeironO I wouldn't bother using the nvidia runtime, in my opinion: it's disruptive to the setup of your distribution's runc (or whatever OCI runtime you have), it clearly has some issues, and all it does is wrap runc with some helpers controlled by environment variables (at least from what I can tell).
If you can find out what your application needs you should be able to expose the devices and libraries from the host manually without having to have an extra binary to manage.
If there are other benefits I'd be interested to know as I have my GPU accelerated workloads running without needing to change my host's Docker setup.
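For illustration, a rough sketch of what exposing things manually can look like (the device nodes and library paths here are assumptions based on a typical Ubuntu driver layout, and what you mount depends entirely on what your application needs):

# Hypothetical manual exposure of one GPU to a plain container.
docker run --rm \
  --device /dev/nvidiactl \
  --device /dev/nvidia-uvm \
  --device /dev/nvidia0 \
  -v /usr/bin/nvidia-smi:/usr/bin/nvidia-smi:ro \
  -v /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1:/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1:ro \
  ubuntu:18.04 nvidia-smi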
I am also getting the same error with the same setup as @KeironO.
Hi there!
nvidia-container-cli.real: initialization error: driver error: failed to process request\\n\\"\"": unknown.
@billwhiteley @KeironO Most of the time this issue is linked to an incorrect driver installation or incorrect driver loading. We can usually figure out which one it is when the issue template is filled :)
Unfortunately being able to run nvidia-smi doesn't mean that your driver is fully loaded and you'll see issues later down the line (such as when running CUDA code or tensorflow).
I wouldn't bother using the nvidia runtime in my opinion, it's disruptive to the setup of your distribution's runc (or whatever OCI runtime you have), clearly it has some issues and all it does is wrap runc with some helpers controlled by environment variables (at least from what I can tell). If you can find out what your application needs you should be able to expose the devices and libraries from the host manually without having to have an extra binary to manage.
The NVIDIA runtime is only expected to be installed in a Kubernetes environment. For a Docker-only setup, just the nvidia-container-toolkit is required (see the README).
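For reference, a minimal sketch of that Docker-only setup on Ubuntu (assuming the NVIDIA package repository is already configured as described in the README):

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
# quick sanity check
docker run --rm --gpus all nvidia/cuda:10.0-base nvidia-smi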
As for implementing what the NVIDIA Container Toolkit does yourself, you can certainly do that; however, this would probably have a high upfront cost for you to understand the details of the NVIDIA driver and userland architecture, and I'm not sure you want to be maintaining such a piece of software :) You would also be missing out on new driver features as they come out, and if the CUDA or NVIDIA driver model changes you'd have to rewrite that software. Without bringing up enterprise support or general support, if your use case is narrow enough and you don't mind paying that maintenance cost, that's definitely an option :)
For a Kubernetes environment the NVIDIA runtime provides even less benefit: all you need are the NVIDIA drivers/libraries on the host and this DaemonSet, and then GPUs can be requested in the normal Kubernetes way:
resources:
limits:
nvidia.com/gpu: 1
Relevant Kubernetes documentation is here.
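As an illustration only (the pod name and image tag are placeholders, not from this thread), a complete minimal pod using that resource request might look like:

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test            # placeholder name
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:10.0-base  # any CUDA image; tag is illustrative
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1         # one GPU via the device plugin
EOF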
If your libraries aren't in the default location (/home/kubernetes/bin/nvidia for some reason) you can specify the location manually using the -host-path flag. You may need to add an NVIDIA entry to your container's /etc/ld.so.conf.d and run ldconfig so that the libraries can be found by your application.
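For example, a sketch of that last step (the /usr/local/nvidia/lib64 path is an assumption based on the default -container-path in the usage below):

# Inside the container: tell the dynamic linker where the mounted NVIDIA libraries are.
echo "/usr/local/nvidia/lib64" > /etc/ld.so.conf.d/nvidia.conf
ldconfig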
Here's the full usage:
Usage of /usr/bin/nvidia-gpu-device-plugin:
-alsologtostderr
log to standard error as well as files
-container-path string
Path on the container that mounts '-host-path' (default "/usr/local/nvidia")
-container-vulkan-icd-path string
Path on the container that mounts '-host-vulkan-icd-path' (default "/etc/vulkan/icd.d")
-host-path string
Path on the host that contains nvidia libraries. This will be mounted inside the container as '-container-path' (default "/home/kubernetes/bin/nvidia")
-host-vulkan-icd-path string
Path on the host that contains the Nvidia Vulkan installable client driver. This will be mounted inside the container as '-container-vulkan-icd-path' (default "/home/kubernetes/bin/nvidia/vulkan/icd.d")
-log_backtrace_at value
when logging hits line file:N, emit a stack trace
-log_dir string
If non-empty, write log files in this directory
-logtostderr
log to standard error instead of files
-plugin-directory string
The directory path to create plugin socket (default "/device-plugin")
-stderrthreshold value
logs at or above this threshold go to stderr
-v value
log level for V logs
-vmodule value
comma-separated list of pattern=N settings for file-filtered logging
I'm having the same issue.
I0511 15:53:14.054294 27377 nvc.c:281] initializing library context (version=1.0.7, build=b71f87c04b8eca8a16bf60995506c35c937347d9)
I0511 15:53:14.054490 27377 nvc.c:255] using root /
I0511 15:53:14.054525 27377 nvc.c:256] using ldcache /etc/ld.so.cache
I0511 15:53:14.054595 27377 nvc.c:257] using unprivileged user 1000:1000
W0511 15:53:14.056714 27378 nvc.c:186] failed to set inheritable capabilities
W0511 15:53:14.056939 27378 nvc.c:187] skipping kernel modules load due to failure
I0511 15:53:14.058134 27379 driver.c:133] starting driver service
I0511 15:53:14.107994 27377 nvc_info.c:438] requesting driver information with ''
I0511 15:53:14.109434 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.440.33.01
I0511 15:53:14.109515 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/tls/libnvidia-tls.so.440.33.01
I0511 15:53:14.109800 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.440.33.01 over /usr/lib/x86_64-linux-gnu/tls/libnvidia-tls.so.440.33.01
I0511 15:53:14.110348 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.440.33.01
I0511 15:53:14.111277 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.440.33.01
I0511 15:53:14.112608 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.440.33.01
I0511 15:53:14.114313 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.440.33.01
I0511 15:53:14.114387 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.440.33.01
I0511 15:53:14.115208 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.440.33.01
I0511 15:53:14.115956 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.440.33.01
I0511 15:53:14.116012 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.440.33.01
I0511 15:53:14.116075 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.440.33.01
I0511 15:53:14.116886 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.440.33.01
I0511 15:53:14.117984 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.440.33.01
I0511 15:53:14.118698 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.440.33.01
I0511 15:53:14.118783 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.440.33.01
I0511 15:53:14.119561 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.440.33.01
I0511 15:53:14.119626 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.440.33.01
I0511 15:53:14.120347 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.440.33.01
I0511 15:53:14.121159 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.440.33.01
I0511 15:53:14.121611 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.440.33.01
I0511 15:53:14.121935 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.440.33.01
I0511 15:53:14.122775 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.440.33.01
I0511 15:53:14.123599 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.440.33.01
I0511 15:53:14.123773 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.440.33.01
I0511 15:53:14.125503 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-ptxjitcompiler.so.440.48.02
I0511 15:53:14.126273 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-opticalflow.so.440.48.02
I0511 15:53:14.127346 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-opencl.so.440.48.02
I0511 15:53:14.128794 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-ml.so.440.48.02
I0511 15:53:14.130323 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-fbc.so.440.48.02
I0511 15:53:14.132006 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-fatbinaryloader.so.440.48.02
I0511 15:53:14.133270 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-encode.so.440.48.02
I0511 15:53:14.135013 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-compiler.so.440.48.02
I0511 15:53:14.136295 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvcuvid.so.440.48.02
I0511 15:53:14.137890 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libcuda.so.440.48.02
W0511 15:53:14.138059 27377 nvc_info.c:303] missing library libvdpau_nvidia.so
W0511 15:53:14.138076 27377 nvc_info.c:307] missing compat32 library libnvidia-ml.so
W0511 15:53:14.138088 27377 nvc_info.c:307] missing compat32 library libnvidia-cfg.so
W0511 15:53:14.138098 27377 nvc_info.c:307] missing compat32 library libcuda.so
W0511 15:53:14.138108 27377 nvc_info.c:307] missing compat32 library libnvidia-opencl.so
W0511 15:53:14.138124 27377 nvc_info.c:307] missing compat32 library libnvidia-ptxjitcompiler.so
W0511 15:53:14.138148 27377 nvc_info.c:307] missing compat32 library libnvidia-fatbinaryloader.so
W0511 15:53:14.138166 27377 nvc_info.c:307] missing compat32 library libnvidia-compiler.so
W0511 15:53:14.138185 27377 nvc_info.c:307] missing compat32 library libvdpau_nvidia.so
W0511 15:53:14.138205 27377 nvc_info.c:307] missing compat32 library libnvidia-encode.so
W0511 15:53:14.138227 27377 nvc_info.c:307] missing compat32 library libnvidia-opticalflow.so
W0511 15:53:14.138250 27377 nvc_info.c:307] missing compat32 library libnvcuvid.so
W0511 15:53:14.138267 27377 nvc_info.c:307] missing compat32 library libnvidia-eglcore.so
W0511 15:53:14.138287 27377 nvc_info.c:307] missing compat32 library libnvidia-glcore.so
W0511 15:53:14.138308 27377 nvc_info.c:307] missing compat32 library libnvidia-tls.so
W0511 15:53:14.138328 27377 nvc_info.c:307] missing compat32 library libnvidia-glsi.so
W0511 15:53:14.138349 27377 nvc_info.c:307] missing compat32 library libnvidia-fbc.so
W0511 15:53:14.138367 27377 nvc_info.c:307] missing compat32 library libnvidia-ifr.so
W0511 15:53:14.138384 27377 nvc_info.c:307] missing compat32 library libnvidia-rtcore.so
W0511 15:53:14.138405 27377 nvc_info.c:307] missing compat32 library libnvoptix.so
W0511 15:53:14.138426 27377 nvc_info.c:307] missing compat32 library libGLX_nvidia.so
W0511 15:53:14.138444 27377 nvc_info.c:307] missing compat32 library libEGL_nvidia.so
W0511 15:53:14.138468 27377 nvc_info.c:307] missing compat32 library libGLESv2_nvidia.so
W0511 15:53:14.138491 27377 nvc_info.c:307] missing compat32 library libGLESv1_CM_nvidia.so
W0511 15:53:14.138511 27377 nvc_info.c:307] missing compat32 library libnvidia-glvkspirv.so
W0511 15:53:14.138531 27377 nvc_info.c:307] missing compat32 library libnvidia-cbl.so
I0511 15:53:14.140096 27377 nvc_info.c:233] selecting /usr/bin/nvidia-smi
I0511 15:53:14.140154 27377 nvc_info.c:233] selecting /usr/bin/nvidia-debugdump
I0511 15:53:14.140212 27377 nvc_info.c:233] selecting /usr/bin/nvidia-persistenced
I0511 15:53:14.140269 27377 nvc_info.c:233] selecting /usr/bin/nvidia-cuda-mps-control
I0511 15:53:14.140324 27377 nvc_info.c:233] selecting /usr/bin/nvidia-cuda-mps-server
I0511 15:53:14.140395 27377 nvc_info.c:370] listing device /dev/nvidiactl
I0511 15:53:14.140415 27377 nvc_info.c:370] listing device /dev/nvidia-uvm
I0511 15:53:14.140432 27377 nvc_info.c:370] listing device /dev/nvidia-uvm-tools
I0511 15:53:14.140449 27377 nvc_info.c:370] listing device /dev/nvidia-modeset
I0511 15:53:14.140520 27377 nvc_info.c:274] listing ipc /run/nvidia-persistenced/socket
W0511 15:53:14.140573 27377 nvc_info.c:278] missing ipc /tmp/nvidia-mps
I0511 15:53:14.140594 27377 nvc_info.c:494] requesting device information with ''
I0511 15:53:14.147767 27377 nvc_info.c:524] listing device /dev/nvidia0 (GPU-23fcb2ab-a6c2-b9e3-f455-6bf92a57b371 at 00000000:03:00.0)
NVRM version: 440.33.01
CUDA version: 10.2
Device Index: 0
Device Minor: 0
Model: GeForce 920MX
Brand: GeForce
GPU UUID: GPU-23fcb2ab-a6c2-b9e3-f455-6bf92a57b371
Bus Location: 00000000:03:00.0
Architecture: 5.0
I0511 15:53:14.147861 27377 nvc.c:318] shutting down library context
I0511 15:53:14.148492 27379 driver.c:192] terminating driver service
I0511 15:53:14.234076 27377 driver.c:233] driver service terminated successfully
Kernel version:
Linux hema 5.3.0-51-generic #44~18.04.2-Ubuntu SMP Thu Apr 23 14:27:18 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
nvidia-smi -a
==============NVSMI LOG==============
Timestamp : Mon May 11 17:55:40 2020
Driver Version : 440.33.01
CUDA Version : 10.2
Attached GPUs : 1
GPU 00000000:03:00.0
Product Name : GeForce 920MX
Product Brand : GeForce
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Enabled
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-23fcb2ab-a6c2-b9e3-f455-6bf92a57b371
Minor Number : 0
VBIOS Version : 82.08.5A.00.0D
MultiGPU Board : No
Board ID : 0x300
GPU Part Number : N/A
Inforom Version
Image Version : N/A
OEM Object : N/A
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x03
Device : 0x00
Domain : 0x0000
Device Id : 0x134F10DE
Bus Id : 00000000:03:00.0
Sub System Id : 0x39F117AA
GPU Link Info
PCIe Generation
Max : 3
Current : 3
Link Width
Max : 4x
Current : 4x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 493000 KB/s
Rx Throughput : 3000 KB/s
Fan Speed : N/A
Performance State : P0
Clocks Throttle Reasons
Idle : Not Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : N/A
HW Power Brake Slowdown : N/A
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 2004 MiB
Used : 870 MiB
Free : 1134 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 3 MiB
Free : 253 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : N/A
Decoder : N/A
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Temperature
GPU Current Temp : 41 C
GPU Shutdown Temp : 99 C
GPU Slowdown Temp : 94 C
GPU Max Operating Temp : 98 C
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : N/A
Power Draw : N/A
Power Limit : N/A
Default Power Limit : N/A
Enforced Power Limit : N/A
Min Power Limit : N/A
Max Power Limit : N/A
Clocks
Graphics : 993 MHz
SM : 993 MHz
Memory : 900 MHz
Video : 973 MHz
Applications Clocks
Graphics : 967 MHz
Memory : 900 MHz
Default Applications Clocks
Graphics : 965 MHz
Memory : 900 MHz
Max Clocks
Graphics : 993 MHz
SM : 993 MHz
Memory : 900 MHz
Video : 973 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Processes
Process ID : 1347
Type : G
Name : /usr/lib/xorg/Xorg
Used GPU Memory : 34 MiB
Process ID : 1655
Type : G
Name : /usr/bin/gnome-shell
Used GPU Memory : 76 MiB
Process ID : 2765
Type : G
Name : /usr/lib/xorg/Xorg
Used GPU Memory : 184 MiB
Process ID : 2951
Type : G
Name : /usr/bin/gnome-shell
Used GPU Memory : 273 MiB
Process ID : 3830
Type : G
Name : /opt/google/chrome/chrome --type=gpu-process --field-trial-handle=17903442744480519122,5081937925041455948,131072 --gpu-preferences=MAAAAAAAAAAgAAAAAAAAAAAAAAAAAAAAAABgAAAAAAAQAAAAAAAAAAAAAAAAAAAACAAAAAAAAAA= --shared-files
Used GPU Memory : 292 MiB
docker version
Client: Docker Engine - Community
Version: 19.03.8
API version: 1.40
Go version: go1.12.17
Git commit: afacb8b7f0
Built: Wed Mar 11 01:25:46 2020
OS/Arch: linux/amd64
Experimental: false
Server: Docker Engine - Community
Engine:
Version: 19.03.8
API version: 1.40 (minimum version 1.12)
Go version: go1.12.17
Git commit: afacb8b7f0
Built: Wed Mar 11 01:24:19 2020
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.2.13
GitCommit: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc:
Version: 1.0.0-rc10
GitCommit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
docker-init:
Version: 0.18.0
GitCommit: fec3683
nvidia packages
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-==============================================-============================-============================-=================================================================================================
un libgldispatch0-nvidia <none> <none> (no description available)
ii libnvidia-cfg1-440:amd64 440.33.01-0ubuntu1 amd64 NVIDIA binary OpenGL/GLX configuration library
un libnvidia-cfg1-any <none> <none> (no description available)
un libnvidia-common <none> <none> (no description available)
ii libnvidia-common-440 440.82-0ubuntu0~0.18.04.1 all Shared files used by the NVIDIA libraries
rc libnvidia-compute-435:amd64 435.21-0ubuntu0.18.04.2 amd64 NVIDIA libcompute package
ii libnvidia-compute-440:amd64 440.33.01-0ubuntu1 amd64 NVIDIA libcompute package
ii libnvidia-container-tools 1.0.7-1 amd64 NVIDIA container runtime library (command-line tools)
ii libnvidia-container1:amd64 1.0.7-1 amd64 NVIDIA container runtime library
un libnvidia-decode <none> <none> (no description available)
ii libnvidia-decode-440:amd64 440.33.01-0ubuntu1 amd64 NVIDIA Video Decoding runtime libraries
un libnvidia-encode <none> <none> (no description available)
ii libnvidia-encode-440:amd64 440.33.01-0ubuntu1 amd64 NVENC Video Encoding runtime library
un libnvidia-fbc1 <none> <none> (no description available)
ii libnvidia-fbc1-440:amd64 440.33.01-0ubuntu1 amd64 NVIDIA OpenGL-based Framebuffer Capture runtime library
un libnvidia-gl <none> <none> (no description available)
ii libnvidia-gl-440:amd64 440.33.01-0ubuntu1 amd64 NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
un libnvidia-ifr1 <none> <none> (no description available)
ii libnvidia-ifr1-440:amd64 440.33.01-0ubuntu1 amd64 NVIDIA OpenGL-based Inband Frame Readback runtime library
un libnvidia-ml1 <none> <none> (no description available)
un nvidia-304 <none> <none> (no description available)
un nvidia-340 <none> <none> (no description available)
un nvidia-384 <none> <none> (no description available)
un nvidia-390 <none> <none> (no description available)
un nvidia-common <none> <none> (no description available)
rc nvidia-compute-utils-435 435.21-0ubuntu0.18.04.2 amd64 NVIDIA compute utilities
ii nvidia-compute-utils-440 440.33.01-0ubuntu1 amd64 NVIDIA compute utilities
ii nvidia-container-runtime 3.1.4-1 amd64 NVIDIA container runtime
un nvidia-container-runtime-hook <none> <none> (no description available)
ii nvidia-container-toolkit 1.0.5-1 amd64 NVIDIA container runtime hook
rc nvidia-dkms-435 435.21-0ubuntu0.18.04.2 amd64 NVIDIA DKMS package
ii nvidia-dkms-440 440.33.01-0ubuntu1 amd64 NVIDIA DKMS package
un nvidia-dkms-kernel <none> <none> (no description available)
un nvidia-docker <none> <none> (no description available)
rc nvidia-docker2 2.2.2-1 all nvidia-docker CLI wrapper
ii nvidia-driver-440 440.33.01-0ubuntu1 amd64 NVIDIA driver metapackage
un nvidia-driver-binary <none> <none> (no description available)
un nvidia-kernel-common <none> <none> (no description available)
rc nvidia-kernel-common-435 435.21-0ubuntu0.18.04.2 amd64 Shared files used with the kernel module
ii nvidia-kernel-common-440 440.33.01-0ubuntu1 amd64 Shared files used with the kernel module
un nvidia-kernel-source <none> <none> (no description available)
un nvidia-kernel-source-435 <none> <none> (no description available)
ii nvidia-kernel-source-440 440.33.01-0ubuntu1 amd64 NVIDIA kernel source package
un nvidia-legacy-304xx-vdpau-driver <none> <none> (no description available)
un nvidia-legacy-340xx-vdpau-driver <none> <none> (no description available)
un nvidia-libopencl1-dev <none> <none> (no description available)
ii nvidia-modprobe 440.33.01-0ubuntu1 amd64 Load the NVIDIA kernel driver and create device files
un nvidia-opencl-icd <none> <none> (no description available)
un nvidia-persistenced <none> <none> (no description available)
ii nvidia-prime 0.8.8.2 all Tools to enable NVIDIA's Prime
ii nvidia-settings 440.64-0ubuntu0~0.18.04.1 amd64 Tool for configuring the NVIDIA graphics driver
un nvidia-settings-binary <none> <none> (no description available)
un nvidia-smi <none> <none> (no description available)
un nvidia-utils <none> <none> (no description available)
ii nvidia-utils-440 440.33.01-0ubuntu1 amd64 NVIDIA driver support binaries
un nvidia-vdpau-driver <none> <none> (no description available)
ii xserver-xorg-video-nvidia-440 440.33.01-0ubuntu1 amd64 NVIDIA binary Xorg driver
I've met the same problem, any solutions?
@elliothe I uninstalled CUDA, the NVIDIA drivers, nvidia-docker, and docker, then installed everything again from scratch. This solved the problem for me.
@HemaZ Thanks for the solution. I may do the same if I find no alternative.
Got the same problem; fixed it by running sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
OS/docker info:
$ dockerd --version
Docker version 19.03.8, build afacb8b7f0
$ uname -a
Linux x 5.3.0-53-generic #47~18.04.1-Ubuntu SMP Thu May 7 13:10:50 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
@albin3 that didn't fix it for me... I followed all instructions in https://developer.nvidia.com/blog/announcing-cuda-on-windows-subsystem-for-linux-2 yet am still seeing:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"mountpoint for devices not found\"": unknown.
Same issue here.
Trying to run this repository's demo, but I got the following error:
$ docker-compose up
ERROR: for vehicle_counting Cannot start service vehicle_counting: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown
ERROR: for vehicle_counting Cannot start service vehicle_counting: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown
Tried to install nvidia-container-toolkit as suggested here, but it's still not working.
Here's my docker info output:
Client:
Debug Mode: false
Server:
Containers: 3
Running: 0
Paused: 0
Stopped: 3
Images: 7
Server Version: 19.03.12
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: nvidia runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
init version: fec3683
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.4.0-42-generic
Operating System: Ubuntu 20.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 3.844GiB
Name: geo-vbox
ID: PLLH:2H5F:NGLW:52TT:2Q77:AUHV:S3PX:3THU:XIEA:NYMX:FEYD:E2AT
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
WARNING: No swap limit support
Any idea how to solve it?
Same issue. I'm working with Docker version 19.03.12, build 48a66213fe inside WSL 2 emulation on Windows 10.
I also have the same problem, working with Docker version 19.03.12 inside WSL 2 emulation on Windows 10. Kernel version: 4.19.121-microsoft-standard.
Having the same issue with AGX Xavier: https://github.com/NVIDIA/nvidia-container-toolkit/issues/183
Exact same issue here. Followed the NVIDIA guide.
Windows 10 version 1909 build 18363.1049, Docker version 19.03.12, WSL2 Ubuntu 18.04 and 20.04, Kernel version: 4.19.121-microsoft-standard, Windows NVIDIA drivers 455.41, CUDA 11.1.
The output of nvidia-container-cli -k -d /dev/tty info:
I0821 16:21:57.950311 5686 nvc.c:282] initializing library context (version=1.3.0, build=af0220ff5c503d9ac6a1b5a491918229edbb37a4)
I0821 16:21:57.950354 5686 nvc.c:256] using root /
I0821 16:21:57.950358 5686 nvc.c:257] using ldcache /etc/ld.so.cache
I0821 16:21:57.950376 5686 nvc.c:258] using unprivileged user 1000:1000
I0821 16:21:57.950389 5686 nvc.c:299] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0821 16:21:57.950454 5686 nvc.c:301] dxcore initialization failed, continuing assuming a non-WSL environment
W0821 16:21:57.950514 5686 nvc.c:172] failed to detect NVIDIA devices
W0821 16:21:57.950641 5687 nvc.c:187] failed to set inheritable capabilities
W0821 16:21:57.950680 5687 nvc.c:188] skipping kernel modules load due to failure
I0821 16:21:57.950836 5688 driver.c:101] starting driver service
E0821 16:21:57.950966 5688 driver.c:161] could not start driver service: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
I0821 16:21:57.951083 5686 driver.c:196] driver service terminated successfully
nvidia-container-cli: initialization error: driver error: failed to process request
Same here, stuck.
The original issue described here, which has the error:
nvidia-container-cli: requirement error: unsatisfied condition: cuda>=10.1, please update your driver to a newer version, or use an earlier cuda container
is due to the original poster having an NVIDIA driver that was too old to run CUDA 10.1.
The poster acknowledged this and closed the issue on March 21st. https://github.com/NVIDIA/nvidia-docker/issues/1225#issuecomment-601990042
Since that time, this issue has been reopened and commented on many times with unrelated error messages.
Since the original issue was resolved, I am going to close this issue again, and encourage you to open a new issue if you are still having problems with different errors.
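For anyone landing here with that requirement error, a quick way to compare the host driver against the container's CUDA requirement (standard nvidia-smi query options; the minimum versions are the ones quoted at the top of the thread):

nvidia-smi --query-gpu=driver_version --format=csv,noheader
# CUDA 10.0 containers need at least r410, CUDA 10.1 containers at least r418.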
Try using this base image: https://ngc.nvidia.com/catalog/containers/nvidia:l4t-base. It solved all my Jetson Tegra arm64 architecture issues, and now I can seamlessly docker pull and use my Docker images across Jetson Tegra devices.
Anytime nvidia docker fails you will see an error that begins with:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: ...
This part of the error message is output by docker itself, and is out of our control.
It's the part after stderr: that is relevant to nvidia-docker.
In the original post, this error was:
nvidia-container-cli: requirement error: unsatisfied condition: cuda>=10.0, please update your driver to a newer version, or use an earlier cuda container
@gsss124 is this actually the same error response you were seeing? Given the description of your problem, it seems unlikely.
In any case, I would recommend performing your step 4 using docker's daemon.json file instead of editing the docker service directly:
$ cat /etc/docker/daemon.json
{
"data-root": "/your/custom/location",
"default-runtime": "nvidia",
"runtimes": {
"nvidia": {
"path": "/usr/bin/nvidia-container-runtime",
"runtimeArgs": []
}
}
}
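After changing daemon.json, restart the Docker daemon so the new settings take effect (a standard step, assuming a systemd-based host):

sudo systemctl restart docker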
Anytime nvidia docker fails you will see an error that begins with:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: ...
This part of the error message is output by docker itself, and is out of our control.
It's the part after stderr: that is relevant to nvidia-docker. In the original post, this error was:
nvidia-container-cli: requirement error: unsatisfied condition: cuda>=10.0, please update your driver to a newer version, or use an earlier cuda container
@gsss124 is this actually the same error response you were seeing? Given the description of your problem, it seems unlikely.
Thanks for the reply. That was not the error; it was only related to OCI. Now docker info gives the custom data-root location, but to my surprise it is still using the system drive: I see a reduction in the space available on the system drive, and the space available is the same in my custom data-root drive. So I will delete my reply above.
In any case, I would recommend performing your step 4 using docker's
daemon.json
file instead of editing the docker service directly:$ cat /etc/docker/daemon.json { "data-root": "/your/custom/location", "default-runtime": "nvidia", "runtimes": { "nvidia": { "path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": [] } } }
Thanks for this, but I tried this method and it did not work for me. But I will give it a shot again by adding system restart step. I even tried nvidia-container-runtime separately, that didn't work. After editing docker.service, it gave me data-root as a my custom location but still using the /var/lib/docker location to store data! I don't understand what is happening.
To my horror, it has created a new drive taking a part of space of system drive, named it to my custom data-root name and renamed my old drive! It's not using /var/lib/docker, but a part of it renamed to my custom data-root name.
sudo service docker start
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: stat failed: /usr/lib/wsl/lib/libcuda.so.1: no such file or directory\\n\\"\"": unknown.
docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: stat failed: /usr/lib/wsl/lib/libcuda.so.1: no such file or directory\\n\\"\"": unknown.
ldconfig -p | grep cuda
libicudata.so.66 (libc6,x86-64) => /lib/x86_64-linux-gnu/libicudata.so.66
libcuda.so.1 (libc6,x86-64) => /usr/lib/wsl/lib/libcuda.so.1
ls -al /usr/lib/wsl/lib
total 70792
dr-xr-xr-x 1 root root      512 Sep 18 15:53 .
drwxr-xr-x 4 root root     4096 Sep 18 12:28 ..
-r--r--r-- 1 root root   124664 Aug 30 09:51 libcuda.so
-r--r--r-- 2 root root   832936 Sep 12 08:44 libd3d12.so
-r--r--r-- 2 root root  5073944 Sep 12 08:44 libd3d12core.so
-r--r--r-- 2 root root 25069816 Sep 12 08:44 libdirectml.so
-r--r--r-- 2 root root   878768 Sep 12 08:44 libdxcore.so
-r--r--r-- 1 root root 40496936 Aug 30 09:51 libnvwgf2umx.so
sudo ln -s /usr/lib/wsl/lib/libcuda.so /usr/lib/wsl/lib/libcuda.so.1
ln: failed to create symbolic link '/usr/lib/wsl/lib/libcuda.so.1': Read-only file system
It seems as if it's missing and the video driver is still required, unless there is something that can make it appear at that location:
ls: cannot access '/usr/lib/wsl/lib/libcuda.so.1'
Any thoughts?
From my understanding, installing the video driver in the Ubuntu guest is no longer required for Docker.
Directory of C:\Windows\System32\lxss\lib
09/18/2020 03:53 PM
C:\Windows\System32\lxss\lib>mklink libcuda.so.1 libcuda.so
symbolic link created for libcuda.so.1 <<===>> libcuda.so
Still not working, but it seems closer.
More info:
docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
18 sudo find / -iname /usr/lib/wsl/lib/libcuda.so.1
19 sudo find / -iname libcuda.so.1
20 ldconfig -p | grep cuda
21 ls /usr/lib/wsl/lib/libcuda.so.1
22 #sudo ls -d libcuda.so.1
23 cd /
24 sudo ls -d libcuda.so.1
25 ls -al /usr/lib/wsl
26 ls -al /usr/lib/wsl/drivers
27 ls -al /usr/lib/wsl/drivers | grep -i libcuda*
28 ls -al /usr/lib/wsl/
29 ls -al /usr/lib/wsl/lib
30 sudo ln -s /usr/lib/wsl/lib/libcuda.so /usr/lib/wsl/lib/libcuda.so.1
31 sudo ln -s /usr/lib/wsl/lib/libcuda.so.1 /usr/lib/wsl/lib/libcuda.so
32 sudo ln -s /usr/lib/wsl/lib/libcuda.so /usr/lib/wsl/lib/libcuda.so.1
33 echo $LD_LIBRARY_PATH
34 sudo apt install nvidia-361-dev
35 nvidia-smi
36 sudo apt isntall nvidia-utils-435
37 sudo apt install nvidia-utils-435
38 cd %SYSTEMROOT%\System32\lxss\lib
39 cd %SYSTEMROOT%\
40 cd %SYSTEMROOT%
41 ls
42 ls /usr/lib/wsl/lib/
43 ls -al /usr/lib/wsl/lib/
44 docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
45 sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
46 sudo apt-remove nvidia-docker2
47 sudo apt-get remove nvidia-docker2
48 sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
49 docker run --rm --privileged nvidia/cuda nvidia-smi
50 nvidia-docker run --rm nvidia/cuda nvidia-smi
51 nvidia-docker run --rm --privileged nvidia/cuda nvidia-smi
52 docker run --rm --privileged nvidia/cuda nvidia-smi
53 nvidia-smi
54 sudo apt-get install nvidia-docker2
55 nvidia-docker run --rm --privileged nvidia/cuda nvidia-smi
56 docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
57 docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark -compare
58 nvcc --version
59 sudo apt-get install nvidia-cuda-toolkit
60 docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark -compare
In any case, I would recommend performing your step 4 using docker's
daemon.json
file instead of editing the docker service directly:$ cat /etc/docker/daemon.json { "data-root": "/your/custom/location", "default-runtime": "nvidia", "runtimes": { "nvidia": { "path": "/usr/bin/nvidia-container-runtime", "runtimeArgs": [] } } }
I tried this again by editing /etc/docker/daemon.json and got the following stderr:
nvidia-container-cli: ldcache error: process /sbin/ldconfig.real failed with error code: 1\\\\n\\\"\""
Full output:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: ldcache error: process /sbin/ldconfig.real failed with error code: 1\\\\n\\\"\"": unknown
docker info now displays the required custom directory and space is reduced in the right directory. Now it is an ldcache error. I checked here, but my seccomp output is YES:
$ cat /boot/config-$(uname -r) | grep -i seccomp
Output:
CONFIG_SECCOMP=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y
Please suggest what might be the problem.
sudo service docker start
* Starting Docker: docker [ OK ]
sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused "process_linux.go:432: running prestart hook 0 caused \"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: stat failed: /usr/lib/wsl/lib/libcuda.so.1: no such file or directory\n\""": unknown.
docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused "process_linux.go:432: running prestart hook 0 caused \"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: stat failed: /usr/lib/wsl/lib/libcuda.so.1: no such file or directory\n\""": unknown.
ldconfig -p | grep cuda libicudata.so.66 (libc6,x86-64) => /lib/x86_64-linux-gnu/libicudata.so.66 libcuda.so.1 (libc6,x86-64) => /usr/lib/wsl/lib/libcuda.so.1
ls -al /usr/lib/wsl/lib total 70792 dr-xr-xr-x 1 root root 512 Sep 18 15:53 . drwxr-xr-x 4 root root 4096 Sep 18 12:28 .. -r--r--r-- 1 root root 124664 Aug 30 09:51 libcuda.so -r--r--r-- 2 root root 832936 Sep 12 08:44 libd3d12.so -r--r--r-- 2 root root 5073944 Sep 12 08:44 libd3d12core.so -r--r--r-- 2 root root 25069816 Sep 12 08:44 libdirectml.so -r--r--r-- 2 root root 878768 Sep 12 08:44 libdxcore.so -r--r--r-- 1 root root 40496936 Aug 30 09:51 libnvwgf2umx.so
sudo ln -s /usr/lib/wsl/lib/libcuda.so /usr/lib/wsl/lib/libcuda.so.1 ln: failed to create symbolic link '/usr/lib/wsl/lib/libcuda.so.1': Read-only file system
seems as if its missing and the video driver is still required, unless there is something that can make it appear at location
ls: cannot access '/usr/lib/wsl/lib/libcuda.so.1'
any thoughts?
From my understanding putting the video driver is no longer required in docker -- ubuntu guest
Are you using a virtual machine? As stated by @klueska, the output after stderr: is what is of interest. Your error says:
stderr: nvidia-container-cli: mount error: stat failed: /usr/lib/wsl/lib/libcuda.so.1: no such file or directory\\n\""": unknown.
Something related to the NVIDIA driver not being available where required.
@wanfuse123 please file a new issue if you need help debugging this. Your issue looks unrelated to the one here (especially since it seems you are running on Windows, and not linux).
@tytcc I also faced the same problem on an Ubuntu 16.04 machine. I have the latest driver 440.64.00 installed, and when I try to run the example
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
I get this error: docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: cuda error: unknown error\\\\n\\\"\"": unknown. ERRO[0001] error waiting for container: context canceled
I also face the problem, did you solve it?
nvidia-smi does not work under wsl2 as of right now. Use the following test instead
"medium.com" + "how-to-use-nvidia-gpu-in-docker-to-run-tensorflow"
use their
Sorry, got cut off. Use their testing examples and that container. It costs 5 bucks for access, but I thought it was worth it for one year of access. (NOTE: I have nothing to do with their site, I just spent the five bucks for it.)
Anyway, use their testing examples.
You can't use "nvidia-smi"; it is not working right now in the containers. Apparently NVIDIA and Microsoft are working hard on the problem.
Update on the Medium link: look at the comments, I have made an updated script that runs a simple test.
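Something along these lines works as a quick GPU check that avoids nvidia-smi entirely (a sketch only; the TensorFlow image and the tf.config call are my assumptions, not the script mentioned above):

docker run --rm --gpus all tensorflow/tensorflow:latest-gpu \
  python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"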
@elliothe i have uninstalled CUDA, Nvidia Drivers, nvidia docker and docker. Then installed everything again from scratch. This solved the problem for me
I think it is right. It worked for me when I uninstalled the NVIDIA driver (version 460). Thanks.
iser@iser:~$ sudo apt-get purge nvidia*
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'nvidia-kernel-common-418-server' for glob 'nvidia*'
Note, selecting 'nvidia-325-updates' for glob 'nvidia*'
Note, selecting 'nvidia-346-updates' for glob 'nvidia*'
Note, selecting 'nvidia-driver-binary' for glob 'nvidia*'
Note, selecting 'nvidia-331-dev' for glob 'nvidia*'
Note, selecting 'nvidia-304-updates-dev' for glob 'nvidia*'
Note, selecting 'nvidia-compute-utils-418-server' for glob 'nvidia*'
Note, selecting 'nvidia-384-dev' for glob 'nvidia*'
Note, selecting 'nvidia-docker2' for glob 'nvidia*'
Note, selecting 'nvidia-libopencl1-346-updates' for glob 'nvidia*'
Note, selecting 'nvidia-driver-440-server' for glob 'nvidia*'
Note, selecting 'nvidia-340-updates-uvm' for glob 'nvidia*'
------- The following is the output of a successful installation. ----------
Adding group `iser' (GID 1000) ...
Done.
Adding user `iser' ...
Adding new user `iser' (1000) with group `iser' ...
Creating home directory `/home/iser' ...
Copying files from `/etc/skel' ...
[ OK ] Congratulations! You have successfully finished setting up Apollo Dev Environment.
[ OK ] To login into the newly created apollo_dev_iser container, please run the following command:
[ OK ] bash docker/scripts/dev_into.sh
[ OK ] Enjoy!
Why is this issue marked as closed? @luogyu7 says "it worked for me" because they uninstalled an ancient version and replaced it with an updated one?
My configuration: Ubuntu 20 WSL2 on Windows 10, Docker works properly, no issues starting other non-cuda containers.
I ran the command suggested earlier by @lougyu7
$ sudo apt-get purge nvidia*
Then attempted to reinstall the nvidia-cuda-toolkit and now we're here:
$ uname -a
Linux COMMODORE387 5.4.72-microsoft-standard-WSL2 #1 SMP Wed Oct 28 23:40:43 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ apt-get install nvidia-cuda-toolkit
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:
The following packages have unmet dependencies:
nvidia-cuda-toolkit : Depends: nvidia-cuda-dev (= 10.1.243-3) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
# docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark -compare
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
ERRO[0001] error waiting for container: context canceled
# docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
Unable to find image 'nvidia/cuda:11.0-base' locally
Digest: sha256:774ca3d612de15213102c2dbbba55df44dc5cf9870ca2be6c6e9c627fa63d67a
Status: Downloaded newer image for nvidia/cuda:11.0-base
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
I can't help you with the unmet dependencies error for the nvidia-cuda-toolkit component (that is something independent of the container stack, and, for all intents and purposes, unnecessary to install on your host if you only ever plan to run CUDA applications in containers).
I think the package you were intending to install is nvidia-container-toolkit -- which is the one required for container support.
Restart worked for me
I am getting this error when running any nvidia-docker command.
docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: detection error: nvml error: unknown error: unknown.
I read through the whole thread, but I can only see people fixing it by restarting the server or reinstalling the driver/docker.
I have tried restarting server but it does not resolve this error.
Solution:
Step 1. First, check the version of the JetPack SDK you downloaded. If you are using the jetson-nano-developer-kit → https://developer.nvidia.com/embedded/jetson-nano-developer-kit
Step 2. Click the "Download jetpack" button around the middle of the page at the URL above.
Step 3. A new page will appear. Remember: this is the page you installed JetPack from. Note down the JetPack version written there, e.g. JetPack 4.5.1.
Step 4. Register for the Jetson AI Courses. There is a walkthrough with videos there. The same videos are on YouTube, but the course is easier to follow because the details are written in the notes.
Step 5. You are probably looking at "Download Docker And Start JupyterLab" in the Jetson AI Courses, got the error in the title, and googled it. First, take a close look at this:
echo "sudo docker run --runtime nvidia -it --rm --network host \ --volume ~/nvdli-data:/nvdli-nano/data \ --device /dev/video0 \ nvcr.io/nvidia/dli/dli-nano-ai:v2.0.1-r32.4.4" > docker_dli_run.sh
The last line contains "-r32.4.4". You need to match this to the JetPack you downloaded. In my case I downloaded JetPack 4.5.1, so it should be "-r32.4.5".
Step 6. After that, just follow the walkthrough video for "Download Docker And Start JupyterLab" in the Jetson AI Courses.
Note: If you still get the error, it is very likely that a Docker image you downloaded earlier by mistake is getting in the way. Google how to delete a Docker image. Once you have deleted it, go back to Step 5 and try again. If that still doesn't work, try "sudo apt update" and "sudo apt upgrade", then delete the offending Docker image and try once more. If it still doesn't work, remember whether you actually ran "chmod +x docker_dli_run.sh" after that echo "sudo docker ..." command; if you haven't, do it. If it still fails after that, hmm, maybe the last line with "-r32.4.4" is actually not a -32 version at all?
I managed to solve the problem for me (thank you, thread, for pointing out nvidia-container-cli).
tl;dr: Check your libnvidia-container1 version.
Fortunately, I had two systems to check. The working system had libnvidia-container1 version 1.4.0; the crashing system had an RC version of 1.5.0!
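To check which version you have installed (Debian/Ubuntu packaging; on RPM-based systems the equivalent would be rpm -q):

dpkg -l libnvidia-container1
# or
apt-cache policy libnvidia-container1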
The fix: install 1.5.0 (the final, non-RC version), then try running your CUDA container with --gpus all and verify that nvidia-smi shows output.
If you have made any recent updates make sure to reboot and it might solve the issue for you.
Restarting my system worked for me, as @abaybektursun cited above.
Had the same issue as the topic starter on Ubuntu 20.04 LTS under WSL 2 in Windows 10. Resolved it by updating Windows from 21H1 to 21H2. They say CUDA doesn't work properly on WSL 2 with the 21H1 update.
1. Issue or feature description
The previous steps are the same as in the tutorial. After installing nvidia-container-toolkit with
sudo apt-get install -y nvidia-container-toolkit
I always get an error when I run the test example:
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
Error:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=10.0, please update your driver to a newer version, or use an earlier cuda container\\n\\"\"": unknown. ERRO[0018] error waiting for container: context canceled
2. Steps to reproduce the issue
Just run the test example:
docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
Error message:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=10.0, please update your driver to a newer version, or use an earlier cuda container\\n\\"\"": unknown. ERRO[0018] error waiting for container: context canceled
I also tried
docker run --gpus 1 nvidia/cuda nvidia-smi
The error is similar:
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=10.1, please update your driver to a newer version, or use an earlier cuda container\\n\\"\"": unknown. ERRO[0124] error waiting for container: context canceled
3. Information to attach (optional if deemed irrelevant)
nvidia-container-cli -k -d /dev/tty info
uname -a
dmesg
nvidia-smi -a
docker version
dpkg -l '*nvidia*'
or rpm -qa '*nvidia*'
nvidia-container-cli -V