NVIDIA / nvidia-docker

Build and run Docker containers leveraging NVIDIA GPUs
Apache License 2.0

Can not use nvidia-docker. docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: ... #1225

Closed tytcc closed 4 years ago

tytcc commented 4 years ago


1. Issue or feature description

The previous steps were the same as in the tutorial. After installing nvidia-container-toolkit with sudo apt-get install -y nvidia-container-toolkit, running the test example docker run --gpus all nvidia/cuda:10.0-base nvidia-smi always fails with the following error:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=10.0, please update your driver to a newer version, or use an earlier cuda container\\n\\"\"": unknown. ERRO[0018] error waiting for container: context canceled

2. Steps to reproduce the issue

Just run the test example: docker run --gpus all nvidia/cuda:10.0-base nvidia-smi

error message

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=10.0, please update your driver to a newer version, or use an earlier cuda container\\n\\"\"": unknown. ERRO[0018] error waiting for container: context canceled

I also tried docker run --gpus 1 nvidia/cuda nvidia-smi; the error is similar:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: requirement error: unsatisfied condition: cuda>=10.1, please update your driver to a newer version, or use an earlier cuda container\\n\\"\"": unknown. ERRO[0124] error waiting for container: context canceled

3. Information to attach (optional if deemed irrelevant)

dualvtable commented 4 years ago

hi @tytcc - what NVIDIA driver version are you running on your Linux system? You should have at least r410 to run CUDA 10.0 containers and r418 to run CUDA 10.1 containers.

Please provide the output of nvidia-smi
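For a quick check of just the driver version (assuming the driver is installed and nvidia-smi is on your PATH), the following works:

nvidia-smi --query-gpu=driver_version --format=csv,noheader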

tytcc commented 4 years ago

Thanks for your answer, @dualvtable. Now I know my NVIDIA driver version is too old.
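For anyone in the same situation, a common way to upgrade the driver on Ubuntu is the ubuntu-drivers tool (a sketch under the assumption that the tool and a suitable packaged driver for your GPU are available):

# list the drivers Ubuntu knows about for this GPU, then install the recommended one
ubuntu-drivers devices
sudo ubuntu-drivers autoinstall
sudo reboot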

murthy95 commented 4 years ago

@tytcc I also faced the same problem on an Ubuntu 16.04 machine. I have the latest driver (440.64.00) installed, and when I run the example docker run --gpus all nvidia/cuda:10.0-base nvidia-smi I get this error: docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: cuda error: unknown error\\\\n\\\"\"": unknown. ERRO[0001] error waiting for container: context canceled
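Two generic checks that are worth running for an "initialization error: cuda error: unknown error" (diagnostics only, not a confirmed fix for this case):

# verify the kernel modules are loaded and the device nodes exist
lsmod | grep nvidia
ls -l /dev/nvidia*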

DhruvKoolRajamani commented 4 years ago

Getting the same output after installing nvidia-container-toolkit.

***@pop-os:~$ docker run --gpus all nvidia/cuda:10.0-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: detection error: driver error: failed to process request\\\\n\\\"\"": unknown.
ERRO[0000] error waiting for container: context canceled 

Tried the steps mentioned in #1114 but still no luck.

nvidia-smi output:

NVIDIA-SMI 440.64       Driver Version: 440.64       CUDA Version: 10.2
0  Quadro M2000M       Off 

OS details:

***@pop-os:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Pop!_OS 18.04 LTS
Release:    18.04
Codename:   bionic
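
For the "driver error: failed to process request" case, a more verbose diagnostic is to query the container CLI directly (the same command is used later in this thread):

nvidia-container-cli -k -d /dev/tty info
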
dippynark commented 4 years ago

I am seeing the same with driver version 440.82:

# docker run \
    --rm \
    --runtime=nvidia \
    -e NVIDIA_VISIBLE_DEVICES=all \
    -e NVIDIA_DRIVER_CAPABILITIES=all \
    nvidia/cuda nvidia-smi
/run/torcx/bin/docker: Error response from daemon: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:402: container init caused \"process_linux.go:385: running prestart hook 1 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli.real: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown.
# uname -a
Linux core1 4.19.106-coreos #1 SMP Wed Feb 26 21:43:18 -00 2020 x86_64 Intel(R) Core(TM) i5-6600K CPU @ 3.50GHz GenuineIntel GNU/Linux
# dockerd --version
Docker version 18.06.3-ce, build d7080c1
# /opt/drivers/nvidia/bin/nvidia-smi
Fri Apr 17 12:54:34 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: 10.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce RTX 2060    Off  | 00000000:02:00.0 Off |                  N/A |
|  0%   45C    P8     9W / 160W |      0MiB /  5932MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
KeironO commented 4 years ago

Getting the same error:

keo7@home-desktop:~$ uname -a
Linux home-desktop 4.19.0-8-amd64 #1 SMP Debian 4.19.98-1+deb10u1 (2020-04-27) x86_64 GNU/Linux
keo7@home-desktop:~$ dockerd --version
Docker version 19.03.8, build afacb8b7f0
keo7@home-desktop:~$ nvidia-smi
Fri May  8 22:39:56 2020       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.82       Driver Version: 440.82       CUDA Version: N/A      |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  TITAN V             On   | 00000000:26:00.0  On |                  N/A |
| 28%   43C    P2    38W / 250W |    678MiB / 12066MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
dippynark commented 4 years ago

@KeironO In my opinion I wouldn't bother using the nvidia runtime: it's disruptive to the setup of your distribution's runc (or whatever OCI runtime you have), it clearly has some issues, and all it does is wrap runc with some helpers controlled by environment variables (at least from what I can tell).

If you can find out what your application needs you should be able to expose the devices and libraries from the host manually without having to have an extra binary to manage.

If there are other benefits I'd be interested to know as I have my GPU accelerated workloads running without needing to change my host's Docker setup.
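A rough sketch of exposing the GPU manually, as suggested above, assuming the driver libraries live in /usr/lib/x86_64-linux-gnu (every path and library version below is an assumption to verify on your own host):

docker run --rm \
  --device /dev/nvidiactl --device /dev/nvidia-uvm --device /dev/nvidia0 \
  -v /usr/bin/nvidia-smi:/usr/bin/nvidia-smi:ro \
  -v /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1:/usr/lib/x86_64-linux-gnu/libnvidia-ml.so.1:ro \
  -v /usr/lib/x86_64-linux-gnu/libcuda.so.1:/usr/lib/x86_64-linux-gnu/libcuda.so.1:ro \
  ubuntu:18.04 nvidia-smi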

billwhiteley commented 4 years ago

I am also getting the same error with the same setup as @KeironO.

RenaudWasTaken commented 4 years ago

Hi there!

nvidia-container-cli.real: initialization error: driver error: failed to process request\\n\\"\"": unknown.

@billwhiteley @KeironO Most of the time this issue is linked to an incorrect driver installation or incorrect driver loading. We can usually figure out which one it is when the issue template is filled in :)

Unfortunately, being able to run nvidia-smi doesn't mean that your driver is fully loaded, and you'll see issues later down the line (such as when running CUDA code or TensorFlow).

In my opinion I wouldn't bother using the nvidia runtime: it's disruptive to the setup of your distribution's runc (or whatever OCI runtime you have), it clearly has some issues, and all it does is wrap runc with some helpers controlled by environment variables (at least from what I can tell). If you can find out what your application needs you should be able to expose the devices and libraries from the host manually without having to have an extra binary to manage.

The NVIDIA runtime is only expected to be installed in a Kubernetes environment. For a Docker-only setup, the nvidia-container-toolkit is all that is required (see the README).
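For that Docker-only path, the setup boils down to the commands already shown in this thread (a sketch assuming the NVIDIA package repository is already configured and systemd is in use):

sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
docker run --rm --gpus all nvidia/cuda:10.0-base nvidia-smi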

As for implementing what the NVIDIA Container Toolkit does yourself, you can certainly do that; however, it would probably have a high upfront cost for you to understand the details of the NVIDIA driver and userland architecture, and I'm not sure you want to be maintaining such a piece of software :) You would also be missing out on new driver features as they come out, and if the CUDA or NVIDIA driver model changes you'd have to rewrite that software. Without bringing up enterprise support or general support, if your use case is narrow enough and you don't mind paying that maintenance cost, that's definitely an option :)

dippynark commented 4 years ago

For a Kubernetes environment the NVIDIA runtime provides even less benefit: all you need are the NVIDIA drivers/libraries on the host and this DaemonSet, and then GPUs can be requested in the normal Kubernetes way:

resources:
  limits:
    nvidia.com/gpu: 1

Relevant Kubernetes documentation is here.
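A minimal Pod sketch that requests one GPU through the device plugin (the pod name and image are illustrative assumptions; the nvidia-smi path assumes the plugin's default -container-path):

kubectl apply -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:10.0-base
    command: ["/usr/local/nvidia/bin/nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: 1
EOF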

If your libraries aren't in the default location (/home/kubernetes/bin/nvidia for some reason), you can specify the location manually using the -host-path flag. You may need to add an NVIDIA entry to your container's /etc/ld.so.conf.d and run ldconfig so that the libraries can be found by your application, as sketched below.
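A sketch of that last step, run inside the container and assuming the libraries were mounted at the plugin's default -container-path of /usr/local/nvidia:

# make the mounted NVIDIA libraries visible to the dynamic loader
echo "/usr/local/nvidia/lib64" > /etc/ld.so.conf.d/nvidia.conf
ldconfig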

Here's the full usage:

Usage of /usr/bin/nvidia-gpu-device-plugin:
  -alsologtostderr
        log to standard error as well as files
  -container-path string
        Path on the container that mounts '-host-path' (default "/usr/local/nvidia")
  -container-vulkan-icd-path string
        Path on the container that mounts '-host-vulkan-icd-path' (default "/etc/vulkan/icd.d")
  -host-path string
        Path on the host that contains nvidia libraries. This will be mounted inside the container as '-container-path' (default "/home/kubernetes/bin/nvidia")
  -host-vulkan-icd-path string
        Path on the host that contains the Nvidia Vulkan installable client driver. This will be mounted inside the container as '-container-vulkan-icd-path' (default "/home/kubernetes/bin/nvidia/vulkan/icd.d")
  -log_backtrace_at value
        when logging hits line file:N, emit a stack trace
  -log_dir string
        If non-empty, write log files in this directory
  -logtostderr
        log to standard error instead of files
  -plugin-directory string
        The directory path to create plugin socket (default "/device-plugin")
  -stderrthreshold value
        logs at or above this threshold go to stderr
  -v value
        log level for V logs
  -vmodule value
        comma-separated list of pattern=N settings for file-filtered logging
HemaZ commented 4 years ago

I'm having the same issue.

I0511 15:53:14.054294 27377 nvc.c:281] initializing library context (version=1.0.7, build=b71f87c04b8eca8a16bf60995506c35c937347d9)
I0511 15:53:14.054490 27377 nvc.c:255] using root /
I0511 15:53:14.054525 27377 nvc.c:256] using ldcache /etc/ld.so.cache
I0511 15:53:14.054595 27377 nvc.c:257] using unprivileged user 1000:1000
W0511 15:53:14.056714 27378 nvc.c:186] failed to set inheritable capabilities
W0511 15:53:14.056939 27378 nvc.c:187] skipping kernel modules load due to failure
I0511 15:53:14.058134 27379 driver.c:133] starting driver service
I0511 15:53:14.107994 27377 nvc_info.c:438] requesting driver information with ''
I0511 15:53:14.109434 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvoptix.so.440.33.01
I0511 15:53:14.109515 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/tls/libnvidia-tls.so.440.33.01
I0511 15:53:14.109800 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-tls.so.440.33.01 over /usr/lib/x86_64-linux-gnu/tls/libnvidia-tls.so.440.33.01
I0511 15:53:14.110348 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-rtcore.so.440.33.01
I0511 15:53:14.111277 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.440.33.01
I0511 15:53:14.112608 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opticalflow.so.440.33.01
I0511 15:53:14.114313 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-opencl.so.440.33.01
I0511 15:53:14.114387 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ml.so.440.33.01
I0511 15:53:14.115208 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-ifr.so.440.33.01
I0511 15:53:14.115956 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.440.33.01
I0511 15:53:14.116012 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glsi.so.440.33.01
I0511 15:53:14.116075 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-glcore.so.440.33.01
I0511 15:53:14.116886 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fbc.so.440.33.01
I0511 15:53:14.117984 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.440.33.01
I0511 15:53:14.118698 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-encode.so.440.33.01
I0511 15:53:14.118783 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-eglcore.so.440.33.01
I0511 15:53:14.119561 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-compiler.so.440.33.01
I0511 15:53:14.119626 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cfg.so.440.33.01
I0511 15:53:14.120347 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvidia-cbl.so.440.33.01
I0511 15:53:14.121159 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libnvcuvid.so.440.33.01
I0511 15:53:14.121611 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libcuda.so.440.33.01
I0511 15:53:14.121935 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libGLX_nvidia.so.440.33.01
I0511 15:53:14.122775 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libGLESv2_nvidia.so.440.33.01
I0511 15:53:14.123599 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.440.33.01
I0511 15:53:14.123773 27377 nvc_info.c:152] selecting /usr/lib/x86_64-linux-gnu/libEGL_nvidia.so.440.33.01
I0511 15:53:14.125503 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-ptxjitcompiler.so.440.48.02
I0511 15:53:14.126273 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-opticalflow.so.440.48.02
I0511 15:53:14.127346 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-opencl.so.440.48.02
I0511 15:53:14.128794 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-ml.so.440.48.02
I0511 15:53:14.130323 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-fbc.so.440.48.02
I0511 15:53:14.132006 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-fatbinaryloader.so.440.48.02
I0511 15:53:14.133270 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-encode.so.440.48.02
I0511 15:53:14.135013 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvidia-compiler.so.440.48.02
I0511 15:53:14.136295 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libnvcuvid.so.440.48.02
I0511 15:53:14.137890 27377 nvc_info.c:154] skipping /usr/lib/i386-linux-gnu/libcuda.so.440.48.02
W0511 15:53:14.138059 27377 nvc_info.c:303] missing library libvdpau_nvidia.so
W0511 15:53:14.138076 27377 nvc_info.c:307] missing compat32 library libnvidia-ml.so
W0511 15:53:14.138088 27377 nvc_info.c:307] missing compat32 library libnvidia-cfg.so
W0511 15:53:14.138098 27377 nvc_info.c:307] missing compat32 library libcuda.so
W0511 15:53:14.138108 27377 nvc_info.c:307] missing compat32 library libnvidia-opencl.so
W0511 15:53:14.138124 27377 nvc_info.c:307] missing compat32 library libnvidia-ptxjitcompiler.so
W0511 15:53:14.138148 27377 nvc_info.c:307] missing compat32 library libnvidia-fatbinaryloader.so
W0511 15:53:14.138166 27377 nvc_info.c:307] missing compat32 library libnvidia-compiler.so
W0511 15:53:14.138185 27377 nvc_info.c:307] missing compat32 library libvdpau_nvidia.so
W0511 15:53:14.138205 27377 nvc_info.c:307] missing compat32 library libnvidia-encode.so
W0511 15:53:14.138227 27377 nvc_info.c:307] missing compat32 library libnvidia-opticalflow.so
W0511 15:53:14.138250 27377 nvc_info.c:307] missing compat32 library libnvcuvid.so
W0511 15:53:14.138267 27377 nvc_info.c:307] missing compat32 library libnvidia-eglcore.so
W0511 15:53:14.138287 27377 nvc_info.c:307] missing compat32 library libnvidia-glcore.so
W0511 15:53:14.138308 27377 nvc_info.c:307] missing compat32 library libnvidia-tls.so
W0511 15:53:14.138328 27377 nvc_info.c:307] missing compat32 library libnvidia-glsi.so
W0511 15:53:14.138349 27377 nvc_info.c:307] missing compat32 library libnvidia-fbc.so
W0511 15:53:14.138367 27377 nvc_info.c:307] missing compat32 library libnvidia-ifr.so
W0511 15:53:14.138384 27377 nvc_info.c:307] missing compat32 library libnvidia-rtcore.so
W0511 15:53:14.138405 27377 nvc_info.c:307] missing compat32 library libnvoptix.so
W0511 15:53:14.138426 27377 nvc_info.c:307] missing compat32 library libGLX_nvidia.so
W0511 15:53:14.138444 27377 nvc_info.c:307] missing compat32 library libEGL_nvidia.so
W0511 15:53:14.138468 27377 nvc_info.c:307] missing compat32 library libGLESv2_nvidia.so
W0511 15:53:14.138491 27377 nvc_info.c:307] missing compat32 library libGLESv1_CM_nvidia.so
W0511 15:53:14.138511 27377 nvc_info.c:307] missing compat32 library libnvidia-glvkspirv.so
W0511 15:53:14.138531 27377 nvc_info.c:307] missing compat32 library libnvidia-cbl.so
I0511 15:53:14.140096 27377 nvc_info.c:233] selecting /usr/bin/nvidia-smi
I0511 15:53:14.140154 27377 nvc_info.c:233] selecting /usr/bin/nvidia-debugdump
I0511 15:53:14.140212 27377 nvc_info.c:233] selecting /usr/bin/nvidia-persistenced
I0511 15:53:14.140269 27377 nvc_info.c:233] selecting /usr/bin/nvidia-cuda-mps-control
I0511 15:53:14.140324 27377 nvc_info.c:233] selecting /usr/bin/nvidia-cuda-mps-server
I0511 15:53:14.140395 27377 nvc_info.c:370] listing device /dev/nvidiactl
I0511 15:53:14.140415 27377 nvc_info.c:370] listing device /dev/nvidia-uvm
I0511 15:53:14.140432 27377 nvc_info.c:370] listing device /dev/nvidia-uvm-tools
I0511 15:53:14.140449 27377 nvc_info.c:370] listing device /dev/nvidia-modeset
I0511 15:53:14.140520 27377 nvc_info.c:274] listing ipc /run/nvidia-persistenced/socket
W0511 15:53:14.140573 27377 nvc_info.c:278] missing ipc /tmp/nvidia-mps
I0511 15:53:14.140594 27377 nvc_info.c:494] requesting device information with ''
I0511 15:53:14.147767 27377 nvc_info.c:524] listing device /dev/nvidia0 (GPU-23fcb2ab-a6c2-b9e3-f455-6bf92a57b371 at 00000000:03:00.0)
NVRM version:   440.33.01
CUDA version:   10.2

Device Index:   0
Device Minor:   0
Model:          GeForce 920MX
Brand:          GeForce
GPU UUID:       GPU-23fcb2ab-a6c2-b9e3-f455-6bf92a57b371
Bus Location:   00000000:03:00.0
Architecture:   5.0
I0511 15:53:14.147861 27377 nvc.c:318] shutting down library context
I0511 15:53:14.148492 27379 driver.c:192] terminating driver service
I0511 15:53:14.234076 27377 driver.c:233] driver service terminated successfully

kernel version

Linux hema 5.3.0-51-generic #44~18.04.2-Ubuntu SMP Thu Apr 23 14:27:18 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

nvidia-smi -a


==============NVSMI LOG==============

Timestamp                           : Mon May 11 17:55:40 2020
Driver Version                      : 440.33.01
CUDA Version                        : 10.2

Attached GPUs                       : 1
GPU 00000000:03:00.0
    Product Name                    : GeForce 920MX
    Product Brand                   : GeForce
    Display Mode                    : Disabled
    Display Active                  : Disabled
    Persistence Mode                : Enabled
    Accounting Mode                 : Disabled
    Accounting Mode Buffer Size     : 4000
    Driver Model
        Current                     : N/A
        Pending                     : N/A
    Serial Number                   : N/A
    GPU UUID                        : GPU-23fcb2ab-a6c2-b9e3-f455-6bf92a57b371
    Minor Number                    : 0
    VBIOS Version                   : 82.08.5A.00.0D
    MultiGPU Board                  : No
    Board ID                        : 0x300
    GPU Part Number                 : N/A
    Inforom Version
        Image Version               : N/A
        OEM Object                  : N/A
        ECC Object                  : N/A
        Power Management Object     : N/A
    GPU Operation Mode
        Current                     : N/A
        Pending                     : N/A
    GPU Virtualization Mode
        Virtualization Mode         : None
        Host VGPU Mode              : N/A
    IBMNPU
        Relaxed Ordering Mode       : N/A
    PCI
        Bus                         : 0x03
        Device                      : 0x00
        Domain                      : 0x0000
        Device Id                   : 0x134F10DE
        Bus Id                      : 00000000:03:00.0
        Sub System Id               : 0x39F117AA
        GPU Link Info
            PCIe Generation
                Max                 : 3
                Current             : 3
            Link Width
                Max                 : 4x
                Current             : 4x
        Bridge Chip
            Type                    : N/A
            Firmware                : N/A
        Replays Since Reset         : 0
        Replay Number Rollovers     : 0
        Tx Throughput               : 493000 KB/s
        Rx Throughput               : 3000 KB/s
    Fan Speed                       : N/A
    Performance State               : P0
    Clocks Throttle Reasons
        Idle                        : Not Active
        Applications Clocks Setting : Not Active
        SW Power Cap                : Not Active
        HW Slowdown                 : Not Active
            HW Thermal Slowdown     : N/A
            HW Power Brake Slowdown : N/A
        Sync Boost                  : Not Active
        SW Thermal Slowdown         : Not Active
        Display Clock Setting       : Not Active
    FB Memory Usage
        Total                       : 2004 MiB
        Used                        : 870 MiB
        Free                        : 1134 MiB
    BAR1 Memory Usage
        Total                       : 256 MiB
        Used                        : 3 MiB
        Free                        : 253 MiB
    Compute Mode                    : Default
    Utilization
        Gpu                         : 0 %
        Memory                      : 0 %
        Encoder                     : N/A
        Decoder                     : N/A
    Encoder Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    FBC Stats
        Active Sessions             : 0
        Average FPS                 : 0
        Average Latency             : 0
    Ecc Mode
        Current                     : N/A
        Pending                     : N/A
    ECC Errors
        Volatile
            Single Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
        Aggregate
            Single Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
            Double Bit            
                Device Memory       : N/A
                Register File       : N/A
                L1 Cache            : N/A
                L2 Cache            : N/A
                Texture Memory      : N/A
                Texture Shared      : N/A
                CBU                 : N/A
                Total               : N/A
    Retired Pages
        Single Bit ECC              : N/A
        Double Bit ECC              : N/A
        Pending Page Blacklist      : N/A
    Temperature
        GPU Current Temp            : 41 C
        GPU Shutdown Temp           : 99 C
        GPU Slowdown Temp           : 94 C
        GPU Max Operating Temp      : 98 C
        Memory Current Temp         : N/A
        Memory Max Operating Temp   : N/A
    Power Readings
        Power Management            : N/A
        Power Draw                  : N/A
        Power Limit                 : N/A
        Default Power Limit         : N/A
        Enforced Power Limit        : N/A
        Min Power Limit             : N/A
        Max Power Limit             : N/A
    Clocks
        Graphics                    : 993 MHz
        SM                          : 993 MHz
        Memory                      : 900 MHz
        Video                       : 973 MHz
    Applications Clocks
        Graphics                    : 967 MHz
        Memory                      : 900 MHz
    Default Applications Clocks
        Graphics                    : 965 MHz
        Memory                      : 900 MHz
    Max Clocks
        Graphics                    : 993 MHz
        SM                          : 993 MHz
        Memory                      : 900 MHz
        Video                       : 973 MHz
    Max Customer Boost Clocks
        Graphics                    : N/A
    Clock Policy
        Auto Boost                  : N/A
        Auto Boost Default          : N/A
    Processes
        Process ID                  : 1347
            Type                    : G
            Name                    : /usr/lib/xorg/Xorg
            Used GPU Memory         : 34 MiB
        Process ID                  : 1655
            Type                    : G
            Name                    : /usr/bin/gnome-shell
            Used GPU Memory         : 76 MiB
        Process ID                  : 2765
            Type                    : G
            Name                    : /usr/lib/xorg/Xorg
            Used GPU Memory         : 184 MiB
        Process ID                  : 2951
            Type                    : G
            Name                    : /usr/bin/gnome-shell
            Used GPU Memory         : 273 MiB
        Process ID                  : 3830
            Type                    : G
            Name                    : /opt/google/chrome/chrome --type=gpu-process --field-trial-handle=17903442744480519122,5081937925041455948,131072 --gpu-preferences=MAAAAAAAAAAgAAAAAAAAAAAAAAAAAAAAAABgAAAAAAAQAAAAAAAAAAAAAAAAAAAACAAAAAAAAAA= --shared-files
            Used GPU Memory         : 292 MiB

docker version

Client: Docker Engine - Community
 Version:           19.03.8
 API version:       1.40
 Go version:        go1.12.17
 Git commit:        afacb8b7f0
 Built:             Wed Mar 11 01:25:46 2020
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          19.03.8
  API version:      1.40 (minimum version 1.12)
  Go version:       go1.12.17
  Git commit:       afacb8b7f0
  Built:            Wed Mar 11 01:24:19 2020
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.2.13
  GitCommit:        7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc:
  Version:          1.0.0-rc10
  GitCommit:        dc9208a3303feef5b3839f4323d9beb36df0a9dd
 docker-init:
  Version:          0.18.0
  GitCommit:        fec3683

nvidia packages

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                                           Version                      Architecture                 Description
+++-==============================================-============================-============================-=================================================================================================
un  libgldispatch0-nvidia                          <none>                       <none>                       (no description available)
ii  libnvidia-cfg1-440:amd64                       440.33.01-0ubuntu1           amd64                        NVIDIA binary OpenGL/GLX configuration library
un  libnvidia-cfg1-any                             <none>                       <none>                       (no description available)
un  libnvidia-common                               <none>                       <none>                       (no description available)
ii  libnvidia-common-440                           440.82-0ubuntu0~0.18.04.1    all                          Shared files used by the NVIDIA libraries
rc  libnvidia-compute-435:amd64                    435.21-0ubuntu0.18.04.2      amd64                        NVIDIA libcompute package
ii  libnvidia-compute-440:amd64                    440.33.01-0ubuntu1           amd64                        NVIDIA libcompute package
ii  libnvidia-container-tools                      1.0.7-1                      amd64                        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64                     1.0.7-1                      amd64                        NVIDIA container runtime library
un  libnvidia-decode                               <none>                       <none>                       (no description available)
ii  libnvidia-decode-440:amd64                     440.33.01-0ubuntu1           amd64                        NVIDIA Video Decoding runtime libraries
un  libnvidia-encode                               <none>                       <none>                       (no description available)
ii  libnvidia-encode-440:amd64                     440.33.01-0ubuntu1           amd64                        NVENC Video Encoding runtime library
un  libnvidia-fbc1                                 <none>                       <none>                       (no description available)
ii  libnvidia-fbc1-440:amd64                       440.33.01-0ubuntu1           amd64                        NVIDIA OpenGL-based Framebuffer Capture runtime library
un  libnvidia-gl                                   <none>                       <none>                       (no description available)
ii  libnvidia-gl-440:amd64                         440.33.01-0ubuntu1           amd64                        NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan ICD
un  libnvidia-ifr1                                 <none>                       <none>                       (no description available)
ii  libnvidia-ifr1-440:amd64                       440.33.01-0ubuntu1           amd64                        NVIDIA OpenGL-based Inband Frame Readback runtime library
un  libnvidia-ml1                                  <none>                       <none>                       (no description available)
un  nvidia-304                                     <none>                       <none>                       (no description available)
un  nvidia-340                                     <none>                       <none>                       (no description available)
un  nvidia-384                                     <none>                       <none>                       (no description available)
un  nvidia-390                                     <none>                       <none>                       (no description available)
un  nvidia-common                                  <none>                       <none>                       (no description available)
rc  nvidia-compute-utils-435                       435.21-0ubuntu0.18.04.2      amd64                        NVIDIA compute utilities
ii  nvidia-compute-utils-440                       440.33.01-0ubuntu1           amd64                        NVIDIA compute utilities
ii  nvidia-container-runtime                       3.1.4-1                      amd64                        NVIDIA container runtime
un  nvidia-container-runtime-hook                  <none>                       <none>                       (no description available)
ii  nvidia-container-toolkit                       1.0.5-1                      amd64                        NVIDIA container runtime hook
rc  nvidia-dkms-435                                435.21-0ubuntu0.18.04.2      amd64                        NVIDIA DKMS package
ii  nvidia-dkms-440                                440.33.01-0ubuntu1           amd64                        NVIDIA DKMS package
un  nvidia-dkms-kernel                             <none>                       <none>                       (no description available)
un  nvidia-docker                                  <none>                       <none>                       (no description available)
rc  nvidia-docker2                                 2.2.2-1                      all                          nvidia-docker CLI wrapper
ii  nvidia-driver-440                              440.33.01-0ubuntu1           amd64                        NVIDIA driver metapackage
un  nvidia-driver-binary                           <none>                       <none>                       (no description available)
un  nvidia-kernel-common                           <none>                       <none>                       (no description available)
rc  nvidia-kernel-common-435                       435.21-0ubuntu0.18.04.2      amd64                        Shared files used with the kernel module
ii  nvidia-kernel-common-440                       440.33.01-0ubuntu1           amd64                        Shared files used with the kernel module
un  nvidia-kernel-source                           <none>                       <none>                       (no description available)
un  nvidia-kernel-source-435                       <none>                       <none>                       (no description available)
ii  nvidia-kernel-source-440                       440.33.01-0ubuntu1           amd64                        NVIDIA kernel source package
un  nvidia-legacy-304xx-vdpau-driver               <none>                       <none>                       (no description available)
un  nvidia-legacy-340xx-vdpau-driver               <none>                       <none>                       (no description available)
un  nvidia-libopencl1-dev                          <none>                       <none>                       (no description available)
ii  nvidia-modprobe                                440.33.01-0ubuntu1           amd64                        Load the NVIDIA kernel driver and create device files
un  nvidia-opencl-icd                              <none>                       <none>                       (no description available)
un  nvidia-persistenced                            <none>                       <none>                       (no description available)
ii  nvidia-prime                                   0.8.8.2                      all                          Tools to enable NVIDIA's Prime
ii  nvidia-settings                                440.64-0ubuntu0~0.18.04.1    amd64                        Tool for configuring the NVIDIA graphics driver
un  nvidia-settings-binary                         <none>                       <none>                       (no description available)
un  nvidia-smi                                     <none>                       <none>                       (no description available)
un  nvidia-utils                                   <none>                       <none>                       (no description available)
ii  nvidia-utils-440                               440.33.01-0ubuntu1           amd64                        NVIDIA driver support binaries
un  nvidia-vdpau-driver                            <none>                       <none>                       (no description available)
ii  xserver-xorg-video-nvidia-440                  440.33.01-0ubuntu1           amd64                        NVIDIA binary Xorg driver
elliothe commented 4 years ago

I'm hitting the same problem. Any solutions?

HemaZ commented 4 years ago

@elliothe I uninstalled CUDA, the NVIDIA drivers, nvidia-docker, and Docker, then installed everything again from scratch. This solved the problem for me.
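For reference, a "from scratch" reinstall usually looks something like the following (a hedged sketch; the exact package names depend on how the driver and Docker were installed, so check dpkg -l first):

dpkg -l | grep -E 'nvidia|docker'
sudo apt-get purge -y 'nvidia-*' 'libnvidia-*' nvidia-docker2 nvidia-container-toolkit docker-ce docker-ce-cli
sudo apt-get autoremove -y
# then reinstall the driver, Docker, and nvidia-container-toolkit following the official install guides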

elliothe commented 4 years ago

@HemaZ Thanks for the solution. I may do the same if I can't find an alternative.

albin3 commented 4 years ago

Got the same problem; fixed it by running sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit

OS/docker info:

$ dockerd --version
Docker version 19.03.8, build afacb8b7f0
$ uname -a
Linux x 5.3.0-53-generic #47~18.04.1-Ubuntu SMP Thu May 7 13:10:50 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
7kemZmani commented 4 years ago

@albin3 that didn't fix it for me. I followed all the instructions in https://developer.nvidia.com/blog/announcing-cuda-on-windows-subsystem-for-linux-2, yet I'm still seeing:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:297: applying cgroup configuration for process caused \"mountpoint for devices not found\"": unknown.

Geobm commented 4 years ago

Same issue here. I'm trying to run this repository's demo, but I get the following error when running $ docker-compose up:

ERROR: for vehicle_counting  Cannot start service vehicle_counting: OCI runtime create failed: container_linux.go:349: starting 
container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 1 cause 
\\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request\\\\n\\\"\"": unknown

ERROR: for vehicle_counting  Cannot start service vehicle_counting: OCI runtime create failed: container_linux.go:349: starting 
container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 1 caused 
\\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process 
request\\\\n\\\"\"": unknown

Tried to install nvidia-container-toolkit as suggested here, but it's still not working.

Here's my $ docker info output

Client:
 Debug Mode: false

Server:
 Containers: 3
  Running: 0
  Paused: 0
  Stopped: 3
 Images: 7
 Server Version: 19.03.12
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: nvidia runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7ad184331fa3e55e52b890ea95e65ba581ae3429
 runc version: dc9208a3303feef5b3839f4323d9beb36df0a9dd
 init version: fec3683
 Security Options:
  apparmor
  seccomp
   Profile: default
 Kernel Version: 5.4.0-42-generic
 Operating System: Ubuntu 20.04.1 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 1
 Total Memory: 3.844GiB
 Name: geo-vbox
 ID: PLLH:2H5F:NGLW:52TT:2Q77:AUHV:S3PX:3THU:XIEA:NYMX:FEYD:E2AT
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No swap limit support

Any idea how to solve it?

EnricoBeltramo commented 4 years ago

Same issue. I'm working with Docker version 19.03.12, build 48a66213fe, inside WSL 2 on Windows 10.

tidlick1 commented 4 years ago

I also have the same problem, working with Docker version 19.03.12 inside WSL 2 on Windows 10. Kernel version: 4.19.121-microsoft-standard.

paldana-ISI commented 4 years ago

Having the same issue with an AGX Xavier: https://github.com/NVIDIA/nvidia-container-toolkit/issues/183

olkham commented 4 years ago

Exact same issue here. Followed the NVIDIA guide.

Windows 10 version 1909 build 18363.1049, Docker version 19.03.12, WSL2 Ubuntu 18.04 and 20.04, kernel version 4.19.121-microsoft-standard, Windows NVIDIA driver 455.41, CUDA 11.1.

The output of nvidia-container-cli -k -d /dev/tty info

I0821 16:21:57.950311 5686 nvc.c:282] initializing library context (version=1.3.0, build=af0220ff5c503d9ac6a1b5a491918229edbb37a4)
I0821 16:21:57.950354 5686 nvc.c:256] using root /
I0821 16:21:57.950358 5686 nvc.c:257] using ldcache /etc/ld.so.cache
I0821 16:21:57.950376 5686 nvc.c:258] using unprivileged user 1000:1000
I0821 16:21:57.950389 5686 nvc.c:299] attempting to load dxcore to see if we are running under Windows Subsystem for Linux (WSL)
I0821 16:21:57.950454 5686 nvc.c:301] dxcore initialization failed, continuing assuming a non-WSL environment
W0821 16:21:57.950514 5686 nvc.c:172] failed to detect NVIDIA devices
W0821 16:21:57.950641 5687 nvc.c:187] failed to set inheritable capabilities
W0821 16:21:57.950680 5687 nvc.c:188] skipping kernel modules load due to failure
I0821 16:21:57.950836 5688 driver.c:101] starting driver service
E0821 16:21:57.950966 5688 driver.c:161] could not start driver service: load library failed: libnvidia-ml.so.1: cannot open shared object file: no such file or directory
I0821 16:21:57.951083 5686 driver.c:196] driver service terminated successfully
nvidia-container-cli: initialization error: driver error: failed to process request
Agrover112 commented 4 years ago

Same here, stuck.

LianQi-Kevin commented 4 years ago

Same here, stuck.

klueska commented 4 years ago

The original issue described here has the following error:

nvidia-container-cli: requirement error: unsatisfied condition: cuda>=10.1, please update your driver to a newer version, or use an earlier cuda container

It is due to the fact that the original poster had an NVIDIA driver that was too old to run CUDA 10.1.

The poster acknowledged this and closed the issue on March 21st. https://github.com/NVIDIA/nvidia-docker/issues/1225#issuecomment-601990042

Since that time, this issue has been reopened and commented on many times with unrelated error messages.

Since the original issue was resolved, I am going to close this issue again, and encourage you to open a new issue if you are still having problems with different errors.

Tony363 commented 4 years ago

https://ngc.nvidia.com/catalog/containers/nvidia:l4t-base

Try using this base image. It solved all my Jetson Tegra arm64 architecture issues, and now I can seamlessly docker pull and use my Docker images across Jetson Tegra devices.
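A hedged usage example for that image on a Jetson device (the tag is an assumption; pick the one matching your L4T/JetPack release):

sudo docker run -it --rm --runtime nvidia nvcr.io/nvidia/l4t-base:r32.4.3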

klueska commented 4 years ago

Anytime nvidia-docker fails, you will see an error that begins with:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: ...

This part of the error message is output by docker itself, and is out of our control.

It's the part after stderr: that is relevant to nvidia-docker.

In the original post, this error was:

nvidia-container-cli: requirement error: unsatisfied condition: cuda>=10.0, please update your driver to a newer version, or use an earlier cuda container

@gsss124 is this actually the same error response you were seeing? Given the description of your problem, it seems unlikely.

klueska commented 4 years ago

In any case, I would recommend performing your step 4 using docker's daemon.json file instead of editing the docker service directly:

$ cat /etc/docker/daemon.json
{
    "data-root": "/your/custom/location",
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}
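
After editing daemon.json, restart the daemon and confirm the runtime was registered (assuming systemd):

sudo systemctl restart docker
docker info | grep -i runtime
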
gsss124 commented 4 years ago

Anytime nvidia docker fails you will see an error that begins with:

docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: ...

This part of the error message is output by docker itself, and is out of our control.

It's the part after stderr: that is relevant to nvidia-docker.

In the original post, this error was:

nvidia-container-cli: requirement error: unsatisfied condition: cuda>=10.0, please update your driver to a newer version, or use an earlier cuda container

@gsss124 is this actually the same error response you were seeing? Given the description of your problem, it seems unlikely.

Thanks for the reply. This was not the error; it was only related to OCI. Now docker info gives the custom data-root location, but to my surprise it is still using the system drive: I see a reduction in space available on the system drive, while the space available in my custom data-root drive stays the same. So, I will delete my reply above.

gsss124 commented 4 years ago

In any case, I would recommend performing your step 4 using docker's daemon.json file instead of editing the docker service directly:

$ cat /etc/docker/daemon.json
{
    "data-root": "/your/custom/location",
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

Thanks for this, but I tried this method and it did not work for me. I will give it a shot again, adding a system restart step. I even tried nvidia-container-runtime separately; that didn't work. After editing docker.service, it reported the data-root as my custom location but kept using the /var/lib/docker location to store data! I don't understand what is happening.

gsss124 commented 4 years ago

In any case, I would recommend performing your step 4 using docker's daemon.json file instead of editing the docker service directly:

$ cat /etc/docker/daemon.json
{
    "data-root": "/your/custom/location",
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

Thanks for this, but I tried this method and it did not work for me. I will give it a shot again, adding a system restart step. I even tried nvidia-container-runtime separately; that didn't work. After editing docker.service, it reported the data-root as my custom location but kept using the /var/lib/docker location to store data! I don't understand what is happening.

To my horror, it has created a new drive taking part of the system drive's space, named it with my custom data-root name, and renamed my old drive! It's not using /var/lib/docker, but a part of it renamed to my custom data-root name.

wanfuse123 commented 4 years ago

sudo service docker start

sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: stat failed: /usr/lib/wsl/lib/libcuda.so.1: no such file or directory\\n\\"\"": unknown.

docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: stat failed: /usr/lib/wsl/lib/libcuda.so.1: no such file or directory\\n\\"\"": unknown.

ldconfig -p | grep cuda
        libicudata.so.66 (libc6,x86-64) => /lib/x86_64-linux-gnu/libicudata.so.66
        libcuda.so.1 (libc6,x86-64) => /usr/lib/wsl/lib/libcuda.so.1

ls -al /usr/lib/wsl/lib
total 70792
dr-xr-xr-x 1 root root      512 Sep 18 15:53 .
drwxr-xr-x 4 root root     4096 Sep 18 12:28 ..
-r--r--r-- 1 root root   124664 Aug 30 09:51 libcuda.so
-r--r--r-- 2 root root   832936 Sep 12 08:44 libd3d12.so
-r--r--r-- 2 root root  5073944 Sep 12 08:44 libd3d12core.so
-r--r--r-- 2 root root 25069816 Sep 12 08:44 libdirectml.so
-r--r--r-- 2 root root   878768 Sep 12 08:44 libdxcore.so
-r--r--r-- 1 root root 40496936 Aug 30 09:51 libnvwgf2umx.so

sudo ln -s /usr/lib/wsl/lib/libcuda.so /usr/lib/wsl/lib/libcuda.so.1
ln: failed to create symbolic link '/usr/lib/wsl/lib/libcuda.so.1': Read-only file system

It seems as if it's missing and the video driver is still required, unless there is something that can make it appear at that location.

ls: cannot access '/usr/lib/wsl/lib/libcuda.so.1'

any thoughts?

From my understanding, installing the video driver is no longer required in the Docker Ubuntu guest.

wanfuse123 commented 4 years ago

Directory of C:\Windows\System32\lxss\lib

09/18/2020  03:53 PM    .
08/30/2020  09:51 AM           124,664 libcuda.so
09/12/2020  08:44 AM           832,936 libd3d12.so
09/12/2020  08:44 AM         5,073,944 libd3d12core.so
09/12/2020  08:44 AM        25,069,816 libdirectml.so
09/12/2020  08:44 AM           878,768 libdxcore.so
08/30/2020  09:51 AM        40,496,936 libnvwgf2umx.so
               6 File(s)     72,477,064 bytes
               1 Dir(s)  643,723,309,056 bytes free

C:\Windows\System32\lxss\lib>mklink libcuda.so.1 libcuda.so
symbolic link created for libcuda.so.1 <<===>> libcuda.so

Still not working, but it seems closer.

wanfuse123 commented 4 years ago

More info (shell history):

docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
 18  sudo find / -iname /usr/lib/wsl/lib/libcuda.so.1
 19  sudo find / -iname libcuda.so.1
 20  ldconfig -p | grep cuda
 21  ls /usr/lib/wsl/lib/libcuda.so.1
 22  #sudo ls -d libcuda.so.1
 23  cd /
 24  sudo ls -d libcuda.so.1
 25  ls -al /usr/lib/wsl
 26  ls -al /usr/lib/wsl/drivers
 27  ls -al /usr/lib/wsl/drivers | grep -i libcuda*
 28  ls -al /usr/lib/wsl/
 29  ls -al /usr/lib/wsl/lib
 30  sudo ln -s /usr/lib/wsl/lib/libcuda.so /usr/lib/wsl/lib/libcuda.so.1
 31  sudo ln -s /usr/lib/wsl/lib/libcuda.so.1 /usr/lib/wsl/lib/libcuda.so
 32  sudo ln -s /usr/lib/wsl/lib/libcuda.so /usr/lib/wsl/lib/libcuda.so.1
 33  echo $LD_LIBRARY_PATH
 34  sudo apt install nvidia-361-dev
 35  nvidia-smi
 36  sudo apt isntall nvidia-utils-435
 37  sudo apt install nvidia-utils-435
 38  cd %SYSTEMROOT%\System32\lxss\lib
 39  cd %SYSTEMROOT%\
 40  cd %SYSTEMROOT%
 41  ls
 42  ls /usr/lib/wsl/lib/
 43  ls -al /usr/lib/wsl/lib/
 44  docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
 45  sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
 46  sudo apt-remove nvidia-docker2
 47  sudo apt-get remove nvidia-docker2
 48  sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
 49  docker run --rm --privileged nvidia/cuda nvidia-smi
 50  nvidia-docker run --rm nvidia/cuda nvidia-smi
 51  nvidia-docker run --rm --privileged nvidia/cuda nvidia-smi
 52  docker run --rm --privileged nvidia/cuda nvidia-smi
 53  nvidia-smi
 54  sudo apt-get install nvidia-docker2
 55  nvidia-docker run --rm --privileged nvidia/cuda nvidia-smi
 56  docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark
 57  docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark -compare
 58  nvcc --version
 59  sudo apt-get install nvidia-cuda-toolkit
 60  docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark -compare

gsss124 commented 4 years ago

In any case, I would recommend performing your step 4 using docker's daemon.json file instead of editing the docker service directly:

$ cat /etc/docker/daemon.json
{
    "data-root": "/your/custom/location",
    "default-runtime": "nvidia",
    "runtimes": {
        "nvidia": {
            "path": "/usr/bin/nvidia-container-runtime",
            "runtimeArgs": []
        }
    }
}

I tried this again by editing /etc/docker/daemon.json and got the following stderr: nvidia-container-cli: ldcache error: process /sbin/ldconfig.real failed with error code: 1\\\\n\\\"\""

Full output: docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: ldcache error: process /sbin/ldconfig.real failed with error code: 1\\\\n\\\"\"": unknown

docker info now displays the required custom directory and space is reduced in the right directory. Now it is an ldcache error. I checked here, but my seccomp output is yes:

$ cat /boot/config-$(uname -r) | grep -i seccomp
CONFIG_SECCOMP=y
CONFIG_HAVE_ARCH_SECCOMP_FILTER=y
CONFIG_SECCOMP_FILTER=y

Please suggest what might be the problem.
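One low-risk check for the ldcache error (a diagnostic suggestion, not a confirmed fix) is to see which ldconfig binary actually exists on your host and which one the runtime is configured to call:

ls -l /sbin/ldconfig /sbin/ldconfig.real
grep ldconfig /etc/nvidia-container-runtime/config.toml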

gsss124 commented 4 years ago

sudo service docker start

* Starting Docker: docker                                                                                       [ OK ]

sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused "process_linux.go:432: running prestart hook 0 caused \"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: stat failed: /usr/lib/wsl/lib/libcuda.so.1: no such file or directory\n\""": unknown.

docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused "process_linux.go:432: running prestart hook 0 caused \"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: mount error: stat failed: /usr/lib/wsl/lib/libcuda.so.1: no such file or directory\n\""": unknown.

ldconfig -p | grep cuda libicudata.so.66 (libc6,x86-64) => /lib/x86_64-linux-gnu/libicudata.so.66 libcuda.so.1 (libc6,x86-64) => /usr/lib/wsl/lib/libcuda.so.1

ls -al /usr/lib/wsl/lib total 70792 dr-xr-xr-x 1 root root 512 Sep 18 15:53 . drwxr-xr-x 4 root root 4096 Sep 18 12:28 .. -r--r--r-- 1 root root 124664 Aug 30 09:51 libcuda.so -r--r--r-- 2 root root 832936 Sep 12 08:44 libd3d12.so -r--r--r-- 2 root root 5073944 Sep 12 08:44 libd3d12core.so -r--r--r-- 2 root root 25069816 Sep 12 08:44 libdirectml.so -r--r--r-- 2 root root 878768 Sep 12 08:44 libdxcore.so -r--r--r-- 1 root root 40496936 Aug 30 09:51 libnvwgf2umx.so

sudo ln -s /usr/lib/wsl/lib/libcuda.so /usr/lib/wsl/lib/libcuda.so.1 ln: failed to create symbolic link '/usr/lib/wsl/lib/libcuda.so.1': Read-only file system

It seems as if it is missing and the video driver is still required, unless there is something that can make it appear at that location.

ls: cannot access '/usr/lib/wsl/lib/libcuda.so.1'

any thoughts?

From my understanding, installing the video driver inside the Docker container is no longer required on an Ubuntu guest.

Are you using a virtual machine? As stated by @klueska, the output after stderr is what matters. Your error says stderr: nvidia-container-cli: mount error: stat failed: /usr/lib/wsl/lib/libcuda.so.1: no such file or directory, i.e. something related to the NVIDIA driver not being available where it is required.
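
(For context: on WSL2, /usr/lib/wsl/lib is mounted read-only from the Windows driver store, so the symlink cannot be created from inside the guest; libcuda.so.1 appears there once a sufficiently recent WSL/CUDA-enabled Windows driver is installed. A quick check from inside WSL — a sketch, not a fix:)

$ ls -al /usr/lib/wsl/lib/libcuda.so*
# If libcuda.so.1 is missing here, update the NVIDIA driver on the Windows side
# rather than trying to create the link inside the guest.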

klueska commented 4 years ago

@wanfuse123 please file a new issue if you need help debugging this. Your issue looks unrelated to the one here (especially since it seems you are running on Windows, and not linux).

chauncygu commented 4 years ago

@tytcc I also faced the same problem on ubunut16.04 machine. I have the latest driver 440.64.00 installed and now i tried to run example docker run --gpus all nvidia/cuda:10.0-base nvidia-smi i get this error docker: Error response from daemon: OCI runtime create failed: container_linux.go:349: starting container process caused "process_linux.go:449: container init caused \"process_linux.go:432: running prestart hook 0 caused \\\"error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: cuda error: unknown error\\\\n\\\"\"": unknown. ERRO[0001] error waiting for container: context canceled

I also faced this problem. Did you solve it?

wanfuse123 commented 4 years ago

nvidia-smi does not work under wsl2 as of right now. Use the following test instead

"medium.com" + "how-to-use-nvidia-gpu-in-docker-to-run-tensorflow"

use their


wanfuse123 commented 4 years ago

Sorry, got cut off. Use their testing examples and that container. It costs 5 bucks for access, but I thought it was worth it for one year of access. (Note: I have nothing to do with their site; I just spent the five bucks for it.)

anyway use their testing examples.

You can't use nvidia-smi; it is not working right now in the containers. Apparently NVIDIA and Microsoft are working hard on the problem.
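
(An alternative GPU check that does not rely on nvidia-smi inside the container — this reuses the nbody sample image already used earlier in this thread:)

$ docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark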


wanfuse123 commented 4 years ago

Update on the Medium link: look at the comments. I have posted an updated script there that runs a simple test.

luogyu7 commented 3 years ago

@elliothe i have uninstalled CUDA, Nvidia Drivers, nvidia docker and docker. Then installed everything again from scratch. This solved the problem for me

I think that is right. It worked for me when I uninstalled the NVIDIA driver (version 460). Thanks.

iser@iser:~$ sudo apt-get purge nvidia*
Reading package lists... Done
Building dependency tree
Reading state information... Done
Note, selecting 'nvidia-kernel-common-418-server' for glob 'nvidia*'
Note, selecting 'nvidia-325-updates' for glob 'nvidia*'
Note, selecting 'nvidia-346-updates' for glob 'nvidia*'
Note, selecting 'nvidia-driver-binary' for glob 'nvidia*'
Note, selecting 'nvidia-331-dev' for glob 'nvidia*'
Note, selecting 'nvidia-304-updates-dev' for glob 'nvidia*'
Note, selecting 'nvidia-compute-utils-418-server' for glob 'nvidia*'
Note, selecting 'nvidia-384-dev' for glob 'nvidia*'
Note, selecting 'nvidia-docker2' for glob 'nvidia*'
Note, selecting 'nvidia-libopencl1-346-updates' for glob 'nvidia*'
Note, selecting 'nvidia-driver-440-server' for glob 'nvidia*'
Note, selecting 'nvidia-340-updates-uvm' for glob 'nvidia*'

------- The following is the output after reinstalling successfully ----------

Adding group `iser' (GID 1000) ...
Done.
Adding user `iser' ...
Adding new user `iser' (1000) with group `iser' ...
Creating home directory `/home/iser' ...
Copying files from `/etc/skel' ...
[ OK ] Congratulations! You have successfully finished setting up Apollo Dev Environment.
[ OK ] To login into the newly created apollo_dev_iser container, please run the following command:
[ OK ]   bash docker/scripts/dev_into.sh
[ OK ] Enjoy!
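
(A condensed sketch of that purge-and-reinstall cycle on Ubuntu; the driver package name below is only an example, pick the one appropriate for your GPU:)

$ sudo apt-get purge 'nvidia*'
$ sudo apt-get autoremove
$ sudo apt-get install nvidia-driver-460    # example driver package
$ sudo apt-get install nvidia-docker2
$ sudo systemctl restart docker
$ docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi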

elasticdotventures commented 3 years ago

Why is this issue marked as closed? @luogyu7 says "it worked for me" because they uninstalled an ancient version and replaced it with an updated one? ..

My configuration: Ubuntu 20 WSL2 on Windows 10, Docker works properly, no issues starting other non-cuda containers.

I ran the command suggested earlier by @luogyu7

$ sudo apt-get purge nvidia*

Then attempted to reinstall the nvidia-cuda-toolkit and now we're here:

$ uname -a 
Linux COMMODORE387 5.4.72-microsoft-standard-WSL2 #1 SMP Wed Oct 28 23:40:43 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

$ apt-get install nvidia-cuda-toolkit
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 nvidia-cuda-toolkit : Depends: nvidia-cuda-dev (= 10.1.243-3) but it is not going to be installed
E: Unable to correct problems, you have held broken packages.

# docker run --gpus all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark -compare
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
ERRO[0001] error waiting for container: context canceled

# docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi
Unable to find image 'nvidia/cuda:11.0-base' locally

Digest: sha256:774ca3d612de15213102c2dbbba55df44dc5cf9870ca2be6c6e9c627fa63d67a
Status: Downloaded newer image for nvidia/cuda:11.0-base
docker: Error response from daemon: OCI runtime create failed: container_linux.go:370: starting container process caused: process_linux.go:459: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
klueska commented 3 years ago

I can't help you with the unmet dependencies error for the nvidia-cuda-toolkit component (that is something independent of the container stack, and (for all intents and purposes) unnecessary to install on your host if you only ever plan to run CUDA applications in containers).

I think the package you were intending to install is nvidia-container-toolkit -- which is the one required for container support.
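
(A minimal sketch on Ubuntu, assuming the NVIDIA container repository is already configured as described in the install guide:)

$ sudo apt-get update
$ sudo apt-get install -y nvidia-container-toolkit
$ sudo systemctl restart docker
$ docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi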

abaybektursun commented 3 years ago

Restart worked for me

jzhang82119 commented 3 years ago

I am getting this error when running any nvidia-docker command.

docker: Error response from daemon: OCI runtime create failed: container_linux.go:367: starting container process caused: process_linux.go:495: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: detection error: nvml error: unknown error: unknown.

I read through the whole thread, but I only see people fixing it by restarting the server or reinstalling the driver/Docker.

I have tried restarting the server, but it does not resolve this error.

zhendao-chongde commented 3 years ago

Solution:
Step 1. First, check which version of the JetPack SDK you downloaded. If you are using the jetson-nano-developer-kit → https://developer.nvidia.com/embedded/jetson-nano-developer-kit
Step 2. Click the "Download JetPack" button around the middle of the page at the URL above.
Step 3. A new page will open; recall that this is the page from which you installed JetPack. Note the JetPack version listed there, e.g. JetPack 4.5.1.
Step 4. Sign up for the Jetson AI Courses. They include video walkthroughs; the same videos are on YouTube, but the course notes contain more detail and are easier to follow.
Step 5. You are probably following "Download Docker And Start JupyterLab" from the Jetson AI Courses, got the error in this issue's title, and googled it. Take a close look at this command:
echo "sudo docker run --runtime nvidia -it --rm --network host \ --volume ~/nvdli-data:/nvdli-nano/data \ --device /dev/video0 \ nvcr.io/nvidia/dli/dli-nano-ai:v2.0.1-r32.4.4" > docker_dli_run.sh
The last line ends with "-r32.4.4". This has to match the JetPack version you downloaded. In my case I downloaded JetPack 4.5.1, so it should be "-r32.4.5".

Step 6. After that, just follow the walkthrough video for "Download Docker And Start JupyterLab" in the Jetson AI Courses.
Note: if you still get an error, it is very likely that a Docker image you downloaded earlier by mistake is getting in the way. Search for how to delete a Docker image, remove it, then go back to Step 5 and try again. If that still fails, run sudo apt update and sudo apt upgrade and then delete the offending Docker image again. If it still fails, check that you actually ran chmod +x docker_dli_run.sh after the echo "sudo docker ..." command; if not, do so. If it still doesn't work, then perhaps the tag on that last line isn't actually an -r32 release at all?
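
(On a Jetson, one way to check which L4T release — and therefore which -r32.x.x tag — the board is running, assuming a standard JetPack install:)

$ cat /etc/nv_tegra_release
# e.g. "# R32 (release), REVISION: 4.4, ..." -> use an image tagged -r32.4.4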

mihajenko commented 3 years ago

I managed to solve the problem for me (thank you thread for pointing out nvidia-container-cli).

tl;dr: Check your libnvidia-container1 version.

Fortunately, I had two systems to check. The working system had libnvidia-container1 version 1.4.0, the crashing system had an RC version of 1.5.0!

  1. Check the version
  2. Bump to 1.5.0 (the final, non-RC version)
  3. Reinstall nvidia-docker2

Try running your CUDA container with --gpus all and verify that nvidia-smi shows output.
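
(A sketch of those steps on Debian/Ubuntu; package names may differ on other distributions:)

$ dpkg -l | grep libnvidia-container                 # 1. check the installed version
$ sudo apt-get update
$ sudo apt-get install --only-upgrade libnvidia-container1 libnvidia-container-tools   # 2. bump to the final 1.5.0
$ sudo apt-get install --reinstall nvidia-docker2    # 3. reinstall nvidia-docker2
$ sudo systemctl restart docker
$ docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi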

felrock commented 2 years ago

If you have made any recent updates, make sure to reboot; it might solve the issue for you.

joberthrogers18 commented 2 years ago

Restarting my operating system worked for me, as @abaybektursun cites above.

semenoffalex commented 2 years ago

Had the same issue as the topic starter on Ubuntu 20.04 LTS under WSL2 in Windows 10. Resolved it by updating Windows from 21H1 to 21H2. They say CUDA doesn't work properly on WSL2 with the 21H1 update.
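
(A quick way to check which Windows build you are on from inside WSL, assuming Windows interop is enabled — 21H1 corresponds to build 19043 and 21H2 to build 19044:)

$ cmd.exe /c ver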