NVIDIA / nvidia-docker

Build and run Docker containers leveraging NVIDIA GPUs
Apache License 2.0

Nvidia CUDA driver does not detect my graphical device in docker #1508

Closed Telluro closed 10 months ago

Telluro commented 3 years ago

1. Issue or feature description

Recently, I've tried to run Docker with CUDA on my WSL2 Windows installation (Microsoft Dev 21390.1) for a university project. I've run into multiple problems installing the NVIDIA CUDA drivers. What is odd from the start is that the command `lspci | grep -i nvidia` does not return anything. What is more, the only graphics device returned by `lspci -v` is

0912:00:00.0 3D controller: Microsoft Corporation Device 008e
        Physical Slot: 205491325
        Flags: bus master, fast devsel, latency 0
        Capabilities: <access denied>
lspci: Unable to load libkmod resources: error -12

I am new to Docker and NVIDIA CUDA in general, so I'm probably doing something wrong in the first place. I have tried downgrading several installations and running WSL both with the driver installed and with only the service. Nothing worked; it only produced other errors. Also, after every installation, lspci still does not list my graphics card (GTX 970). I've tried updating it with the `update-pciids` command, which did not change anything either. I will be thankful for any help provided.
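Under WSL2 the GPU is paravirtualized, so `lspci` showing only a Microsoft 3D controller is expected rather than a sign of a broken install. A minimal sketch of a more telling check follows, assuming the usual WSL2 layout in which the GPU is exposed as `/dev/dxg` and the Windows driver mounts its user-mode libraries under `/usr/lib/wsl/lib`. The function name `wsl_gpu_status` and its parameters are hypothetical, chosen for illustration only.

```python
import os


def wsl_gpu_status(dxg_path="/dev/dxg", wsl_lib_dir="/usr/lib/wsl/lib"):
    """Report whether the WSL2 GPU device and a Windows-supplied libcuda are visible.

    Both default paths are assumptions about a standard WSL2 setup.
    """
    # List the library directory only if it exists, so the check never raises.
    libs = sorted(os.listdir(wsl_lib_dir)) if os.path.isdir(wsl_lib_dir) else []
    return {
        "dxg_device": os.path.exists(dxg_path),
        "libcuda": any(name.startswith("libcuda.so") for name in libs),
    }


if __name__ == "__main__":
    for check, ok in wsl_gpu_status().items():
        print(f"{check}: {'found' if ok else 'missing'}")
```

If either entry is reported as missing, the problem most likely sits on the Windows side (driver or WSL kernel version) rather than in the Linux packages.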

2. Information to attach (optional if deemed irrelevant)

Timestamp                                 : Tue Jun 1 18:27:30 2021
Driver Version                            : 470.14
CUDA Version                              : 11.3

Attached GPUs                             : 1
GPU 00000000:01:00.0
    Product Name                          : NVIDIA GeForce GTX 970
    Product Brand                         : GeForce
    Display Mode                          : Enabled
    Display Active                        : Enabled
    Persistence Mode                      : N/A
    MIG Mode
        Current                           : N/A
        Pending                           : N/A
    Accounting Mode                       : Disabled
    Accounting Mode Buffer Size           : 4000
    Driver Model
        Current                           : WDDM
        Pending                           : WDDM

 - [ ] Docker version from `docker version`

Client: Docker Engine - Community
 Version:           20.10.6
 API version:       1.41
 Go version:        go1.13.15
 Git commit:        370c289
 Built:             Fri Apr 9 22:46:01 2021
 OS/Arch:           linux/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.6
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       8728dd2
  Built:            Fri Apr 9 22:44:13 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.6
  GitCommit:        d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc:
  Version:          1.0.0-rc95
  GitCommit:        b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

 - [ ] NVIDIA packages version from `dpkg -l '*nvidia*'`

Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                     Version           Architecture      Description
+++-========================-=================-=================-=====================================================
un  libgldispatch0-nvidia                                        (no description available)
ii  libnvidia-cfg1-465:amd64 465.19.01-0ubuntu amd64             NVIDIA binary OpenGL/GLX configuration library
un  libnvidia-cfg1-any                                           (no description available)
un  libnvidia-common                                             (no description available)
ii  libnvidia-common-465     465.19.01-0ubuntu all               Shared files used by the NVIDIA libraries
ii  libnvidia-compute-465:am 465.19.01-0ubuntu amd64             NVIDIA libcompute package
ii  libnvidia-container-tool 1.4.0-1           amd64             NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd 1.4.0-1           amd64             NVIDIA container runtime library
un  libnvidia-decode                                             (no description available)
ii  libnvidia-decode-465:amd 465.19.01-0ubuntu amd64             NVIDIA Video Decoding runtime libraries
un  libnvidia-encode                                             (no description available)
ii  libnvidia-encode-465:amd 465.19.01-0ubuntu amd64             NVENC Video Encoding runtime library
un  libnvidia-extra                                              (no description available)
ii  libnvidia-extra-465:amd6 465.19.01-0ubuntu amd64             Extra libraries for the NVIDIA driver
un  libnvidia-fbc1                                               (no description available)
ii  libnvidia-fbc1-465:amd64 465.19.01-0ubuntu amd64             NVIDIA OpenGL-based Framebuffer Capture runtime libra
un  libnvidia-gl                                                 (no description available)
ii  libnvidia-gl-465:amd64   465.19.01-0ubuntu amd64             NVIDIA OpenGL/GLX/EGL/GLES GLVND libraries and Vulkan
un  libnvidia-ifr1                                               (no description available)
ii  libnvidia-ifr1-465:amd64 465.19.01-0ubuntu amd64             NVIDIA OpenGL-based Inband Frame Readback runtime lib
un  libnvidia-ml1                                                (no description available)
un  nvidia-304                                                   (no description available)
un  nvidia-340                                                   (no description available)
un  nvidia-384                                                   (no description available)
un  nvidia-390                                                   (no description available)
ii  nvidia-compute-utils-465 465.19.01-0ubuntu amd64             NVIDIA compute utilities
ii  nvidia-container-runtime 3.5.0-1           amd64             NVIDIA container runtime
un  nvidia-container-runtime                                     (no description available)
ii  nvidia-container-toolkit 1.5.0-1           amd64             NVIDIA container runtime hook
ii  nvidia-dkms-465          465.19.01-0ubuntu amd64             NVIDIA DKMS package
un  nvidia-dkms-kernel                                           (no description available)
un  nvidia-docker                                                (no description available)
ii  nvidia-docker2           2.6.0-1           all               nvidia-docker CLI wrapper
ii  nvidia-driver-465        465.19.01-0ubuntu amd64             NVIDIA driver metapackage
un  nvidia-driver-binary                                         (no description available)
un  nvidia-kernel-common                                         (no description available)
ii  nvidia-kernel-common-465 465.19.01-0ubuntu amd64             Shared files used with the kernel module
un  nvidia-kernel-source                                         (no description available)
ii  nvidia-kernel-source-465 465.19.01-0ubuntu amd64             NVIDIA kernel source package
un  nvidia-legacy-340xx-vdpa                                     (no description available)
ii  nvidia-modprobe          465.19.01-0ubuntu amd64             Load the NVIDIA kernel driver and create device files
un  nvidia-opencl-icd                                            (no description available)
un  nvidia-persistenced                                          (no description available)
ii  nvidia-prime             0.8.16~0.18.04.1  all               Tools to enable NVIDIA's Prime
ii  nvidia-settings          465.19.01-0ubuntu amd64             Tool for configuring the NVIDIA graphics driver
un  nvidia-settings-binary                                       (no description available)
un  nvidia-smi                                                   (no description available)
un  nvidia-utils                                                 (no description available)
ii  nvidia-utils-465         465.19.01-0ubuntu amd64             NVIDIA driver support binaries
un  nvidia-vdpau-driver                                          (no description available)
ii  xserver-xorg-video-nvidi 465.19.01-0ubuntu amd64             NVIDIA binary Xorg driver

 - [ ] NVIDIA container library version from `nvidia-container-cli -V`

version: 1.4.0
build date: 2021-04-24T14:25+00:00
build revision: 704a698b7a0ceec07a48e56c37365c741718c2df
build compiler: x86_64-linux-gnu-gcc-7 7.5.0
build platform: x86_64
build flags: -D_GNU_SOURCE -D_FORTIFY_SOURCE=2 -DNDEBUG -std=gnu11 -O2 -g -fdata-sections -ffunction-sections -fstack-protector -fno-strict-aliasing -fvisibility=hidden -Wall -Wextra -Wcast-align -Wpointer-arith -Wmissing-prototypes -Wnonnull -Wwrite-strings -Wlogical-op -Wformat=2 -Wmissing-format-attribute -Winit-self -Wshadow -Wstrict-prototypes -Wunreachable-code -Wconversion -Wsign-conversion -Wno-unknown-warning-option -Wno-format-extra-args -Wno-gnu-alignof-expression -Wl,-zrelro -Wl,-znow -Wl,-zdefs -Wl,--gc-sections

 - [ ] Docker command, image and tag used

docker: Error response from daemon: OCI runtime create failed: container_linux.go:380: starting container process caused: process_linux.go:545: container init caused: Running hook #0:: error running hook: exit status 1, stdout: , stderr: nvidia-container-cli: initialization error: driver error: failed to process request: unknown.
ERRO[0000] error waiting for container: context canceled

PQLLUX commented 3 years ago

Inside WSL2 you don't need any graphics-related NVIDIA libraries (e.g. nvidia-utils-465); installing the Windows drivers from here should do the trick. First, I'd purge all the NVIDIA libraries you have inside your WSL distro, then update the Linux kernel from the Windows Update menu (after checking the option to receive updates for other Microsoft products, see the picture) and the build to the versions mentioned here. After that, install the NVIDIA drivers in Windows and set up the nvidia-docker repositories following the guide from the second link, if you haven't already.

Here are the NVIDIA- and Docker-related libraries inside my WSL installation:

➜ dpkg -l | grep nvidia
ii  libnvidia-container-tools                                   1.5.0-1                           amd64        NVIDIA container runtime library (command-line tools)
ii  libnvidia-container1:amd64                                  1.5.0-1                           amd64        NVIDIA container runtime library
ii  nvidia-container-runtime                                    3.5.0-1                           amd64        NVIDIA container runtime
ii  nvidia-container-toolkit                                    1.5.1-1                           amd64        NVIDIA container runtime hook
ii  nvidia-docker2                                              2.6.0-1                           all          nvidia-docker CLI wrapper
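
The clean-up suggested above can be sketched as a small script that sorts installed packages into those to keep and those to purge. This is only an illustration: the allowlist `KEEP` is taken from the single working listing above and is not an official NVIDIA list, and the name `split_packages` is hypothetical. It also assumes untruncated `dpkg -l` output (as produced when the command is piped, e.g. through `grep`), because truncated names such as `libnvidia-container-tool` would not match the allowlist.

```python
# Packages present in the working WSL2 listing above (an assumption drawn
# from that one listing, not an official list).
KEEP = {
    "libnvidia-container-tools",
    "libnvidia-container1",
    "nvidia-container-runtime",
    "nvidia-container-toolkit",
    "nvidia-docker2",
}


def split_packages(dpkg_output):
    """Return (keep, purge) lists of installed NVIDIA packages from `dpkg -l` text."""
    keep, purge = [], []
    for line in dpkg_output.splitlines():
        fields = line.split()
        # Installed packages have the status "ii"; skip headers and "un" rows.
        if len(fields) < 2 or fields[0] != "ii":
            continue
        name = fields[1].split(":")[0]  # drop an architecture suffix such as ":amd64"
        if "nvidia" not in name:
            continue
        (keep if name in KEEP else purge).append(name)
    return keep, purge


if __name__ == "__main__":
    sample = (
        "ii  libnvidia-container1:amd64  1.5.0-1            amd64  NVIDIA container runtime library\n"
        "ii  nvidia-docker2              2.6.0-1            all    nvidia-docker CLI wrapper\n"
        "ii  nvidia-utils-465            465.19.01-0ubuntu  amd64  NVIDIA driver support binaries\n"
    )
    keep, purge = split_packages(sample)
    print("keep: ", keep)
    print("purge:", purge)
```

Review the resulting purge list by hand before removing anything; the script only classifies names and does not run any package manager command.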