NVIDIA / nvidia-container-toolkit

Build and run containers leveraging NVIDIA GPUs
Apache License 2.0

Getting the following error trying to run an Nvidia/Cuda container in Windows 10: Auto-detected mode as 'legacy' nvidia-container-cli: initialization error: WSL environment detected but no adapters were found: unknown #155

Open dantamont opened 1 year ago

dantamont commented 1 year ago

I can't find info on this error anywhere! I am running Docker Desktop with WSL. My docker compose file looks like this:

version: '3'
services:
  app:
    container_name: "sd"
    build: .
    ports:
      - 8080:8080                        # <LOCAL_PORT>:<CONTAINER_PORT>
    command: nvidia-smi
    tty: true # Make windows happy to keep terminal open
    stdin_open: true # Make windows happy to keep terminal open
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities:
                - gpu
                - utility # nvidia-smi
                - compute # CUDA. Required to avoid "CUDA version: N/A"
                - video   # NVDEC/NVENC. For instance to use a hardware accelerated ffmpeg. Skip it if you don't need it

And my docker file is simply:

FROM nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04

I don't even know where to start investigating this. Where could things be going wrong? The nvidia-smi command works properly on my host machine, just not within the container.
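
A minimal way to check whether this reproduces outside Compose (a sketch, assuming the same image as in the Dockerfile above) is:

# Request all GPUs directly with docker run and print nvidia-smi.
# If this fails with the same "no adapters were found" error, the problem is
# in the WSL2/driver layer rather than in the compose file.
docker run --rm --gpus all nvidia/cuda:11.8.0-cudnn8-devel-ubuntu20.04 nvidia-smi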

dantamont commented 1 year ago

Based on this thread, it seems impossible to run CUDA on my GTX 1080, which is very disappointing. Would love to be wrong about this, though! It'd be great if there were a way to turn persistence mode on in Windows 10.

noahheldt commented 1 year ago

Based on this thread, it seems impossible to run CUDA on my GTX 1080, which is very disappointing. Would love to be wrong about this, though! It'd be great if there were a way to turn persistence mode on in Windows 10.

I had the same problem as you, and after searching for days I finally found a fix that worked for me:

Fix

If you are on Windows 10, you need to upgrade the OS to version 21H2, according to this thread.

How To

To check which version you are on, open a terminal and run winver. If you are on 21H1:

* Open Settings

* Go to the Update tab

* Click on show optional updates

* Install _21H2_.

After the installation, check again with winver that it actually updated; the linked thread mentions that the update assistant sometimes reports the update as installed even though it hasn't been.

Test it

I then tested it in a WSL terminal with sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi.

If that still doesn't work for you, make sure WSL2 and Docker are set up according to the Docker Guide and that the Linux x86 CUDA Toolkit is set up as mentioned in this guide from Nvidia.
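
If you'd rather avoid the winver dialog, a rough non-interactive check from inside a WSL terminal (a sketch; it assumes the Windows interop binaries cmd.exe and wsl.exe are on the PATH, which they are by default) looks like:

# Prints the Windows build number; 10.0.19043.* corresponds to 21H1 and
# 10.0.19044.* to 21H2.
cmd.exe /c ver

# Lists installed distros and confirms they are running under WSL 2, not WSL 1.
wsl.exe -l -v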

Rom1deTroyes commented 1 year ago

Based on this thread, it seems impossible to run CUDA on my GTX 1080, which is very disappointing. Would love to be wrong about this, though! It'd be great if there were a way to turn persistence mode on in Windows 10.

I had the same problem as you, and after searching for days I finally found a fix that worked for me:

Fix

If you are on Windows 10, you need to upgrade the OS to version 21H2, according to this thread.

How To

To check which version you are on, open a terminal and run winver. If you are on 21H1:

* Open Settings

* Go to the Update tab

* Click on show optional updates

* Install _21H2_.

After the installation, check again with winver that it actually updated; the linked thread mentions that the update assistant sometimes reports the update as installed even though it hasn't been.

Test it

I then tested it in a WSL terminal with sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi.

If that still doesn't work for you, make sure WSL2 and Docker are set up according to the Docker Guide and that the Linux x86 CUDA Toolkit is set up as mentioned in this guide from Nvidia.


❯ sudo docker run --rm --gpus all nvidia/cuda:11.0.3-base-ubuntu20.04 nvidia-smi
Unable to find image 'nvidia/cuda:11.0.3-base-ubuntu20.04' locally
11.0.3-base-ubuntu20.04: Pulling from nvidia/cuda
d7bfe07ed847: Already exists
75eccf561042: Pull complete
191419884744: Pull complete
a17a942db7e1: Pull complete
16156c70987f: Pull complete
Digest: sha256:57455121f3393b7ed9e5a0bc2b046f57ee7187ea9ec562a7d17bf8c97174040d
Status: Downloaded newer image for nvidia/cuda:11.0.3-base-ubuntu20.04
docker: Error response from daemon: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: WSL environment detected but no adapters were found: unknown.
elezar commented 1 year ago

As a note, the v1.13.x release of the NVIDIA Container Toolkit allows CDI specifications to be generated for WSL2 systems. This allows Podman (>=4.1.0) installed in the WSL2 guest OS to be used to run containers directly, instead of relying on a particular version of Docker Desktop.

For example, assuming that the NVIDIA Container Toolkit is installed in the WSL2 guest, running the following command will generate the CDI specification:

sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

This will create a CDI specification with a single nvidia.com/gpu=all device.
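
To sanity-check the result, something like the following (a grep sketch against the output path used above; exact fields can vary between toolkit versions) should show the spec version and the single device name that podman will reference:

# "kind: nvidia.com/gpu" plus a device named "all" is what makes the
# nvidia.com/gpu=all request below resolve.
grep -E '^(cdiVersion|kind):' /etc/cdi/nvidia.yaml
grep -A1 '^devices:' /etc/cdi/nvidia.yaml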

To request this device when running podman use the following:

podman run --rm -ti --device=nvidia.com/gpu=all ubuntu nvidia-smi

This should show the same nvidia-smi output as in the guest.

Note that work is in progress to add CDI support to docker.
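
For context, the Docker side of that work later shipped as an opt-in feature; a sketch of what using it looks like, assuming a recent Docker Engine with CDI enabled in /etc/docker/daemon.json via "features": {"cdi": true} and the spec generated above:

# With CDI enabled, the device is requested by its CDI name instead of --gpus.
docker run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi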

scruel commented 2 months ago

@elezar

➜ sudo LD_LIBRARY_PATH=/usr/lib/wsl/lib nvidia-ctk --debug cdi generate --mode wsl  --output=/etc/cdi/nvidia.yaml
DEBU[0000] Locating NVIDIA Container Toolkit CLI as nvidia-ctk
DEBU[0000] Checking candidate '/usr/bin/nvidia-ctk'
DEBU[0000] Found 1 candidates; ignoring further candidates
DEBU[0000] Found nvidia-ctk candidates: [/usr/bin/nvidia-ctk]
DEBU[0000] Using NVIDIA Container Toolkit CLI path /usr/bin/nvidia-ctk
DEBU[0000] Inferred output format as "yaml" from output file name
DEBU[0000] Locating /dev/dxg
DEBU[0000] Checking candidate '/dev/dxg'
DEBU[0000] Located /dev/dxg as [/dev/dxg]
INFO[0000] Selecting /dev/dxg as /dev/dxg
INFO[0000] Using WSL driver store paths: [/usr/lib/wsl/drivers/iigd_dch.inf_amd64_73655f941b1dd71f /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6]
WARN[0000] Found multiple driver store paths: [/usr/lib/wsl/drivers/iigd_dch.inf_amd64_73655f941b1dd71f /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6]
DEBU[0000] Using specified NVIDIA Container Toolkit CLI path /usr/bin/nvidia-ctk
DEBU[0000] Locating libcuda.so.1.1
DEBU[0000] Checking candidate '/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libcuda.so.1.1'
DEBU[0000] Found 1 candidates; ignoring further candidates
DEBU[0000] Located libcuda.so.1.1 as [/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libcuda.so.1.1]
INFO[0000] Selecting /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libcuda.so.1.1 as /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libcuda.so.1.1
DEBU[0000] Locating libcuda_loader.so
DEBU[0000] Checking candidate '/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libcuda_loader.so'
DEBU[0000] Found 1 candidates; ignoring further candidates
DEBU[0000] Located libcuda_loader.so as [/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libcuda_loader.so]
INFO[0000] Selecting /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libcuda_loader.so as /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libcuda_loader.so
DEBU[0000] Locating libnvidia-ptxjitcompiler.so.1
DEBU[0000] Checking candidate '/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ptxjitcompiler.so.1'
DEBU[0000] Found 1 candidates; ignoring further candidates
DEBU[0000] Located libnvidia-ptxjitcompiler.so.1 as [/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ptxjitcompiler.so.1]
INFO[0000] Selecting /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ptxjitcompiler.so.1 as /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ptxjitcompiler.so.1
DEBU[0000] Locating libnvidia-ml.so.1
DEBU[0000] Checking candidate '/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ml.so.1'
DEBU[0000] Found 1 candidates; ignoring further candidates
DEBU[0000] Located libnvidia-ml.so.1 as [/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ml.so.1]
INFO[0000] Selecting /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ml.so.1 as /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ml.so.1
DEBU[0000] Locating libnvidia-ml_loader.so
DEBU[0000] Checking candidate '/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ml_loader.so'
DEBU[0000] Found 1 candidates; ignoring further candidates
DEBU[0000] Located libnvidia-ml_loader.so as [/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ml_loader.so]
INFO[0000] Selecting /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ml_loader.so as /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/libnvidia-ml_loader.so
DEBU[0000] Locating libdxcore.so
DEBU[0000] Checking candidate '/usr/lib/wsl/lib/libdxcore.so'
DEBU[0000] Found 1 candidates; ignoring further candidates
DEBU[0000] Located libdxcore.so as [/usr/lib/wsl/lib/libdxcore.so]
INFO[0000] Selecting /usr/lib/wsl/lib/libdxcore.so as /usr/lib/wsl/lib/libdxcore.so
DEBU[0000] Locating nvcubins.bin
DEBU[0000] Checking candidate '/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/nvcubins.bin'
DEBU[0000] Found 1 candidates; ignoring further candidates
DEBU[0000] Located nvcubins.bin as [/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/nvcubins.bin]
INFO[0000] Selecting /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/nvcubins.bin as /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/nvcubins.bin
DEBU[0000] Locating nvidia-smi
DEBU[0000] Checking candidate '/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/nvidia-smi'
DEBU[0000] Found 1 candidates; ignoring further candidates
DEBU[0000] Located nvidia-smi as [/usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/nvidia-smi]
INFO[0000] Selecting /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/nvidia-smi as /usr/lib/wsl/drivers/nvlti.inf_amd64_9a2c79b60d6607c6/nvidia-smi
DEBU[0000] returning cached mounts
DEBU[0000] returning cached mounts
INFO[0000] Generated CDI spec with version 0.3.0
➜ nvidia-ctk cdi list
No help topic for 'list'
➜ podman run --rm -ti --device=nvidia.com/gpu=all ubuntu nvidia-smi
Error: stat nvidia.com/gpu=all: no such file or directory
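
Two checks usually worth running when podman treats the device name as a host path like this (a sketch; the version floor is the one from elezar's comment above, and the file path is the one passed to --output):

# CDI device names are only resolved by podman >= 4.1.0; older builds try to
# stat the literal string as a host path, which yields exactly this error.
podman --version

# Confirm the generated spec is actually present in a CDI directory podman reads.
ls -l /etc/cdi/nvidia.yaml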