NVIDIA / nvidia-container-toolkit

Build and run containers leveraging NVIDIA GPUs
Apache License 2.0

WSL2 with GPU-P, NVIDIA_VISIBLE_DEVICES value doesn't work #70

Open · brokeDude2901 opened this issue 2 years ago

brokeDude2901 commented 2 years ago

On WSL2 with GPU-P, setting the NVIDIA_VISIBLE_DEVICES value doesn't work on a system with multiple GPUs (2x RTX A5000).

Command: podman run -it --rm -e NVIDIA_VISIBLE_DEVICES=1 tensorflow/tensorflow:latest-gpu-jupyter nvidia-smi

Output:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.52       Driver Version: 511.79       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA RTX A5000    On   | 00000000:03:00.0  On |                  Off |
|100%   35C    P8    26W / 207W |   1659MiB / 24564MiB |     12%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA RTX A5000    On   | 00000000:04:00.0 Off |                  Off |
|100%   33C    P8    15W / 207W |      0MiB / 24564MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+

Expected output: nvidia-smi should show only one GPU (the one selected via NVIDIA_VISIBLE_DEVICES).

I'm not sure whether this is an nvidia-container-runtime problem or a Microsoft WSL2 problem.

elezar commented 2 years ago

@brokeDude2901 sorry for the delay in getting to you. The mechanism that WSL2 uses to include devices into the container is not the same as for native-Linux systems. There is only a single device node (/dev/dxg) that is included and the traditional NVIDIA_VISIBLE_DEVICES-based filtering does not work as expected.
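As a minimal sketch of the behaviour described above (assuming the two-GPU WSL2 setup from the original report), the request for a single GPU is silently ignored because the only GPU-related device node passed into the container is /dev/dxg:

    # On native Linux, NVIDIA_VISIBLE_DEVICES=1 would hide GPU 0 inside the container.
    # Under WSL2 GPU-P, nvidia-smi still enumerates both adapters.
    podman run --rm -e NVIDIA_VISIBLE_DEVICES=1 \
        tensorflow/tensorflow:latest-gpu-jupyter nvidia-smi -L

    # The single shared device node that WSL2 exposes to the container:
    podman run --rm -e NVIDIA_VISIBLE_DEVICES=1 \
        tensorflow/tensorflow:latest-gpu-jupyter ls -l /dev/dxg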

We are looking to address this limitation at some point in the future but I don't have a timeline for you.

User-3090 commented 1 year ago

Any progress on this? Can containers requiring GPU now be used with Podman + WSL2?

elezar commented 1 year ago

> Any progress on this? Can containers requiring GPU now be used with Podman + WSL2?

@User-3090 v1.13.0-rc.3 of the NVIDIA Container Toolkit includes support for generating a CDI specification for NVIDIA devices under WSL2. We expect to promote this version to GA in the next day or two.

To use this:

  1. Install the nvidia-container-toolkit-base package in your WSL2 distribution.
  2. Run sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml (this should auto-detect that you are on a WSL2 system).
  3. Request the available device(s) with Podman: podman run --device=nvidia.com/gpu=all ubuntu nvidia-smi -L

Note that only the nvidia.com/gpu=all device is currently available. Once this restriction is lifted the tooling to generate CDI specifications will be updated to include individual devices.
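Putting these steps together, a minimal end-to-end sketch (assuming an apt-based WSL2 distribution; substitute your distribution's package manager as appropriate):

    # 1. Install the package that ships the nvidia-ctk CLI.
    sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit-base

    # 2. Generate the CDI specification; WSL2 mode should be auto-detected.
    sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml

    # 3. Request the devices via CDI. Only the composite "all" device is
    #    currently exposed, so individual GPUs cannot yet be selected this way.
    podman run --rm --device=nvidia.com/gpu=all ubuntu nvidia-smi -L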

Tsubajashi commented 3 months ago

Any news on this? Sorry if this is considered necro-bumping, but it's driving me insane on Windows. That's one reason I can't run Windows properly on one of my higher-end workstation rigs with multiple GPUs without having to configure a ton of things differently.

rhochmayr commented 3 days ago

Also wondering about the current state of this, especially when it comes to filtering for specific devices, as mentioned here:

> Note that only the nvidia.com/gpu=all device is currently available. Once this restriction is lifted the tooling to generate CDI specifications will be updated to include individual devices.

@elezar Do you have any insights here maybe?