Description

I have a system that is running Windows 11 (fully updated), WSL2 Ubuntu 22.04.2, Cuda 12.2 installed via NVIDIA instructions. I have 4 RTX3090s that are exposed in WSL. I have installed Docker Desktop 4.22 and set WSL as the engine. When running 2 different docker contaners that has GPU support I get either a "out of memory error" or "Error: only 0 Devices available, 1 requested. Exiting."

Reproduce

docker run --gpus 1 --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -it --rm nvcr.io/nvidia/pytorch:23.07-py3

============= == PyTorch ==

NVIDIA Release 23.07 (build 63867923) PyTorch Version 2.1.0a0+b5021ba

Copyright (c) 2014-2023 Facebook Inc. Copyright (c) 2011-2014 Idiap Research Institute (Ronan Collobert) Copyright (c) 2012-2014 Deepmind Technologies (Koray Kavukcuoglu) Copyright (c) 2011-2012 NEC Laboratories America (Koray Kavukcuoglu) Copyright (c) 2011-2013 NYU (Clement Farabet) Copyright (c) 2006-2010 NEC Laboratories America (Ronan Collobert, Leon Bottou, Iain Melvin, Jason Weston) Copyright (c) 2006 Idiap Research Institute (Samy Bengio) Copyright (c) 2001-2004 Idiap Research Institute (Ronan Collobert, Samy Bengio, Johnny Mariethoz) Copyright (c) 2015 Google Inc. Copyright (c) 2015 Yangqing Jia Copyright (c) 2013-2016 The Caffe contributors All rights reserved.

This container image and its contents are governed by the NVIDIA Deep Learning Container License. By pulling and using the container, you accept the terms and conditions of this license: https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license

ERROR: The NVIDIA Driver is present, but CUDA failed to initialize. GPU functionality will not be available. [[ Out of memory (error 2) ]]

root@95ab4a1ff094:/workspace#


2. docker run --gpus device=GPU-acb454dd-c800-18af-ce94-27e4f8ece22a nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -benchmark -gpu

Run "nbody -benchmark [-numbodies=]" to measure performance. -fullscreen (run n-body simulation in fullscreen mode) -fp64 (use double precision floating point values for simulation) -hostmem (stores simulation data in host memory) -benchmark (run benchmark to measure performance) -numbodies= (number of bodies (>= 1) to run in simulation) -device= (where d=0,1,2.... for the CUDA device to use) -numdevices= (where i=(number of CUDA devices > 0) to use for simulation) -compare (compares simulation results running once on the default GPU and once on the CPU) -cpu (run n-body simulation on the CPU) -tipsy= (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Error: only 0 Devices available, 1 requested. Exiting.

theskaz@KevenPC:~/cuda-samples/Samples/1_Utilities/deviceQuery$ nvidia-smi --query-gpu=uuid --format=csv uuid GPU-52dc20f4-4d06-1338-85c9-c493c5dcec02 GPU-873ec493-2856-0e4e-b7c4-47f859451051 GPU-48b827a9-e51b-75c5-1149-389a3d45d09c GPU-acb454dd-c800-18af-ce94-27e4f8ece22a

and

theskaz@KevenPC:~/cuda-samples/Samples/1_Utilities/deviceQuery$ CUDA_VISIBLE_DEVICES=0,2,3 ./deviceQuery ./deviceQuery Starting...

CUDA Device Query (Runtime API) version (CUDART static linking)

Detected 3 CUDA Capable device(s)

Device 0: "NVIDIA GeForce RTX 3090" CUDA Driver Version / Runtime Version 12.2 / 12.2 CUDA Capability Major/Minor version number: 8.6 Total amount of global memory: 24576 MBytes (25769279488 bytes) (082) Multiprocessors, (128) CUDA Cores/MP: 10496 CUDA Cores GPU Max Clock rate: 1800 MHz (1.80 GHz) Memory Clock rate: 9751 Mhz Memory Bus Width: 384-bit L2 Cache Size: 6291456 bytes Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384) Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total shared memory per multiprocessor: 102400 bytes Total number of registers available per block: 65536 Warp size: 32 Maximum number of threads per multiprocessor: 1536 Maximum number of threads per block: 1024 Max dimension size of a thread block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Concurrent copy and kernel execution: Yes with 2 copy engine(s) Run time limit on kernels: Yes Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes Alignment requirement for Surfaces: Yes Device has ECC support: Disabled Device supports Unified Addressing (UVA): Yes Device supports Managed Memory: Yes Device supports Compute Preemption: Yes Supports Cooperative Kernel Launch: Yes Supports MultiDevice Co-op Kernel Launch: No Device PCI Domain ID / Bus ID / location ID: 0 / 1 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 1: "NVIDIA GeForce RTX 3090" CUDA Driver Version / Runtime Version 12.2 / 12.2 CUDA Capability Major/Minor version number: 8.6 Total amount of global memory: 24576 MBytes (25769279488 bytes) (082) Multiprocessors, (128) CUDA Cores/MP: 10496 CUDA Cores GPU Max Clock rate: 1800 MHz (1.80 GHz) Memory Clock rate: 9751 Mhz Memory Bus Width: 384-bit L2 Cache Size: 6291456 bytes Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384) Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total shared memory per multiprocessor: 102400 bytes Total number of registers available per block: 65536 Warp size: 32 Maximum number of threads per multiprocessor: 1536 Maximum number of threads per block: 1024 Max dimension size of a thread block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Concurrent copy and kernel execution: Yes with 2 copy engine(s) Run time limit on kernels: Yes Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes Alignment requirement for Surfaces: Yes Device has ECC support: Disabled Device supports Unified Addressing (UVA): Yes Device supports Managed Memory: Yes Device supports Compute Preemption: Yes Supports Cooperative Kernel Launch: Yes Supports MultiDevice Co-op Kernel Launch: No Device PCI Domain ID / Bus ID / location ID: 0 / 65 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Device 2: "NVIDIA GeForce RTX 3090" CUDA Driver Version / Runtime Version 12.2 / 12.2 CUDA Capability Major/Minor version number: 8.6 Total amount of global memory: 24576 MBytes (25769279488 bytes) (082) Multiprocessors, (128) CUDA Cores/MP: 10496 CUDA Cores GPU Max Clock rate: 1800 MHz (1.80 GHz) Memory Clock rate: 9751 Mhz Memory Bus Width: 384-bit L2 Cache Size: 6291456 bytes Maximum Texture Dimension Size (x,y,z) 1D=(131072), 2D=(131072, 65536), 3D=(16384, 16384, 16384) Maximum Layered 1D Texture Size, (num) layers 1D=(32768), 2048 layers Maximum Layered 2D Texture Size, (num) layers 2D=(32768, 32768), 2048 layers Total amount of constant memory: 65536 bytes Total amount of shared memory per block: 49152 bytes Total shared memory per multiprocessor: 102400 bytes Total number of registers available per block: 65536 Warp size: 32 Maximum number of threads per multiprocessor: 1536 Maximum number of threads per block: 1024 Max dimension size of a thread block (x,y,z): (1024, 1024, 64) Max dimension size of a grid size (x,y,z): (2147483647, 65535, 65535) Maximum memory pitch: 2147483647 bytes Texture alignment: 512 bytes Concurrent copy and kernel execution: Yes with 2 copy engine(s) Run time limit on kernels: Yes Integrated GPU sharing Host Memory: No Support host page-locked memory mapping: Yes Alignment requirement for Surfaces: Yes Device has ECC support: Disabled Device supports Unified Addressing (UVA): Yes Device supports Managed Memory: Yes Device supports Compute Preemption: Yes Supports Cooperative Kernel Launch: Yes Supports MultiDevice Co-op Kernel Launch: No Device PCI Domain ID / Bus ID / location ID: 0 / 97 / 0 Compute Mode: < Default (multiple host threads can use ::cudaSetDevice() with device simultaneously) >

Peer access from NVIDIA GeForce RTX 3090 (GPU0) -> NVIDIA GeForce RTX 3090 (GPU1) : No Peer access from NVIDIA GeForce RTX 3090 (GPU0) -> NVIDIA GeForce RTX 3090 (GPU2) : No Peer access from NVIDIA GeForce RTX 3090 (GPU1) -> NVIDIA GeForce RTX 3090 (GPU0) : No Peer access from NVIDIA GeForce RTX 3090 (GPU1) -> NVIDIA GeForce RTX 3090 (GPU2) : No Peer access from NVIDIA GeForce RTX 3090 (GPU2) -> NVIDIA GeForce RTX 3090 (GPU0) : No Peer access from NVIDIA GeForce RTX 3090 (GPU2) -> NVIDIA GeForce RTX 3090 (GPU1) : No

deviceQuery, CUDA Driver = CUDART, CUDA Driver Version = 12.2, CUDA Runtime Version = 12.2, NumDevs = 3 Result = PASS

### Expected behavior Expected behavior is that the GPUs are available for usage within the 2 respective containers. ### docker version ```bash Client: Cloud integration: v1.0.35-desktop+001 Version: 24.0.5 API version: 1.43 Go version: go1.20.6 Git commit: ced0996 Built: Fri Jul 21 20:36:24 2023 OS/Arch: windows/amd64 Context: default Server: Docker Desktop 4.22.1 (118664) Engine: Version: 24.0.5 API version: 1.43 (minimum version 1.12) Go version: go1.20.6 Git commit: a61e2b4 Built: Fri Jul 21 20:35:45 2023 OS/Arch: linux/amd64 Experimental: false containerd: Version: 1.6.21 GitCommit: 3dce8eb055cbb6872793272b4f20ed16117344f8 runc: Version: 1.1.7 GitCommit: v1.1.7-0-g860f061 docker-init: Version: 0.19.0 GitCommit: de40ad0

docker info

Client: Version: 24.0.5 Context: default Debug Mode: false Plugins: buildx: Docker Buildx (Docker Inc.) Version: v0.11.2-desktop.1 Path: C:\Program Files\Docker\cli-plugins\docker-buildx.exe compose: Docker Compose (Docker Inc.) Version: v2.20.2-desktop.1 Path: C:\Program Files\Docker\cli-plugins\docker-compose.exe dev: Docker Dev Environments (Docker Inc.) Version: v0.1.0 Path: C:\Program Files\Docker\cli-plugins\docker-dev.exe extension: Manages Docker extensions (Docker Inc.) Version: v0.2.20 Path: C:\Program Files\Docker\cli-plugins\docker-extension.exe init: Creates Docker-related starter files for your project (Docker Inc.) Version: v0.1.0-beta.6 Path: C:\Program Files\Docker\cli-plugins\docker-init.exe sbom: View the packaged-based Software Bill Of Materials (SBOM) for an image (Anchore Inc.) Version: 0.6.0 Path: C:\Program Files\Docker\cli-plugins\docker-sbom.exe scan: Docker Scan (Docker Inc.) Version: v0.26.0 Path: C:\Program Files\Docker\cli-plugins\docker-scan.exe scout: Command line tool for Docker Scout (Docker Inc.) Version: 0.20.0 Path: C:\Program Files\Docker\cli-plugins\docker-scout.exe Server: Containers: 19 Running: 1 Paused: 0 Stopped: 18 Images: 2 Server Version: 24.0.5 Storage Driver: overlay2 Backing Filesystem: extfs Supports d_type: true Using metacopy: false Native Overlay Diff: true userxattr: false Logging Driver: json-file Cgroup Driver: cgroupfs Cgroup Version: 1 Plugins: Volume: local Network: bridge host ipvlan macvlan null overlay Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog Swarm: inactive Runtimes: io.containerd.runc.v2 runc Default Runtime: runc Init Binary: docker-init containerd version: 3dce8eb055cbb6872793272b4f20ed16117344f8 runc version: v1.1.7-0-g860f061 init version: de40ad0 Security Options: seccomp Profile: unconfined Kernel Version: 5.15.90.1-microsoft-standard-WSL2 Operating System: Docker Desktop OSType: linux Architecture: x86_64 CPUs: 64 Total Memory: 377.7GiB Name: docker-desktop ID: e76ea70c-fad4-425c-8df1-29786ac0e18e Docker Root Dir: /var/lib/docker Debug Mode: false HTTP Proxy: http.docker.internal:3128 HTTPS Proxy: http.docker.internal:3128 No Proxy: hubproxy.docker.internal Experimental: false Insecure Registries: hubproxy.docker.internal:5555 127.0.0.0/8 Live Restore Enabled: false WARNING: No blkio throttle.read_bps_device support WARNING: No blkio throttle.write_bps_device support WARNING: No blkio throttle.read_iops_device support WARNING: No blkio throttle.write_iops_device support WARNING: daemon is not using the default seccomp profile

Diagnostics ID

CC53A74D-0891-4068-868B-0FD1A097A965/20230825153431

Additional Info

theskaz@KevenPC:~/cuda-samples/Samples/1_Utilities/deviceQuery$ nvidia-smi Fri Aug 25 09:37:05 2023 +---------------------------------------------------------------------------------------+ | NVIDIA-SMI 535.103 Driver Version: 537.13 CUDA Version: 12.2 | |-----------------------------------------+----------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | | Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. | | | | MIG M. | |=========================================+======================+======================| | 0 NVIDIA GeForce RTX 3090 On | 00000000:01:00.0 Off | N/A | | 0% 27C P8 11W / 335W | 53MiB / 24576MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ | 1 NVIDIA GeForce RTX 3090 On | 00000000:2E:00.0 Off | N/A | | 0% 27C P8 14W / 335W | 0MiB / 24576MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ | 2 NVIDIA GeForce RTX 3090 On | 00000000:41:00.0 On | N/A | | 0% 29C P3 69W / 335W | 6162MiB / 24576MiB | 60% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ | 3 NVIDIA GeForce RTX 3090 On | 00000000:61:00.0 Off | N/A | | 0% 28C P8 12W / 335W | 0MiB / 24576MiB | 0% Default | | | | N/A | +-----------------------------------------+----------------------+----------------------+ +---------------------------------------------------------------------------------------+ | Processes: | | GPU GI CI PID Type Process name GPU Memory | | ID ID Usage | |=======================================================================================| | 0 N/A N/A 23 G /Xwayland N/A | | 1 N/A N/A 23 G /Xwayland N/A | | 2 N/A N/A 23 G /Xwayland N/A | | 3 N/A N/A 23 G /Xwayland N/A |

docker / for-win

WSL2 GPU Support issue on Docker Desktop 4.22 #13658

Description

Reproduce

============= == PyTorch ==

docker info

Diagnostics ID

Additional Info