docker / for-win

Bug reports for Docker Desktop for Windows
https://www.docker.com/products/docker#/windows

Invoking CUDA in a container fails with the error: out of memory. #12733

Closed. before31 closed this issue 1 year ago.

before31 commented 2 years ago

Actual behavior

When I invoke CUDA in a container, I get an out of memory error. Even a call to cudaGetDeviceCount fails with the same error. Yet nvidia-smi inside the container shows that plenty of GPU memory is free.
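To reproduce the cudaGetDeviceCount failure without compiling a CUDA program, the call can be made through Python's ctypes. This is a minimal sketch, assuming a shared libcudart that `ctypes.util.find_library` can locate (for example inside a CUDA image); `probe_device_count` is a hypothetical helper, not part of Docker or the CUDA samples. In the CUDA runtime, error code 2 is `cudaErrorMemoryAllocation`, which `cudaGetErrorString` reports as "out of memory".

```python
import ctypes
import ctypes.util
from typing import Optional, Tuple

# In the CUDA runtime's cudaError_t enum, value 2 is cudaErrorMemoryAllocation,
# which cudaGetErrorString reports as "out of memory".
CUDA_ERROR_MEMORY_ALLOCATION = 2


def probe_device_count(lib_name: str = "cudart") -> Optional[Tuple[int, int, str]]:
    """Call cudaGetDeviceCount via ctypes (hypothetical helper).

    Returns (error_code, device_count, error_message), or None when the
    CUDA runtime shared library cannot be found or loaded.
    """
    path = ctypes.util.find_library(lib_name)
    if path is None:
        return None
    try:
        cudart = ctypes.CDLL(path)
    except OSError:
        return None

    # cudaError_t cudaGetDeviceCount(int *count)
    cudart.cudaGetDeviceCount.argtypes = [ctypes.POINTER(ctypes.c_int)]
    cudart.cudaGetDeviceCount.restype = ctypes.c_int
    # const char *cudaGetErrorString(cudaError_t error)
    cudart.cudaGetErrorString.argtypes = [ctypes.c_int]
    cudart.cudaGetErrorString.restype = ctypes.c_char_p

    count = ctypes.c_int(0)
    err = cudart.cudaGetDeviceCount(ctypes.byref(count))
    message = cudart.cudaGetErrorString(err).decode()
    return err, count.value, message


if __name__ == "__main__":
    result = probe_device_count()
    if result is None:
        print("libcudart not found by ctypes.util.find_library")
    else:
        err, count, message = result
        print(f"cudaGetDeviceCount: error {err} ({message}), {count} device(s)")
        if err == CUDA_ERROR_MEMORY_ALLOCATION:
            print("Runtime reported 'out of memory' before any allocation")
```

Run inside the container, this prints the raw error code, its message and the device count. If the behavior described above reproduces, one would expect error 2 with zero devices, which would isolate the failure to the runtime's device enumeration rather than to any real allocation.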

nvidia-smi in Windows:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 512.77       Driver Version: 512.77       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name            TCC/WDDM | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  WDDM | 00000000:02:00.0 Off |                  N/A |
| 23%   34C    P8    12W / 250W |    660MiB / 11264MiB |      4%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  WDDM | 00000000:03:00.0 Off |                  N/A |
| 23%   29C    P8    11W / 250W |      0MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  WDDM | 00000000:82:00.0 Off |                  N/A |
| 23%   29C    P8    10W / 250W |      0MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  WDDM | 00000000:83:00.0 Off |                  N/A |
| 23%   27C    P8    11W / 250W |     12MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

nvidia-smi in container:

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.68.02    Driver Version: 512.77       CUDA Version: 11.6     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  On   | 00000000:02:00.0 Off |                  N/A |
| 23%   34C    P8    11W / 250W |    651MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  On   | 00000000:03:00.0 Off |                  N/A |
| 23%   29C    P8    11W / 250W |      0MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  On   | 00000000:82:00.0 Off |                  N/A |
| 23%   28C    P8    10W / 250W |      0MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   3  NVIDIA GeForce ...  On   | 00000000:83:00.0 Off |                  N/A |
| 23%   27C    P8    10W / 250W |     12MiB / 11264MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
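The "memory is enough" observation can also be checked from a script instead of by reading the table. This is a minimal sketch, assuming Python 3.7+ and that nvidia-smi is on PATH inside the container; it uses nvidia-smi's CSV query mode, and the helper names `parse_free_mib` and `query_free_mib` are hypothetical.

```python
import subprocess
from typing import Dict, Optional

# Standard nvidia-smi query flags; "nounits" makes memory values plain MiB integers.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_free_mib(csv_text: str) -> Dict[int, int]:
    """Map GPU index to free memory in MiB (total minus used)."""
    free: Dict[int, int] = {}
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field) for field in line.split(","))
        free[index] = total - used
    return free


def query_free_mib() -> Optional[Dict[int, int]]:
    """Run nvidia-smi and parse its output; None if it is missing or fails."""
    try:
        result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    except (OSError, subprocess.CalledProcessError):
        return None
    try:
        return parse_free_mib(result.stdout)
    except ValueError:
        # Non-numeric fields (e.g. "[N/A]") cannot be parsed as MiB values.
        return None


if __name__ == "__main__":
    # Used/total values copied from the container's nvidia-smi table above.
    sample = "0, 651, 11264\n1, 0, 11264\n2, 0, 11264\n3, 12, 11264\n"
    print(parse_free_mib(sample))
    print(query_free_mib())
```

With the container values from the table, `parse_free_mib` gives `{0: 10613, 1: 11264, 2: 11264, 3: 11252}`, so every GPU has more than 10 GiB free when the out of memory error is raised.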

nvcc -V in container:

nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Wed_Oct_23_19:24:38_PDT_2019
Cuda compilation tools, release 10.2, V10.2.89

Expected behavior

CUDA can be invoked inside the container normally.

Information

Output of & "C:\Program Files\Docker\Docker\resources\com.docker.diagnose.exe" check:

[PASS] DD0027: is there available disk space on the host?
[PASS] DD0028: is there available VM disk space?
[PASS] DD0031: does the Docker API work?
[PASS] DD0004: is the Docker engine running?
[PASS] DD0011: are the LinuxKit services running?
[PASS] DD0016: is the LinuxKit VM running?
[PASS] DD0001: is the application running?
[SKIP] DD0018: does the host support virtualization?
[PASS] DD0002: does the bootloader have virtualization enabled?
[PASS] DD0017: can a VM be started?
[PASS] DD0024: is WSL installed?
[PASS] DD0021: is the WSL 2 Windows Feature enabled?
[PASS] DD0022: is the Virtual Machine Platform Windows Feature enabled?
[PASS] DD0025: are WSL distros installed?
[PASS] DD0026: is the WSL LxssManager service running?
[PASS] DD0029: is the WSL 2 Linux filesystem corrupt?
[PASS] DD0015: are the binary symlinks installed?
[PASS] DD0003: is the Docker CLI working?
[PASS] DD0013: is the $PATH ok?
[PASS] DD0005: is the user in the docker-users group?
[PASS] DD0007: is the backend responding?
[PASS] DD0014: are the backend processes running?
[PASS] DD0008: is the native API responding?
[PASS] DD0009: is the vpnkit API responding?
[PASS] DD0010: is the Docker API proxy responding?
[PASS] DD0006: is the Docker Desktop Service responding?
[PASS] DD0012: is the VM networking working?
[PASS] DD0032: do Docker networks overlap with host IPs?
[SKIP] DD0030: is the image access management authorized?
[PASS] DD0033: does the host have Internet access?
No fatal errors detected.

Running the nbody CUDA sample with

docker run --rm -it --gpus=all nvcr.io/nvidia/k8s/cuda-sample:nbody nbody -gpu -benchmark

produces:

Run "nbody -benchmark [-numbodies=]" to measure performance.
    -fullscreen       (run n-body simulation in fullscreen mode)
    -fp64             (use double precision floating point values for simulation)
    -hostmem          (stores simulation data in host memory)
    -benchmark        (run benchmark to measure performance)
    -numbodies=       (number of bodies (>= 1) to run in simulation)
    -device=          (where d=0,1,2.... for the CUDA device to use)
    -numdevices=      (where i=(number of CUDA devices > 0) to use for simulation)
    -compare          (compares simulation results running once on the default GPU and once on the CPU)
    -cpu              (run n-body simulation on the CPU)
    -tipsy=           (load a tipsy model file for simulation)

NOTE: The CUDA Samples are not meant for performance measurements. Results may vary when GPU Boost is enabled.

Error: only 0 Devices available, 1 requested.  Exiting.

Steps to reproduce the behavior

  1. ...
  2. ...

Yonv1943 commented 2 years ago

I'm running into the same problem.


ding92 commented 2 years ago

Same here.

CUDA works for me on Windows, but fails inside my WSL2 Ubuntu distribution.

before31 commented 2 years ago

This may be a bug in WSL2 rather than in Docker. Please follow the corresponding issue in the WSL repo. @Yonv1943 @ding92

docker-robott commented 1 year ago

There hasn't been any activity on this issue for a long time. If the problem is still relevant, add a comment on this issue. If not, this issue will be closed in 30 days.

Mark the issue as fresh with a /remove-lifecycle stale comment. Stale issues will be closed after an additional %v days of inactivity.

Prevent issues from auto-closing with a /lifecycle frozen comment.

/lifecycle stale

before31 commented 1 year ago

/remove-lifecycle stale

docker-robott commented 1 year ago

There hasn't been any activity on this issue for a long time. If the problem is still relevant, mark the issue as fresh with a /remove-lifecycle stale comment. If not, this issue will be closed in 30 days.

Prevent issues from auto-closing with a /lifecycle frozen comment.

/lifecycle stale

before31 commented 1 year ago

/remove-lifecycle stale

docker-robott commented 1 year ago

There hasn't been any activity on this issue for a long time. If the problem is still relevant, mark the issue as fresh with a /remove-lifecycle stale comment. If not, this issue will be closed in 30 days.

Prevent issues from auto-closing with a /lifecycle frozen comment.

/lifecycle stale

nicolasayotte commented 1 month ago

/remove-lifecycle stale