containerd / nerdctl

contaiNERD CTL - Docker-compatible CLI for containerd, with support for Compose, Rootless, eStargz, OCIcrypt, IPFS, ...
Apache License 2.0

docs/gpu.md: `docker.io/nvidia/cuda:9.0-base: not found` #2755

Closed AkihiroSuda closed 9 months ago

AkihiroSuda commented 9 months ago

https://github.com/containerd/nerdctl/blob/v1.7.2/docs/gpu.md#options-for-nerdctl-run---gpus

The nvidia/cuda:9.0-base image used in the docs no longer seems to exist:

$ nerdctl run -it --rm --gpus all nvidia/cuda:9.0-base nvidia-smi
docker.io/nvidia/cuda:9.0-base: resolving      |--------------------------------------| 
elapsed: 1.1 s                  total:   0.0 B (0.0 B/s)                                         
INFO[0001] trying next host - response was http.StatusNotFound  host=registry-1.docker.io
FATA[0001] failed to resolve reference "docker.io/nvidia/cuda:9.0-base": docker.io/nvidia/cuda:9.0-base: not found

The plain ubuntu image still works, though:

$ nerdctl run -it --rm --gpus all ubuntu nvidia-smi
Tue Jan 16 07:27:01 2024       
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.146.02             Driver Version: 535.146.02   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla T4                       Off | 00000000:00:1E.0 Off |                    0 |
| N/A   24C    P8               8W /  70W |      2MiB / 15360MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

cc @ktock

yankay commented 9 months ago

The NVIDIA CUDA image tags have changed; the current ones are listed at https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda/tags. The docs can replace nvidia/cuda:9.0-base with nvidia/cuda:12.3.1-base-ubuntu20.04, as in the Docker Compose example at https://docs.docker.com/compose/gpu-support/#example-of-a-compose-file-for-running-a-service-with-access-to-1-gpu-device.
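
A minimal sketch of that tag swap, in case it helps whoever updates docs/gpu.md. The function name `update_cuda_tag` is made up for illustration; it only rewrites text on stdin and does not check that the new tag exists in the registry.

```shell
#!/bin/sh
# Hypothetical helper (not part of nerdctl): read text on stdin and replace
# the stale image tag with the newer one suggested above.
# Note: a longer tag that merely starts with "9.0-base" would also match.
update_cuda_tag() {
    sed 's|nvidia/cuda:9\.0-base|nvidia/cuda:12.3.1-base-ubuntu20.04|g'
}
```

Usage would be something like `update_cuda_tag < docs/gpu.md > gpu.md.new`, followed by a manual review of the result.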


Additional information: in my environment, the NVIDIA driver is installed with https://github.com/NVIDIA/gpu-operator. The --runtime option must be set, as described in https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html. With it, this command works:

nerdctl run -it --rm --gpus all --runtime=/usr/local/nvidia/toolkit/nvidia-container-runtime docker.io/nvidia/cuda:12.3.1-base-ubuntu20.04  nvidia-smi

Tue Jan 16 10:16:12 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P40                      On  | 00000000:03:00.0 Off |                    0 |
| N/A   25C    P8               9W / 250W |      0MiB / 23040MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

But without --runtime, nerdctl run -it --rm --gpus all docker.io/nvidia/cuda:12.3.1-base-ubuntu20.04 nvidia-smi fails with this error message:

FATA[0000] failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "nvidia-smi": executable file not found in $PATH: unknown
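
For anyone scripting around this, here is a rough sketch that passes --runtime only when the runtime binary is actually present. It assumes the failure comes from the NVIDIA runtime not being selected, which is what I see in my environment and may not hold elsewhere. The function name `gpu_run_flags` is hypothetical; --gpus and --runtime are the same flags used in the commands above.

```shell
#!/bin/sh
# Hypothetical helper (not part of nerdctl): print the flags for a GPU run.
# --runtime is added only if the given path exists and is executable;
# otherwise only --gpus all is printed and the default runtime is used.
gpu_run_flags() {
    runtime_path="${1:-}"
    if [ -n "$runtime_path" ] && [ -x "$runtime_path" ]; then
        printf '%s\n' "--gpus all --runtime=$runtime_path"
    else
        printf '%s\n' "--gpus all"
    fi
}
```

It could then be used as `nerdctl run -it --rm $(gpu_run_flags /usr/local/nvidia/toolkit/nvidia-container-runtime) docker.io/nvidia/cuda:12.3.1-base-ubuntu20.04 nvidia-smi`. The unquoted `$(...)` is intentional so the flags split into separate arguments, which means the runtime path must not contain spaces.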

jxfruit commented 4 months ago

Same issue here. It works with Docker. Why was this closed, and what is the solution?