Closed AkihiroSuda closed 9 months ago
The NVIDIA CUDA images have been updated; see https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda/tags.
The image nvidia/cuda:9.0-base can be updated to nvidia/cuda:12.3.1-base-ubuntu20.04, as in the Docker Compose example at https://docs.docker.com/compose/gpu-support/#example-of-a-compose-file-for-running-a-service-with-access-to-1-gpu-device.
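The tag swap proposed here could be sketched with sed as follows. This is only an illustration: the function name `update_cuda_image` and the sample input line are hypothetical, and it assumes the old tag appears literally in the text being updated.

```shell
# Rewrite the outdated CUDA image tag to the newer one proposed in this issue.
OLD_IMAGE='nvidia/cuda:9.0-base'
NEW_IMAGE='nvidia/cuda:12.3.1-base-ubuntu20.04'

# Hypothetical helper: reads text on stdin, writes the updated text to stdout.
update_cuda_image() {
  # Escape the dots in OLD_IMAGE so sed matches them literally.
  old_escaped=$(printf '%s' "$OLD_IMAGE" | sed 's/\./\\./g')
  # '#' is used as the sed delimiter because the image names contain '/'.
  sed "s#${old_escaped}#${NEW_IMAGE}#g"
}

# Illustrative input line (an assumption, not quoted from any file).
printf '%s\n' 'nerdctl run -it --rm --gpus all nvidia/cuda:9.0-base nvidia-smi' | update_cuda_image
```

To update a file rather than a single line, the same function can read it on stdin, for example `update_cuda_image < input.md > output.md`.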
Additional information: in my environment, the NVIDIA driver is installed with https://github.com/NVIDIA/gpu-operator, so --runtime must be set explicitly, as in https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/sample-workload.html. With it, this command works:
nerdctl run -it --rm --gpus all --runtime=/usr/local/nvidia/toolkit/nvidia-container-runtime docker.io/nvidia/cuda:12.3.1-base-ubuntu20.04 nvidia-smi
Tue Jan 16 10:16:12 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.129.03             Driver Version: 535.129.03   CUDA Version: 12.3     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  Tesla P40                      On  | 00000000:03:00.0 Off |                    0 |
| N/A   25C    P8               9W / 250W |      0MiB / 23040MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
But without --runtime, the command
nerdctl run -it --rm --gpus all docker.io/nvidia/cuda:12.3.1-base-ubuntu20.04 nvidia-smi
fails with this error message:
FATA[0000] failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: exec: "nvidia-smi": executable file not found in $PATH: unknown
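The difference between the two invocations can be captured in a small wrapper. The sketch below assumes the GPU Operator places the runtime at the path used in the working command; the function name `gpu_smoke_test_cmd` is hypothetical, and it prints the command rather than running it so the result can be inspected first.

```shell
# Path taken from the working command in this report; other installs may differ.
GPU_OPERATOR_RUNTIME='/usr/local/nvidia/toolkit/nvidia-container-runtime'

# Hypothetical helper: print the nerdctl command line for a GPU smoke test.
# $1: runtime path to probe (defaults to the GPU Operator path above).
gpu_smoke_test_cmd() {
  runtime_path=${1:-$GPU_OPERATOR_RUNTIME}
  image='docker.io/nvidia/cuda:12.3.1-base-ubuntu20.04'
  if [ -x "$runtime_path" ]; then
    # An executable exists at this path, so pass it explicitly via --runtime.
    printf 'nerdctl run -it --rm --gpus all --runtime=%s %s nvidia-smi\n' "$runtime_path" "$image"
  else
    # Nothing executable there: fall back to the plain --gpus invocation.
    printf 'nerdctl run -it --rm --gpus all %s nvidia-smi\n' "$image"
  fi
}

gpu_smoke_test_cmd
```

Piping the printed line to `sh` would execute it; whether the fallback form works depends on how the NVIDIA container tooling is installed on the host.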
Same issue here. It works with Docker. Why was this closed, and what is the solution?
https://github.com/containerd/nerdctl/blob/v1.7.2/docs/gpu.md#options-for-nerdctl-run---gpus
The nvidia/cuda:9.0-base image no longer seems to exist. The plain ubuntu image still works, though.
cc @ktock
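One way to check which image references can still be pulled is sketched below. The helper `check_images` is hypothetical, and the pull command is passed in as a parameter so the same function works with `nerdctl pull` or `docker pull`. A failed pull can have causes other than a missing tag (network, authentication), so "failed" is only a hint.

```shell
# Hypothetical helper: report which image references can be pulled.
# $1: pull command (e.g. "nerdctl pull"); remaining args: image references.
check_images() {
  pull_cmd=$1
  shift
  for image in "$@"; do
    # $pull_cmd is intentionally unquoted so "nerdctl pull" splits into words.
    if $pull_cmd "$image" >/dev/null 2>&1; then
      printf 'ok %s\n' "$image"
    else
      printf 'failed %s\n' "$image"
    fi
  done
}

# Example (requires network access and a running containerd):
# check_images 'nerdctl pull' nvidia/cuda:9.0-base nvidia/cuda:12.3.1-base-ubuntu20.04 ubuntu
```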