Open: MorphSeur opened this issue 9 months ago
Hi. How was Docker installed? Is this Docker Desktop or Docker Engine?
Thanks for your reply!
It is a Docker Engine installation.
[...] `nvidia-container-toolkit` for one month until this morning. [...] `nvidia-driver` from 12.1 to 12.2.
I would say that the primary issue is that you're not able to configure the `nvidia` runtime for your Docker installation. It could be that the config file is not being used and that arguments to the Docker daemon are being used instead. Could you confirm whether this is the case?
Thanks a lot for your reply.
The `nvidia-runtime` is set correctly following the documentation.
May I know which config file you mean? Is it `/etc/nvidia-container-runtime/config.toml`?
The documentation is valid if the Docker daemon is using the `/etc/docker/daemon.json` config file. If the daemon is configured through another mechanism or uses a different config file, the instructions need to be adapted. How is your Docker daemon configured?
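One way to answer this question is to inspect the `dockerd` command line (for example the `ExecStart=` line shown by `systemctl cat docker.service`) for the `--config-file` and `--add-runtime` flags. The following is a minimal Python sketch under that assumption; the helper name `dockerd_flags` and the example command line are hypothetical and not taken from the reporter's system.

```python
import shlex


def dockerd_flags(command_line: str) -> dict:
    """Extract --config-file and --add-runtime values from a dockerd command line.

    Hypothetical helper: accepts both "--flag value" and "--flag=value" forms.
    """
    tokens = shlex.split(command_line)
    found = {"config_file": None, "add_runtime": []}
    i = 0
    while i < len(tokens):
        name, sep, value = tokens[i].partition("=")
        if name in ("--config-file", "--add-runtime"):
            if not sep:
                # The value is the next token, if there is one.
                i += 1
                value = tokens[i] if i < len(tokens) else ""
            if name == "--config-file":
                found["config_file"] = value
            else:
                found["add_runtime"].append(value)
        i += 1
    return found


# Illustrative command line only; the path is an assumption.
example = "/usr/bin/dockerd -H fd:// --config-file=/etc/docker/custom.json"
print(dockerd_flags(example))
```

If `config_file` comes back as `None`, no explicit config file was passed on the command line, so the daemon would be expected to fall back to its default location.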
Thanks for your reply.
The daemon is configured using: `sudo nvidia-ctk runtime configure --runtime=docker`
Here is the file:
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
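As a quick sanity check, the file content quoted above can be parsed to confirm that it is valid JSON and that it contains an `nvidia` entry under `runtimes`. This is only a sketch; the function name `nvidia_runtime_entry` is made up for illustration, and a well-formed file does not by itself prove that the daemon has loaded it.

```python
import json


def nvidia_runtime_entry(daemon_json_text: str):
    """Return the "nvidia" entry under "runtimes", or None if it is absent."""
    config = json.loads(daemon_json_text)
    return config.get("runtimes", {}).get("nvidia")


# The daemon.json content quoted in this comment.
daemon_json = """
{
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
"""

entry = nvidia_runtime_entry(daemon_json)
print(entry)
```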
@MorphSeur if this config were being applied, then the following error would not be triggered:
docker: Error response from daemon: unknown or invalid runtime name: nvidia.
This is triggered repeatedly for all your examples, indicating that the runtime is not being configured correctly. To address this, we would have to understand what is non-standard about your Docker installation. Do the Docker daemon logs (`journalctl -xu docker.service`) show any messages related to the config or the runtimes when the daemon is (re)started?
Are you perhaps running rootless Docker, so that the instructions from https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#rootless-mode may need to be followed?
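Besides the logs, it may help to ask the running daemon which runtimes it has actually registered. The sketch below assumes that `docker info --format '{{json .Runtimes}}'` prints a JSON object keyed by runtime name; the function names and the sample output are illustrative assumptions, not output from the reporter's machine.

```python
import json
import subprocess


def registered_runtimes(info_runtimes_json: str) -> list:
    """Return the sorted runtime names from the JSON printed by `docker info`."""
    return sorted(json.loads(info_runtimes_json))


def query_docker_runtimes() -> list:
    """Ask the local daemon for its runtimes (requires a working docker CLI)."""
    result = subprocess.run(
        ["docker", "info", "--format", "{{json .Runtimes}}"],
        capture_output=True,
        text=True,
        check=True,
    )
    return registered_runtimes(result.stdout)


# Illustrative output for a daemon that has NOT registered the nvidia runtime.
sample_output = '{"io.containerd.runc.v2":{"path":"runc"},"runc":{"path":"runc"}}'
names = registered_runtimes(sample_output)
print("nvidia" in names)
```

On a real host, `query_docker_runtimes()` would be called instead of parsing the sample string. If `nvidia` is missing from the result, the daemon did not pick up the `runtimes` entry from its config.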
Yes, the Docker daemon logs contain errors related to the runtime; they are similar to the error above.
Feb 27 11:11:45 ubuntu dockerd[949569]: time="2024-02-27T11:11:45.380223199+01:00" level=error msg="stream copy error: reading from a closed fifo"
Feb 27 11:11:45 ubuntu dockerd[949569]: time="2024-02-27T11:11:45.380234811+01:00" level=error msg="stream copy error: reading from a closed fifo"
Feb 27 11:11:45 ubuntu dockerd[949569]: time="2024-02-27T11:11:45.487476417+01:00" level=error msg="Handler for POST /v1.43/containers/87f622a9e91d4ff977b7684e279badd8345e63ed91ffa10923fa34080754d593/start returned error: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'\nnvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown"
Feb 27 11:11:50 ubuntu dockerd[949569]: time="2024-02-27T11:11:50.315445730+01:00" level=error msg="stream copy error: reading from a closed fifo"
Feb 27 11:11:50 ubuntu dockerd[949569]: time="2024-02-27T11:11:50.315497738+01:00" level=error msg="stream copy error: reading from a closed fifo"
Feb 27 11:11:50 ubuntu dockerd[949569]: time="2024-02-27T11:11:50.419085301+01:00" level=error msg="Handler for POST /v1.43/containers/0f60f75fd2c4b678fa25aa43b0323a4822d295acade31e2c29ee4670e66d58cb/start returned error: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'\nnvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown"
Feb 27 11:21:51 ubuntu dockerd[949569]: time="2024-02-27T11:21:51.244479007+01:00" level=error msg="stream copy error: reading from a closed fifo"
Feb 27 11:21:51 ubuntu dockerd[949569]: time="2024-02-27T11:21:51.244519553+01:00" level=error msg="stream copy error: reading from a closed fifo"
Feb 27 11:21:51 ubuntu dockerd[949569]: time="2024-02-27T11:21:51.434034408+01:00" level=error msg="Handler for POST /v1.43/containers/0b3069f08e0eb41d9c0f5967fa113c07fa5d27d72792f3b4a4d05e57a8225851/start returned error: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'\nnvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown"
Feb 27 12:01:05 ubuntu systemd[1]: Stopping Docker Application Container Engine...
Regarding Docker, it is running as root.
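To see at a glance which kinds of errors appear in an excerpt like the one above, the lines can be grouped by a few known substrings. This is a rough sketch: the labels and the function name `classify_errors` are hypothetical, while the substrings and the sample lines are copied from the messages quoted in this thread.

```python
from collections import Counter

# Substrings taken from the error messages quoted in this thread.
PATTERNS = {
    "unknown_runtime": "unknown or invalid runtime name",
    "version_mismatch": "driver/library version mismatch",
    "closed_fifo": "reading from a closed fifo",
}


def classify_errors(log_lines):
    """Count error-level dockerd log lines by the first matching pattern."""
    counts = Counter()
    for line in log_lines:
        if "level=error" not in line:
            continue  # skip non-error lines such as systemd messages
        for label, needle in PATTERNS.items():
            if needle in line:
                counts[label] += 1
                break
        else:
            counts["other"] += 1
    return counts


# Three lines copied verbatim from the excerpt above.
sample = [
    r'''Feb 27 11:11:45 ubuntu dockerd[949569]: time="2024-02-27T11:11:45.380223199+01:00" level=error msg="stream copy error: reading from a closed fifo"''',
    r'''Feb 27 11:11:45 ubuntu dockerd[949569]: time="2024-02-27T11:11:45.487476417+01:00" level=error msg="Handler for POST /v1.43/containers/87f622a9e91d4ff977b7684e279badd8345e63ed91ffa10923fa34080754d593/start returned error: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'\nnvidia-container-cli: initialization error: nvml error: driver/library version mismatch: unknown"''',
    r'''Feb 27 12:01:05 ubuntu systemd[1]: Stopping Docker Application Container Engine...''',
]

counts = classify_errors(sample)
print(counts)
```

In these sampled lines, the container start failure reports a driver/library version mismatch rather than an unknown runtime name, so it may be worth checking both conditions separately.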
Hello!
After carefully following the installation guide for the NVIDIA Container Toolkit, I found that a Docker image is unable to use the `nvidia` runtime.
[...] `grep nvidia /etc/apt/sources.list.d/*.list`.
[...] `nvidia runtime`.
[...] `nvidia runtime`.
Thanks a lot for your help with this issue!