NVIDIA / nvidia-container-toolkit

Build and run containers leveraging NVIDIA GPUs
Apache License 2.0

Error trying to use NVIDIA runtime with rootless Docker #99

Open muety opened 10 months ago

muety commented 10 months ago

I have a rootless Docker setup on Ubuntu 20.04 and have set no-cgroups = true in /etc/nvidia-container-runtime/config.toml. However, when I try to run

docker run --rm --gpus all nvidia/cuda:12.0.0-base-ubuntu20.04 nvidia-smi

I'm getting the following error:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: initialization error: privilege change failed: invalid argument: unknown.

Possibly related to #85.

Inspired by https://github.com/NVIDIA/nvidia-docker/issues/1565#issuecomment-966053820, I realized that the above issue only occurs for users logging in via LDAP (using SSSD), not for "real" local users. Those LDAP users have very high (> 65536) UIDs. Might be a coincidence, though.
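
To help anyone comparing an affected and an unaffected account, here is a rough diagnostic sketch. It only gathers facts and is not a fix. The helper names (uid_class, subid_entries, no_cgroups_setting) are made up for illustration, and the 65536 threshold is just the observation above, not a documented limit of nvidia-container-cli. It also assumes subordinate IDs are listed by user name in /etc/subuid, which may not hold for LDAP/SSSD users.

```shell
#!/bin/sh
# Diagnostic sketch only. All function names here are hypothetical helpers.

# Classify a numeric UID against the 65536 threshold observed in this thread.
# (Assumption: the threshold is an observation, not a documented limit.)
uid_class() {
    if [ "$1" -gt 65536 ]; then
        echo "high"
    else
        echo "low"
    fi
}

# Print "start:count" for each subordinate-ID range of a user.
# $1 = user name, $2 = file in /etc/subuid format (name:start:count).
subid_entries() {
    awk -F: -v u="$1" '$1 == u { print $2 ":" $3 }' "${2:-/etc/subuid}"
}

# Print uncommented no-cgroups lines from the runtime config.
# $1 = config path (defaults to the path used in this post).
no_cgroups_setting() {
    grep -E '^[[:space:]]*no-cgroups[[:space:]]*=' \
        "${1:-/etc/nvidia-container-runtime/config.toml}"
}

# Report for the current user.
echo "user=$(id -un) uid=$(id -u) class=$(uid_class "$(id -u)")"
echo "subuid ranges:"
subid_entries "$(id -un)" 2>/dev/null || true
echo "no-cgroups setting:"
no_cgroups_setting 2>/dev/null || echo "(not set or config not readable)"
```

Running this once as a local user and once as an LDAP user should show whether the two accounts differ in UID range or in subordinate-ID entries.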

Any ideas how to work around this?

dnns92 commented 10 months ago

I have a similar issue, see: #106. Were you able to resolve it?