NixOS / nixpkgs

CDI devices are not exposed to containers in Docker rootless with nvidia-container-toolkit #339999

Open ereslibre opened 3 weeks ago

ereslibre commented 3 weeks ago

Describe the bug

When Docker is run in rootless mode, CDI devices are not exposed to the container.

Steps To Reproduce

Steps to reproduce the behavior:

  1. Enable Docker rootless mode and nvidia-container-toolkit:
    hardware.nvidia-container-toolkit.enable = true;
    virtualisation.docker.rootless.enable = true;
  2. Try to inject the devices exposed by nvidia-container-toolkit into a container:
    $ DOCKER_HOST=unix://$XDG_RUNTIME_DIR/docker.sock docker run --rm -it --device=nvidia.com/gpu=all ubuntu:latest nvidia-smi
    docker: Error response from daemon: could not select device driver "cdi" with capabilities: [].
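
One way to narrow this down is to check whether any CDI spec files exist on the host in the first place. The sketch below is only an illustration under stated assumptions: `list_cdi_specs` is a hypothetical helper name, and `/etc/cdi` and `/var/run/cdi` are the default spec directories from the CDI specification, which may not be where a given setup writes its specs.

```shell
#!/bin/sh
# list_cdi_specs DIR...
# Hypothetical helper: print every CDI spec file (*.json or *.yaml)
# found directly inside the given directories. Missing directories
# are skipped silently.
list_cdi_specs() {
  for dir in "$@"; do
    [ -d "$dir" ] || continue
    for spec in "$dir"/*.json "$dir"/*.yaml; do
      # An unmatched glob stays literal, so test that the file exists.
      if [ -f "$spec" ]; then
        printf '%s\n' "$spec"
      fi
    done
  done
  return 0
}

# Assumption: the CDI default spec directories are in use on this host.
list_cdi_specs /etc/cdi /var/run/cdi
```

If this prints a spec on the host while the rootless daemon still rejects `--device=nvidia.com/gpu=all`, the specs exist and the problem is more likely on the daemon side.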

Additional context

This issue was split from https://github.com/NixOS/nixpkgs/issues/337873#issuecomment-2332332343.


ereslibre commented 3 weeks ago

cc/ @chmanie @benxiao

ereslibre commented 6 days ago

Identified the problem and submitted https://github.com/moby/moby/pull/48541 upstream.

In the meantime, I'll create a PR on NixOS that includes this patch and gather feedback. With the patch applied, I can confirm that I am able to use NVIDIA GPUs with CDI in rootless mode:

❯ DOCKER_HOST=unix:///run/user/1000/docker.sock docker run --rm --device=nvidia.com/gpu=all -it ubuntu:latest nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 4090 (UUID: GPU-c475e08b-0cc5-f5aa-4326-99699429b449)
GPU 1: NVIDIA GeForce RTX 2080 SUPER (UUID: GPU-5cca1a6f-7cee-b649-40f0-2d3ecb0aa207)