NixOS / nixpkgs

nvidia container toolkit does not work on Docker but works in Podman #337873

Closed (s1n7ax closed this issue 1 week ago)

s1n7ax commented 2 weeks ago

Describe the bug

When hardware.nvidia-container-toolkit.enable = true; is set, it should be possible to run a container with GPU capability.

According to the official documentation, sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi should print the nvidia-smi output. However, it fails with the following error.

❯ sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

docker: Error response from daemon: unknown or invalid runtime name: nvidia.
See 'docker run --help'.

❯ sudo docker run --rm --gpus all ubuntu nvidia-smi

docker: Error response from daemon: could not select device driver "" with capabilities: [[gpu]].

However, this works perfectly with podman. I used PyTorch to validate that the GPU is detected correctly, and it is.

❯ podman run --rm -it --security-opt=label=disable \
   --device=nvidia.com/gpu=all \
   pytorch:2.4.0-cuda12.4-cudnn9-runtime nvidia-smi
Wed Aug 28 05:43:59 2024
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 560.35.03              Driver Version: 560.35.03      CUDA Version: 12.6     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce GTX 1060 6GB    Off |   00000000:05:00.0  On |                  N/A |
|  0%   40C    P8              8W /  200W |     632MiB /   6144MiB |      6%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
+-----------------------------------------------------------------------------------------+

❯ podman run --rm -it --security-opt=label=disable \
   --device=nvidia.com/gpu=all \
   pytorch:2.4.0-cuda12.4-cudnn9-runtime bash
root@a41e66aa5e6e:/workspace# python
Python 3.11.9 (main, Apr 19 2024, 16:48:06) [GCC 11.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.cuda.device(0)
<torch.cuda.device object at 0x7f1840d377d0>
>>> torch.cuda.get_device_name(0)
'NVIDIA GeForce GTX 1060 6GB'
>>>
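
Note that torch.cuda.device(0) only constructs a context-manager object, so in the session above it is the get_device_name call that confirms the GPU is visible. A slightly more explicit check is sketched below, assuming PyTorch is installed in the container; the helper name cuda_report is made up for illustration.

```python
def cuda_report():
    """Return a small dict describing what PyTorch can see."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed in this environment.
        return {"torch": False, "cuda": False, "devices": []}

    # is_available() is False when no usable driver or device is found.
    available = torch.cuda.is_available()
    count = torch.cuda.device_count() if available else 0
    names = [torch.cuda.get_device_name(i) for i in range(count)]
    return {"torch": True, "cuda": available, "devices": names}


if __name__ == "__main__":
    print(cuda_report())
```

Run inside the container, an empty devices list would suggest that the GPU was not passed through.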

Steps To Reproduce

Steps to reproduce the behavior:

  1. Add the following NVIDIA configuration: https://github.com/s1n7ax/nixos/blob/c86eb7143559de2f749ffe565fe821d74ab62cf4/system/hardware/nvidia.nix?plain=1#L1-L23
  2. Add the following Docker configuration: https://github.com/s1n7ax/nixos/blob/c86eb7143559de2f749ffe565fe821d74ab62cf4/system/utils/docker.nix?plain=1#L29-L31
  3. Run the following command: sudo docker run --rm --runtime=nvidia --gpus all ubuntu nvidia-smi

Expected behavior

nvidia-smi should execute successfully within the container, and you should see the correct output.

Metadata

Please run nix-shell -p nix-info --run "nix-info -m" and paste the result.

❯ nix-shell -p nix-info --run "nix-info -m"
 - system: `"x86_64-linux"`
 - host os: `Linux 6.6.47, NixOS, 24.11 (Vicuna), 24.11.20240824.d0e1602`
 - multi-user?: `yes`
 - sandbox: `yes`
 - version: `nix-env (Nix) 2.18.5`
 - channels(s1n7ax): `"home-manager, nixos-23.11"`
 - channels(root): `"nixos-23.11"`
 - nixpkgs: `/nix/store/ia1zpg1s63v6b3vin3n7bxxjgcs51s2r-source`

ereslibre commented 2 weeks ago

Hello!

With hardware.nvidia-container-toolkit.enable = true;, the Container Device Interface (CDI) is used, so you can use the GPU with podman, as you mentioned:

$ podman run --rm -it --device=nvidia.com/gpu=all ubuntu:latest nvidia-smi

As well as with Docker:

$ docker run --rm -it --device=nvidia.com/gpu=all ubuntu:latest nvidia-smi

Note: when you use CDI, the flag is --device with Docker as well, rather than the --gpus flag that the nvidia runtime wrappers used.

Also, please note that you'll need to use Docker 25 at least, that is the version that implements CDI:

virtualisation.docker.package = pkgs.docker_25;

chmanie commented 1 week ago

Hi, sorry for hijacking this, but it might be related.

For me the command line outlined above does work with podman but not with docker. There I am getting

docker: Error response from daemon: could not select device driver "cdi" with capabilities: [].

My config is as follows:

  hardware = {
    nvidia = {
      package = config.boot.kernelPackages.nvidiaPackages.stable;

      modesetting.enable = true;

      powerManagement.enable = true;
      powerManagement.finegrained = true;

      open = false;

      nvidiaSettings = true;

      prime = {
        offload = {
          enable = true;
          enableOffloadCmd = true;
        };
        amdgpuBusId = "PCI:10:00:0";
        nvidiaBusId = "PCI:1:00:0";
      };
    };
    nvidia-container-toolkit = {
      enable = true;
    };
  };

benxiao commented 1 week ago

I am having the exact same problem. I wasn't able to get nvidia docker working with just hardware.nvidia-container-toolkit.enable = true; on unstable, and it was working before.

ereslibre commented 1 week ago

@chmanie, @benxiao: as the comment https://github.com/NixOS/nixpkgs/issues/337873#issuecomment-2320357105 states, did you also do:

> Also, please note that you'll need to use Docker 25 at least, that is the version that implements CDI:
>
> virtualisation.docker.package = pkgs.docker_25

chmanie commented 1 week ago

> @chmanie, @benxiao: as the comment https://github.com/NixOS/nixpkgs/issues/337873#issuecomment-2320357105 states, did you also do:
>
> Also, please note that you'll need to use Docker 25 at least, that is the version that implements CDI:
>
> virtualisation.docker.package = pkgs.docker_25

I tried 26 and 27 as it said "at least" but happy to try 25 as well.
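
To take the guesswork out of "at least", the version can be compared numerically. The sketch below assumes that the daemon version is read with docker version --format '{{.Server.Version}}' and that 25 is the minimum, as stated earlier in the thread; the function names are illustrative only.

```python
import re
import subprocess

MIN_CDI_MAJOR = 25  # minimum major version, taken from the thread (assumption)


def major_version(version: str) -> int:
    """Extract the leading major number from a string such as '27.1.1'."""
    match = re.match(r"\s*v?(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1))


def meets_cdi_minimum(version: str) -> bool:
    """True if the given Docker version is at least MIN_CDI_MAJOR."""
    return major_version(version) >= MIN_CDI_MAJOR


def server_version() -> str:
    """Ask the Docker daemon for its version (needs a reachable daemon)."""
    result = subprocess.run(
        ["docker", "version", "--format", "{{.Server.Version}}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    try:
        version = server_version()
    except (OSError, subprocess.CalledProcessError) as exc:
        print(f"could not query docker: {exc}")
    else:
        verdict = "ok" if meets_cdi_minimum(version) else "too old for CDI"
        print(version, verdict)
```

A passing check only rules out the version as the cause; it says nothing about the rest of the daemon configuration.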

ereslibre commented 1 week ago

> I tried 26 and 27 as it said "at least" but happy to try 25 as well.

Oh, from your comment and NixOS configuration snippet I didn’t infer you had changed the default docker version at all. Please confirm if that’s the case. Otherwise it might be something else.

chmanie commented 1 week ago

I have now tried docker_25, docker_26 and docker_27 (default in unstable), getting the same result each time:

docker: Error response from daemon: could not select device driver "cdi" with capabilities: [].

For completeness, here's my docker config:

  virtualisation.docker = {
    enable = true;
    package = pkgs.docker_25;
    rootless = {
      enable = true;
      setSocketVariable = true;
    };
    autoPrune = {
      enable = true;
    };
    storageDriver = "btrfs";
  };

My user is in the docker group.

Could it have to do with running in rootless mode? EDIT: tried w/o rootless, yielding the same result.

ereslibre commented 1 week ago

@chmanie can you paste the command you are running?

chmanie commented 1 week ago

> @chmanie can you paste the command you are running?

Of course, sorry:

$ docker run --rm -it --device=nvidia.com/gpu=all ubuntu:latest nvidia-smi
docker: Error response from daemon: could not select device driver "cdi" with capabilities: [].

ereslibre commented 1 week ago

@chmanie Thank you. You mentioned it works with podman. Is /var/run/cdi/nvidia-container-toolkit.json populated, and does it contain valid JSON? You can get this file regenerated by running sudo systemctl restart nvidia-container-toolkit-cdi-generator.service. Similarly, you can get the logs of the file generation by running sudo journalctl -u nvidia-container-toolkit-cdi-generator.service.

Just to rule out something, what does docker version report?

Do you also have services.xserver.videoDrivers = ["nvidia"]; in your configuration? This applies even if the machine is headless. If you didn't have it, make sure to restart the machine after changing the configuration and applying it.

Some settings require a restart of the machine, so testing these changes is a bit cumbersome.
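
Whether the CDI spec is populated and valid can also be checked mechanically. This is a minimal sketch, assuming the path mentioned above and the top-level kind and devices fields that a CDI spec carries; the function name check_cdi_spec is hypothetical.

```python
import json
import os

CDI_SPEC = "/var/run/cdi/nvidia-container-toolkit.json"  # path from the thread


def check_cdi_spec(path: str = CDI_SPEC) -> list[str]:
    """Return the fully qualified device names found in the spec.

    Raises ValueError if the file is missing, empty, or not a usable spec.
    """
    if not os.path.isfile(path) or os.path.getsize(path) == 0:
        raise ValueError(f"{path} is missing or empty")
    with open(path, encoding="utf-8") as handle:
        try:
            spec = json.load(handle)
        except json.JSONDecodeError as exc:
            raise ValueError(f"{path} is not valid JSON: {exc}") from exc
    if not isinstance(spec, dict):
        raise ValueError(f"{path} does not contain a JSON object")
    kind = spec.get("kind")
    devices = spec.get("devices") or []
    if not kind or not devices:
        raise ValueError(f"{path} has no 'kind' or no 'devices'")
    # A device is requested as "<kind>=<name>", e.g. with --device.
    return [
        f"{kind}={device['name']}"
        for device in devices
        if isinstance(device, dict) and "name" in device
    ]


if __name__ == "__main__":
    try:
        for name in check_cdi_spec():
            print(name)
    except (ValueError, OSError) as exc:
        print(f"problem: {exc}")
```

The names it prints are the values that can be passed to --device.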

chmanie commented 1 week ago

Hey, thank you for the detailed debugging instructions. Here's what I found:

I forgot to mention that I have two GPUs, so my services.xserver.videoDrivers looks like this:

  services.xserver = {
    videoDrivers = [
      "amdgpu"
      "nvidia"
    ];
  };

I'm not running in headless mode but am using my AMD GPU mainly and the Nvidia GPU in offload mode.

Here's the output of cat /var/run/cdi/nvidia-container-toolkit.json:

{
  "cdiVersion": "0.5.0",
  "kind": "nvidia.com/gpu",
  "devices": [
    {
      "name": "0",
      "containerEdits": {
        "deviceNodes": [
          {
            "path": "/dev/nvidia0"
          },
          {
            "path": "/dev/dri/card1"
          },
          {
            "path": "/dev/dri/renderD129"
          }
        ],
        "hooks": [
          {
            "hookName": "createContainer",
            "path": "/nix/store/nqp4im42a376ryaryxrzqy535dxryrbq-container-toolkit-container-toolkit-1.15.0-rc.3/bin/nvidia-ctk",
            "args": [
              "nvidia-ctk",
              "hook",
              "create-symlinks",
              "--link",
              "../card1::/dev/dri/by-path/pci-0000:01:00.0-card",
              "--link",
              "../renderD129::/dev/dri/by-path/pci-0000:01:00.0-render"
            ]
          },
          {
            "hookName": "createContainer",
            "path": "/nix/store/nqp4im42a376ryaryxrzqy535dxryrbq-container-toolkit-container-toolkit-1.15.0-rc.3/bin/nvidia-ctk",
            "args": [
              "nvidia-ctk",
              "hook",
              "chmod",
              "--mode",
              "755",
              "--path",
              "/dev/dri"
            ]
          }
        ]
      }
    },
    {
      "name": "all",
      "containerEdits": {
        "deviceNodes": [
          {
            "path": "/dev/nvidia0"
          },
          {
            "path": "/dev/dri/card1"
          },
          {
            "path": "/dev/dri/renderD129"
          }
        ],
        "hooks": [
          {
            "hookName": "createContainer",
            "path": "/nix/store/nqp4im42a376ryaryxrzqy535dxryrbq-container-toolkit-container-toolkit-1.15.0-rc.3/bin/nvidia-ctk",
            "args": [
              "nvidia-ctk",
              "hook",
              "create-symlinks",
              "--link",
              "../card1::/dev/dri/by-path/pci-0000:01:00.0-card",
              "--link",
              "../renderD129::/dev/dri/by-path/pci-0000:01:00.0-render"
            ]
          },
          {
            "hookName": "createContainer",
            "path": "/nix/store/nqp4im42a376ryaryxrzqy535dxryrbq-container-toolkit-container-toolkit-1.15.0-rc.3/bin/nvidia-ctk",
            "args": [
              "nvidia-ctk",
              "hook",
              "chmod",
              "--mode",
              "755",
              "--path",
              "/dev/dri"
            ]
          }
        ]
      }
    }
  ],
  "containerEdits": {
    "deviceNodes": [
      {
        "path": "/dev/nvidia-modeset"
      },
      {
        "path": "/dev/nvidia-uvm"
      },
      {
        "path": "/dev/nvidia-uvm-tools"
      },
      {
        "path": "/dev/nvidiactl"
      }
    ],
    "hooks": [
      {
        "hookName": "createContainer",
        "path": "/nix/store/nqp4im42a376ryaryxrzqy535dxryrbq-container-toolkit-container-toolkit-1.15.0-rc.3/bin/nvidia-ctk",
        "args": [
          "nvidia-ctk",
          "hook",
          "update-ldcache",
          "--ldconfig-path",
          "/nix/store/mg27y4zq8j0m8dn83azqmq02xvfmsd9i-glibc-2.39-52-bin/bin/ldconfig",
          "--folder",
          "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib"
        ]
      }
    ],
    "mounts": [
      {
        "hostPath": "/etc/egl/egl_external_platform.d/10_nvidia_wayland.json",
        "containerPath": "/etc/egl/egl_external_platform.d/10_nvidia_wayland.json",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/etc/egl/egl_external_platform.d/15_nvidia_gbm.json",
        "containerPath": "/etc/egl/egl_external_platform.d/15_nvidia_gbm.json",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libEGL_nvidia.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libEGL_nvidia.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libGLESv1_CM_nvidia.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libGLESv1_CM_nvidia.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libGLESv2_nvidia.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libGLESv2_nvidia.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libGLX_nvidia.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libGLX_nvidia.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libcuda.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libcuda.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libcudadebugger.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libcudadebugger.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libglxserver_nvidia.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libglxserver_nvidia.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvcuvid.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvcuvid.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-allocator.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-allocator.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-cfg.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-cfg.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-egl-gbm.so.1.1.1",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-egl-gbm.so.1.1.1",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-eglcore.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-eglcore.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-encode.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-encode.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-fbc.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-fbc.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-glcore.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-glcore.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-glsi.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-glsi.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-glvkspirv.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-glvkspirv.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-gpucomp.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-gpucomp.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-ml.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-ml.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-ngx.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-ngx.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-nvvm.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-nvvm.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-opencl.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-opencl.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-opticalflow.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-opticalflow.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-pkcs11-openssl3.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-pkcs11-openssl3.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-ptxjitcompiler.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-ptxjitcompiler.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-rtcore.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-rtcore.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-tls.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-tls.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-vksc-core.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-vksc-core.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvoptix.so.560.35.03",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvoptix.so.560.35.03",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/klbgvbg0j34sf5axl2p2waswc5bvg636-firmware/lib/firmware/nvidia/560.35.03/gsp_ga10x.bin",
        "containerPath": "/nix/store/klbgvbg0j34sf5axl2p2waswc5bvg636-firmware/lib/firmware/nvidia/560.35.03/gsp_ga10x.bin",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/klbgvbg0j34sf5axl2p2waswc5bvg636-firmware/lib/firmware/nvidia/560.35.03/gsp_tu10x.bin",
        "containerPath": "/nix/store/klbgvbg0j34sf5axl2p2waswc5bvg636-firmware/lib/firmware/nvidia/560.35.03/gsp_tu10x.bin",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/run/opengl-driver",
        "containerPath": "/run/opengl-driver",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/etc",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/etc",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/share",
        "containerPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/share",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/5adwdl39g3k9a2j0qadvirnliv4r7pwd-glibc-2.39-52/lib",
        "containerPath": "/nix/store/5adwdl39g3k9a2j0qadvirnliv4r7pwd-glibc-2.39-52/lib",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/5adwdl39g3k9a2j0qadvirnliv4r7pwd-glibc-2.39-52/lib64",
        "containerPath": "/nix/store/5adwdl39g3k9a2j0qadvirnliv4r7pwd-glibc-2.39-52/lib64",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/zph8xlxmypy0g0pajglkvramfcbjlscq-nvidia-x11-560.35.03-6.10.5-bin/bin/nvidia-cuda-mps-control",
        "containerPath": "/usr/bin/nvidia-cuda-mps-control",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/zph8xlxmypy0g0pajglkvramfcbjlscq-nvidia-x11-560.35.03-6.10.5-bin/bin/nvidia-cuda-mps-server",
        "containerPath": "/usr/bin/nvidia-cuda-mps-server",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/zph8xlxmypy0g0pajglkvramfcbjlscq-nvidia-x11-560.35.03-6.10.5-bin/bin/nvidia-debugdump",
        "containerPath": "/usr/bin/nvidia-debugdump",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/zph8xlxmypy0g0pajglkvramfcbjlscq-nvidia-x11-560.35.03-6.10.5-bin/bin/nvidia-powerd",
        "containerPath": "/usr/bin/nvidia-powerd",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/zph8xlxmypy0g0pajglkvramfcbjlscq-nvidia-x11-560.35.03-6.10.5-bin/bin/nvidia-smi",
        "containerPath": "/usr/bin/nvidia-smi",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib",
        "containerPath": "/usr/local/nvidia/lib",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      },
      {
        "hostPath": "/nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib",
        "containerPath": "/usr/local/nvidia/lib64",
        "options": [
          "ro",
          "nosuid",
          "nodev",
          "bind"
        ]
      }
    ]
  }
}
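
For reference, a request such as --device=nvidia.com/gpu=all is matched against this spec: the part before = has to equal kind, and the part after it has to equal one of the devices[].name entries. The sketch below illustrates that lookup on a trimmed copy of the spec. It is a simplified model, not how Docker or podman implement it, and resolve_device is a made-up name.

```python
# Device nodes listed for both "0" and "all" in the spec above.
GPU_NODES = [
    {"path": "/dev/nvidia0"},
    {"path": "/dev/dri/card1"},
    {"path": "/dev/dri/renderD129"},
]

# Trimmed copy of the spec above; hooks and mounts are left out.
SPEC = {
    "cdiVersion": "0.5.0",
    "kind": "nvidia.com/gpu",
    "devices": [
        {"name": "0", "containerEdits": {"deviceNodes": GPU_NODES}},
        {"name": "all", "containerEdits": {"deviceNodes": GPU_NODES}},
    ],
    "containerEdits": {
        "deviceNodes": [
            {"path": "/dev/nvidia-modeset"},
            {"path": "/dev/nvidia-uvm"},
            {"path": "/dev/nvidia-uvm-tools"},
            {"path": "/dev/nvidiactl"},
        ]
    },
}


def resolve_device(spec: dict, request: str) -> list[str]:
    """Return the device node paths that a '<kind>=<name>' request selects."""
    kind, sep, name = request.partition("=")
    if not sep or kind != spec["kind"]:
        raise LookupError(f"{request!r} does not match kind {spec['kind']!r}")
    for device in spec["devices"]:
        if device["name"] == name:
            per_device = device["containerEdits"].get("deviceNodes", [])
            # Top-level edits apply to every device of this spec.
            shared = spec.get("containerEdits", {}).get("deviceNodes", [])
            return [node["path"] for node in per_device + shared]
    raise LookupError(f"no device named {name!r} in spec")


if __name__ == "__main__":
    for path in resolve_device(SPEC, "nvidia.com/gpu=all"):
        print(path)
```

Because the spec defines identical edits for 0 and all, a request for nvidia.com/gpu=0 selects the same nodes in this model.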

And here is the output of sudo journalctl -u nvidia-container-toolkit-cdi-generator.service (after regenerating it):

Sep 05 18:10:26 guanabana systemd[1]: Starting Container Device Interface (CDI) for Nvidia generator...
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Auto-detected mode as \"nvml\""
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /dev/nvidia0 as /dev/nvidia0"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /dev/dri/card1 as /dev/dri/card1"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=warning msg="Could not locate /dev/dri/controlD65: pattern /dev/dri/controlD65 not found"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /dev/dri/renderD129 as /dev/dri/renderD129"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Using driver version 560.35.03"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /dev/nvidia-modeset as /dev/nvidia-modeset"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /dev/nvidia-uvm-tools as /dev/nvidia-uvm-tools"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /dev/nvidia-uvm as /dev/nvidia-uvm"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /dev/nvidiactl as /dev/nvidiactl"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-egl-gbm.so.1.1.1 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=warning msg="Could not locate glvnd/egl_vendor.d/10_nvidia.json: pattern glvnd/egl_vendor.d/10_nvidia.json not found"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=warning msg="Could not locate vulkan/icd.d/nvidia_icd.json: pattern vulkan/icd.d/nvidia_icd.json not found"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=warning msg="Could not locate vulkan/icd.d/nvidia_layers.json: pattern vulkan/icd.d/nvidia_layers.json not found"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=warning msg="Could not locate vulkan/implicit_layer.d/nvidia_layers.json: pattern vulkan/implicit_layer.d/nvidia_layers.json not found"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /etc/egl/egl_external_platform.d/15_nvidia_gbm.json as /etc/egl/egl_external_platform.d/15_nvidia_gbm.json"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /etc/egl/egl_external_platform.d/10_nvidia_wayland.json as /etc/egl/egl_external_platform.d/10_nvidia_wayland.json"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=warning msg="Could not locate nvidia/nvoptix.bin: pattern nvidia/nvoptix.bin not found"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=warning msg="Could not locate nvidia/xorg/nvidia_drv.so: pattern nvidia/xorg/nvidia_drv.so not found"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=warning msg="Could not locate nvidia/xorg/libglxserver_nvidia.so.560.35.03: pattern nvidia/xorg/libglxserver_nvidia.so.560.35.03 not found"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=warning msg="Could not locate X11/xorg.conf.d/10-nvidia.conf: pattern X11/xorg.conf.d/10-nvidia.conf not found"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libEGL_nvidia.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libGLESv1_CM_nvidia.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw->
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libGLESv2_nvidia.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvi>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libGLX_nvidia.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libcuda.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-5>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libcudadebugger.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvid>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libglxserver_nvidia.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw->
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvcuvid.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x1>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-allocator.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw->
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-cfg.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-eglcore.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nv>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-encode.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvi>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-fbc.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-glcore.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvi>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-glsi.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidi>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-glvkspirv.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw->
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-gpucomp.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nv>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-ml.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia->
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-ngx.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-nvvm.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidi>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-opencl.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvi>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-opticalflow.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvbl>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-pkcs11-openssl3.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-ptxjitcompiler.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446v>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-rtcore.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvi>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-tls.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvidia-vksc-core.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw->
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x11-560.35.03-6.10.5/lib/libnvoptix.so.560.35.03 as /nix/store/xdqvrwrf5baqz49aaa0wzd3z446vvblw-nvidia-x1>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=warning msg="Could not locate /nvidia-persistenced/socket: pattern /nvidia-persistenced/socket not found"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=warning msg="Could not locate /nvidia-fabricmanager/socket: pattern /nvidia-fabricmanager/socket not found"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=warning msg="Could not locate /tmp/nvidia-mps: pattern /tmp/nvidia-mps not found"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/klbgvbg0j34sf5axl2p2waswc5bvg636-firmware/lib/firmware/nvidia/560.35.03/gsp_ga10x.bin as /nix/store/klbgvbg0j34sf5axl2p2waswc5bvg636-firmware/lib>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Selecting /nix/store/klbgvbg0j34sf5axl2p2waswc5bvg636-firmware/lib/firmware/nvidia/560.35.03/gsp_tu10x.bin as /nix/store/klbgvbg0j34sf5axl2p2waswc5bvg636-firmware/lib>
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=warning msg="Could not locate nvidia-smi: pattern nvidia-smi not found"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=warning msg="Could not locate nvidia-debugdump: pattern nvidia-debugdump not found"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=warning msg="Could not locate nvidia-persistenced: pattern nvidia-persistenced not found"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=warning msg="Could not locate nvidia-cuda-mps-control: pattern nvidia-cuda-mps-control not found"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=warning msg="Could not locate nvidia-cuda-mps-server: pattern nvidia-cuda-mps-server not found"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=warning msg="Could not locate nvidia/xorg/nvidia_drv.so: pattern nvidia/xorg/nvidia_drv.so not found"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=warning msg="Could not locate nvidia/xorg/libglxserver_nvidia.so.560.35.03: pattern nvidia/xorg/libglxserver_nvidia.so.560.35.03 not found"
Sep 05 18:10:26 guanabana nvidia-cdi-generator[26173]: time="2024-09-05T18:10:26+02:00" level=info msg="Generated CDI spec with version 0.5.0"
Sep 05 18:10:26 guanabana systemd[1]: Finished Container Device Interface (CDI) for Nvidia generator.

and docker --version:

Docker version 27.1.1, build v27.1.1

I hope this is somehow useful.

ereslibre commented 1 week ago

@chmanie I'm a bit lost here; everything looks good. It's actually very similar to the one generated on my system, and we have the same Docker version. Can you check that your Docker daemon configuration contains the following:

{
  ...
  "features": {
    "cdi": true
  },
  ...
}

This setting should have been added automatically by NixOS when you set hardware.nvidia-container-toolkit.enable = true;, provided you have Docker >= 25. You can verify this by running cat $(ps aux | grep dockerd | gawk 'match($0, /config-file=(.*)/, a) {print a[1]}').

A last check would be running docker system info.
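
The config check above can also be scripted. The following is a minimal sketch, assuming a POSIX shell and a daemon config stored as plain JSON. The `cdi_enabled` helper and the default `/etc/docker/daemon.json` path are illustrative assumptions, not part of NixOS or Docker; pass the file that dockerd was actually started with via --config-file. The match is purely textual (it looks for a `"cdi": true` pair on a single line), so treat it as a quick heuristic rather than a JSON parser.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given file contains a "cdi": true pair.
# Heuristic only: it matches text on one line and does not parse JSON.
cdi_enabled() {
    grep -Eq '"cdi"[[:space:]]*:[[:space:]]*true' "$1"
}

# Example default path (assumption); override it with the first argument.
config="${1:-/etc/docker/daemon.json}"

if [ -r "$config" ] && cdi_enabled "$config"; then
    echo "CDI enabled in $config"
else
    echo "CDI not enabled (or $config not readable)"
fi
```

Saved as, say, check-cdi.sh (a made-up name), it would be invoked as sh check-cdi.sh /path/to/daemon.json.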

ereslibre commented 1 week ago

I can reproduce your problem with rootless docker. There are different problems arising here.

  1. When you set hardware.nvidia-container-toolkit.enable = true;, we automatically set virtualisation.docker.daemon.settings.features.cdi = true;, but we do not set the rootless equivalent, virtualisation.docker.rootless.daemon.settings.features.cdi = true;, which is the option that applies when virtualisation.docker.rootless.enable is true.

Setting it would fix the first part of the problem on the NixOS side. I would be willing to open a PR for this, but this is where the second problem arises:

  2. https://github.com/NVIDIA/nvidia-container-toolkit/issues/434

Rootless docker does not inspect /etc/cdi or /var/run/cdi, and the daemon ignores the CDI spec dirs even when they are set with virtualisation.docker.rootless.daemon.settings.cdi-spec-dirs = ["/var/run/cdi/"];. This requires more investigation.

In that issue, they also confirmed that writing the CDI spec to the user's $HOME works: https://github.com/NVIDIA/nvidia-container-toolkit/issues/434#issuecomment-2034107220. If the generated CDI specs are under the user's $HOME and that directory is configured with cdi-spec-dirs, they are loaded correctly.

I don't think we can port this logic to NixOS in a way that makes sense right now. However, rootless docker with Nvidia GPU and CDI seems very close to working. I'm going to keep an eye out for when it becomes possible to improve the situation on NixOS. Until then, I would ask you to use the non-rootless version or podman.

Thanks for raising awareness on this @chmanie and @benxiao, and thanks for the help looking into the issue :)

chmanie commented 1 week ago

Thank you so much for looking into this! I'll stick with the rootful version for now and monitor the issues closely!

Thanks again!