NVIDIA / nvidia-container-toolkit

Build and run containers leveraging NVIDIA GPUs

Rootless Docker CDI Injection: error modifying OCI spec: failed to inject CDI devices: unresolvable CDI devices nvidia.com/gpu=all: unknown. #434

Open LukasIAO opened 5 months ago

LukasIAO commented 5 months ago

Hello everyone,

We recently set up a rootless Docker instance alongside our existing Docker installation on one of our servers, but ran into issues mounting host GPUs into the rootless containers. A workaround was presented in issue #85 (toggling no-cgroups to switch between rootful and rootless), with a mention that a better solution, NVIDIA CDI, was coming as an experimental feature in Docker 25.

After updating to the newest Docker release and setting up CDI, our regular Docker instance behaved as we expected based on the documentation, but the rootless instance still runs into issues.

Setup to reproduce:

Distributor ID: Ubuntu
Description:    Ubuntu 22.04.4 LTS
Release:        22.04
Codename:       jammy

NVIDIA Container Toolkit CLI version 1.14.6
commit: 5605d191332dcfeea802c4497360d60a65c7887e

rootless: containerd github.com/containerd/containerd v1.7.13 7c3aca7a610df76212171d200ca3811ff6096eb8
rootful: containerd containerd.io 1.6.28 ae07eda36dd25f8a1b98dfbf587313b99c0190bb
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-40GB          On  | 00000000:01:00.0 Off |                    0 |
| N/A   40C    P0              61W / 275W |      0MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM4-40GB          On  | 00000000:47:00.0 Off |                    0 |
| N/A   39C    P0              55W / 275W |      0MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   2  NVIDIA A100-SXM4-40GB          On  | 00000000:81:00.0 Off |                    0 |
| N/A   39C    P0              57W / 275W |      0MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   3  NVIDIA DGX Display             On  | 00000000:C1:00.0 Off |                  N/A |
| 34%   41C    P8              N/A /  50W |      1MiB /  4096MiB |      0%      Default |
|                                         |                      |                  N/A |
+-----------------------------------------+----------------------+----------------------+
|   4  NVIDIA A100-SXM4-40GB          On  | 00000000:C2:00.0 Off |                    0 |
| N/A   39C    P0              58W / 275W |      0MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+
config.toml:

#accept-nvidia-visible-devices-as-volume-mounts = false
#accept-nvidia-visible-devices-envvar-when-unprivileged = true
disable-require = false
#swarm-resource = "DOCKER_RESOURCE_GPU"

[nvidia-container-cli]
#debug = "/var/log/nvidia-container-toolkit.log"
environment = []
#ldcache = "/etc/ld.so.cache"
ldconfig = "@/sbin/ldconfig.real"
load-kmods = true
#no-cgroups = true
#no-cgroups = false
#path = "/usr/bin/nvidia-container-cli"
#root = "/run/nvidia/driver"
#user = "root:video"

[nvidia-container-runtime]
#debug = "/var/log/nvidia-container-runtime.log"
log-level = "info"
mode = "auto"
runtimes = ["docker-runc", "runc"]

[nvidia-container-runtime.modes]

[nvidia-container-runtime.modes.csv]
mount-spec-path = "/etc/nvidia-container-runtime/host-files-for-container.d"

Generated CDI specification (nvidia.yaml):
cdiVersion: 0.5.0
containerEdits:
  deviceNodes:
  - path: /dev/nvidia-modeset
  - path: /dev/nvidia-uvm
  - path: /dev/nvidia-uvm-tools
  - path: /dev/nvidiactl
  hooks:
  - args:
    - nvidia-ctk
    - hook
    - create-symlinks
    - --link
    - libglxserver_nvidia.so.535.161.07::/lib/x86_64-linux-gnu/nvidia/xorg/libglxserver_nvidia.so
    hookName: createContainer
    path: /usr/bin/nvidia-ctk
  - args:
    - nvidia-ctk
    - hook
    - update-ldcache
    - --folder
    - /lib/x86_64-linux-gnu
    hookName: createContainer
    path: /usr/bin/nvidia-ctk
  mounts:
  - containerPath: /lib/x86_64-linux-gnu/libEGL_nvidia.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libEGL_nvidia.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libGLESv1_CM_nvidia.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libGLESv2_nvidia.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libGLESv2_nvidia.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libGLX_nvidia.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libGLX_nvidia.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libcuda.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libcuda.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libcudadebugger.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libcudadebugger.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libnvcuvid.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libnvcuvid.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libnvidia-allocator.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libnvidia-allocator.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libnvidia-cfg.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libnvidia-cfg.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libnvidia-egl-gbm.so.1.1.0
    hostPath: /lib/x86_64-linux-gnu/libnvidia-egl-gbm.so.1.1.0
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libnvidia-eglcore.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libnvidia-encode.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libnvidia-encode.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libnvidia-fbc.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libnvidia-fbc.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libnvidia-glcore.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libnvidia-glcore.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libnvidia-glsi.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libnvidia-glsi.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libnvidia-glvkspirv.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libnvidia-ml.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libnvidia-ml.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libnvidia-ngx.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libnvidia-ngx.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libnvidia-nscq.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libnvidia-nscq.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libnvidia-nvvm.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libnvidia-nvvm.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libnvidia-opencl.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libnvidia-opencl.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libnvidia-opticalflow.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libnvidia-opticalflow.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libnvidia-pkcs11-openssl3.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libnvidia-pkcs11-openssl3.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libnvidia-pkcs11.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libnvidia-pkcs11.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libnvidia-ptxjitcompiler.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libnvidia-rtcore.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libnvidia-rtcore.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libnvidia-tls.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libnvidia-tls.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libnvidia-vulkan-producer.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libnvidia-vulkan-producer.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/libnvoptix.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/libnvoptix.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /run/nvidia-persistenced/socket
    hostPath: /run/nvidia-persistenced/socket
    options:
    - ro
    - nosuid
    - nodev
    - bind
    - noexec
  - containerPath: /usr/bin/nvidia-cuda-mps-control
    hostPath: /usr/bin/nvidia-cuda-mps-control
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/bin/nvidia-cuda-mps-server
    hostPath: /usr/bin/nvidia-cuda-mps-server
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/bin/nvidia-debugdump
    hostPath: /usr/bin/nvidia-debugdump
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/bin/nvidia-persistenced
    hostPath: /usr/bin/nvidia-persistenced
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/bin/nvidia-smi
    hostPath: /usr/bin/nvidia-smi
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/share/nvidia/nvoptix.bin
    hostPath: /usr/share/nvidia/nvoptix.bin
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/firmware/nvidia/535.161.07/gsp_ga10x.bin
    hostPath: /lib/firmware/nvidia/535.161.07/gsp_ga10x.bin
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/firmware/nvidia/535.161.07/gsp_tu10x.bin
    hostPath: /lib/firmware/nvidia/535.161.07/gsp_tu10x.bin
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/nvidia/xorg/libglxserver_nvidia.so.535.161.07
    hostPath: /lib/x86_64-linux-gnu/nvidia/xorg/libglxserver_nvidia.so.535.161.07
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so
    hostPath: /lib/x86_64-linux-gnu/nvidia/xorg/nvidia_drv.so
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/share/X11/xorg.conf.d/10-nvidia.conf
    hostPath: /usr/share/X11/xorg.conf.d/10-nvidia.conf
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json
    hostPath: /usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/share/glvnd/egl_vendor.d/10_nvidia.json
    hostPath: /usr/share/glvnd/egl_vendor.d/10_nvidia.json
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/share/vulkan/icd.d/nvidia_icd.json
    hostPath: /usr/share/vulkan/icd.d/nvidia_icd.json
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/share/vulkan/implicit_layer.d/nvidia_layers.json
    hostPath: /usr/share/vulkan/implicit_layer.d/nvidia_layers.json
    options:
    - ro
    - nosuid
    - nodev
    - bind
devices:
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia4
    - path: /dev/dri/card5
    - path: /dev/dri/renderD132
    hooks:
    - args:
      - nvidia-ctk
      - hook
      - create-symlinks
      - --link
      - ../card5::/dev/dri/by-path/pci-0000:01:00.0-card
      - --link
      - ../renderD132::/dev/dri/by-path/pci-0000:01:00.0-render
      hookName: createContainer
      path: /usr/bin/nvidia-ctk
    - args:
      - nvidia-ctk
      - hook
      - chmod
      - --mode
      - "755"
      - --path
      - /dev/dri
      hookName: createContainer
      path: /usr/bin/nvidia-ctk
  name: "0"
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia3
    - path: /dev/dri/card4
    - path: /dev/dri/renderD131
    hooks:
    - args:
      - nvidia-ctk
      - hook
      - create-symlinks
      - --link
      - ../card4::/dev/dri/by-path/pci-0000:47:00.0-card
      - --link
      - ../renderD131::/dev/dri/by-path/pci-0000:47:00.0-render
      hookName: createContainer
      path: /usr/bin/nvidia-ctk
    - args:
      - nvidia-ctk
      - hook
      - chmod
      - --mode
      - "755"
      - --path
      - /dev/dri
      hookName: createContainer
      path: /usr/bin/nvidia-ctk
  name: "1"
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia2
    - path: /dev/dri/card3
    - path: /dev/dri/renderD130
    hooks:
    - args:
      - nvidia-ctk
      - hook
      - create-symlinks
      - --link
      - ../card3::/dev/dri/by-path/pci-0000:81:00.0-card
      - --link
      - ../renderD130::/dev/dri/by-path/pci-0000:81:00.0-render
      hookName: createContainer
      path: /usr/bin/nvidia-ctk
    - args:
      - nvidia-ctk
      - hook
      - chmod
      - --mode
      - "755"
      - --path
      - /dev/dri
      hookName: createContainer
      path: /usr/bin/nvidia-ctk
  name: "2"
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia1
    - path: /dev/dri/card2
    - path: /dev/dri/renderD129
    hooks:
    - args:
      - nvidia-ctk
      - hook
      - create-symlinks
      - --link
      - ../card2::/dev/dri/by-path/pci-0000:c2:00.0-card
      - --link
      - ../renderD129::/dev/dri/by-path/pci-0000:c2:00.0-render
      hookName: createContainer
      path: /usr/bin/nvidia-ctk
    - args:
      - nvidia-ctk
      - hook
      - chmod
      - --mode
      - "755"
      - --path
      - /dev/dri
      hookName: createContainer
      path: /usr/bin/nvidia-ctk
  name: "4"
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia1
    - path: /dev/nvidia2
    - path: /dev/nvidia3
    - path: /dev/nvidia4
    - path: /dev/dri/card2
    - path: /dev/dri/card3
    - path: /dev/dri/card4
    - path: /dev/dri/card5
    - path: /dev/dri/renderD129
    - path: /dev/dri/renderD130
    - path: /dev/dri/renderD131
    - path: /dev/dri/renderD132
    hooks:
    - args:
      - nvidia-ctk
      - hook
      - create-symlinks
      - --link
      - ../card5::/dev/dri/by-path/pci-0000:01:00.0-card
      - --link
      - ../renderD132::/dev/dri/by-path/pci-0000:01:00.0-render
      hookName: createContainer
      path: /usr/bin/nvidia-ctk
    - args:
      - nvidia-ctk
      - hook
      - chmod
      - --mode
      - "755"
      - --path
      - /dev/dri
      hookName: createContainer
      path: /usr/bin/nvidia-ctk
    - args:
      - nvidia-ctk
      - hook
      - create-symlinks
      - --link
      - ../card4::/dev/dri/by-path/pci-0000:47:00.0-card
      - --link
      - ../renderD131::/dev/dri/by-path/pci-0000:47:00.0-render
      hookName: createContainer
      path: /usr/bin/nvidia-ctk
    - args:
      - nvidia-ctk
      - hook
      - create-symlinks
      - --link
      - ../card3::/dev/dri/by-path/pci-0000:81:00.0-card
      - --link
      - ../renderD130::/dev/dri/by-path/pci-0000:81:00.0-render
      hookName: createContainer
      path: /usr/bin/nvidia-ctk
    - args:
      - nvidia-ctk
      - hook
      - create-symlinks
      - --link
      - ../card2::/dev/dri/by-path/pci-0000:c2:00.0-card
      - --link
      - ../renderD129::/dev/dri/by-path/pci-0000:c2:00.0-render
      hookName: createContainer
      path: /usr/bin/nvidia-ctk
  name: all
kind: nvidia.com/gpu
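
(For context: the device listing below is what nvidia-ctk cdi list reports for this spec. A sketch of the CDI commands involved, assuming the default /etc/cdi output location:)

$ sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
$ nvidia-ctk cdi list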

INFO[0000] Found 5 CDI devices
nvidia.com/gpu=0
nvidia.com/gpu=1
nvidia.com/gpu=2
nvidia.com/gpu=4
nvidia.com/gpu=all

The issue: when no-cgroups = false, CDI injection works fine for the regular Docker instance:

$ docker run --rm -ti --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=all ubuntu nvidia-smi -L
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-b6022b4d-71db-8f15-15de-26a719f6b3e1)
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-22420f7d-6edb-e44a-c322-4ce539cade19)
GPU 2: NVIDIA A100-SXM4-40GB (UUID: GPU-5e3444e2-8577-0e99-c6ee-72f6eb2bd28c)
GPU 3: NVIDIA A100-SXM4-40GB (UUID: GPU-dd1f811d-a280-7e2e-bf7e-b84f7a977cc1)

but produces the following errors for the rootless version:

$ docker run --rm -ti --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=all ubuntu nvidia-smi -L
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: could not apply required modification to OCI specification: error modifying OCI spec: failed to inject CDI devices: unresolvable CDI devices nvidia.com/gpu=all: unknown.

Running docker run --rm --gpus all ubuntu nvidia-smi results in the same error we saw without OCI/CDI injection. This seems to be consistent across all variations listed on the Specialized Configurations for Docker page:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: runc create failed: unable to start container process: error during container init: error running hook #0: error running hook: exit status 1, stdout: , stderr: Auto-detected mode as 'legacy'
nvidia-container-cli: mount error: failed to add device rules: unable to find any existing device filters attached to the cgroup: bpf_prog_query(BPF_CGROUP_DEVICE) failed: operation not permitted: unknown.
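
For reference, the #85 workaround toggles this behaviour in the toolkit config shown above (a sketch; /etc/nvidia-container-runtime/config.toml is the default location):

[nvidia-container-cli]
# true lets rootless containers start (no cgroup device rules are applied);
# false is what rootful Docker expects
no-cgroups = true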

Interestingly, setting no-cgroups = true disables the regular use of GPUs with rootful Docker:

$ docker run --rm --gpus all ubuntu nvidia-smi
Failed to initialize NVML: Unknown Error

but still allows for CDI injections:

$ docker run --rm -ti --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=all ubuntu nvidia-smi -L
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-b6022b4d-71db-8f15-15de-26a719f6b3e1)
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-22420f7d-6edb-e44a-c322-4ce539cade19)
GPU 2: NVIDIA A100-SXM4-40GB (UUID: GPU-5e3444e2-8577-0e99-c6ee-72f6eb2bd28c)
GPU 3: NVIDIA A100-SXM4-40GB (UUID: GPU-dd1f811d-a280-7e2e-bf7e-b84f7a977cc1)

With control groups disabled, the rootless daemon is able to use exposed GPUs as outlined in the Docker docs:

$ docker run -it --rm --gpus '"device=0,2"' ubuntu nvidia-smi
Mon Apr  1 16:33:52 2024
+---------------------------------------------------------------------------------------+
| NVIDIA-SMI 535.161.07             Driver Version: 535.161.07   CUDA Version: 12.2     |
|-----------------------------------------+----------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |         Memory-Usage | GPU-Util  Compute M. |
|                                         |                      |               MIG M. |
|=========================================+======================+======================|
|   0  NVIDIA A100-SXM4-40GB          Off | 00000000:01:00.0 Off |                    0 |
| N/A   37C    P0              60W / 275W |      0MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+
|   1  NVIDIA A100-SXM4-40GB          Off | 00000000:81:00.0 Off |                    0 |
| N/A   36C    P0              56W / 275W |      0MiB / 40960MiB |      0%      Default |
|                                         |                      |             Disabled |
+-----------------------------------------+----------------------+----------------------+

+---------------------------------------------------------------------------------------+
| Processes:                                                                            |
|  GPU   GI   CI        PID   Type   Process name                            GPU Memory |
|        ID   ID                                                             Usage      |
|=======================================================================================|
|  No running processes found                                                           |
+---------------------------------------------------------------------------------------+

TL;DR: Disabling cgroups allows the rootless containers to use exposed GPUs via the regular docker run --gpus flag. This in turn disables the rootful containers' GPU access. Leaving control groups enabled reverses the effect, as outlined in #85.

With cgroups disabled and NVIDIA CDI in use, rootful Docker can still use CDI GPU injection even though its regular GPU access is barred, while the rootless containers can use the exposed GPUs. CDI injection for rootless fails in both cases, however.

This seems like a definite improvement, but I'm not sure it's intended behavior. The fact that CDI injection fails with rootless regardless of the control-group setting leads me to believe this is unintended, unless rootless is not yet supported by NVIDIA CDI.

Any insights would be greatly appreciated!

elezar commented 5 months ago

The error:

docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: could not apply required modification to OCI specification: error modifying OCI spec: failed to inject CDI devices: unresolvable CDI devices nvidia.com/gpu=all: unknown.

Indicates that rootless Docker cannot find the CDI specifications that were generated. As far as I am aware, rootless Docker modifies the path used for /etc (and other paths), and this could be what is causing issues for the runtime here.

Since you're using a Docker version that supports CDI (as an opt-in feature, I believe), could you try the native CDI injection here?

Running:

nvidia-ctk runtime configure --runtime=docker --cdi.enabled

and restarting the docker daemon should enable this feature. (Note that the command may need to be adjusted for rootless mode to specify the config file path explicitly as per https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/latest/install-guide.html#rootless-mode).
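
For a rootless setup that would look something like the following (a sketch; the config path is the default rootless daemon.json location, and the restart assumes the systemd user unit from the standard rootless install):

nvidia-ctk runtime configure --runtime=docker --cdi.enabled --config=$HOME/.config/docker/daemon.json
systemctl --user restart docker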

Then with the CDI feature enabled in docker you should be able to run:

$ docker run --rm -ti --device=nvidia.com/gpu=all ubuntu nvidia-smi -L

and have the devices injected without using the nvidia runtime.

LukasIAO commented 5 months ago

Hey @elezar, thank you for taking the time!

CDI injection seems to be a mainline feature in Docker 26.0.0. Though it is still experimental, it no longer requires the user to set DOCKER_CLI_EXPERIMENTAL, as was the case in 25.x.

The native injection worked on rootful after configuring the daemon as suggested, though the rootless Docker still runs into issues as listed below.

Before applying the suggested configurations I tested the following on rootless:

$ docker run --rm -ti --device=nvidia.com/gpu=all ubuntu nvidia-smi -L
docker: Error response from daemon: could not select device driver "cdi" with capabilities: [].
ERRO[0000] error waiting for container: context canceled

$ docker run --rm -ti --runtime=nvidia --device=nvidia.com/gpu=all ubuntu nvidia-smi -L
docker: Error response from daemon: could not select device driver "cdi" with capabilities: [].

$ docker run --rm -ti --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=all ubuntu nvidia-smi -L
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: could not apply required modification to OCI specification: error modifying OCI spec: failed to inject CDI devices: unresolvable CDI devices nvidia.com/gpu=all: unknown.

After applying the configuration with nvidia-ctk runtime configure --runtime=docker --cdi.enabled --config=$HOME/.config/docker/daemon.json the daemon.json looks like this:

{
    "features": {
        "cdi": true
    },
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}

Restarting Docker and testing the CDI injections again leads to the following, regardless of the cgroup setting:

$ docker run --rm -ti --device=nvidia.com/gpu=all ubuntu nvidia-smi -L
docker: Error response from daemon: CDI device injection failed: unresolvable CDI devices nvidia.com/gpu=all.

$ docker run --rm -ti --runtime=nvidia --device=nvidia.com/gpu=all ubuntu nvidia-smi -L
docker: Error response from daemon: CDI device injection failed: unresolvable CDI devices nvidia.com/gpu=all.

$ docker run --rm -ti --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=nvidia.com/gpu=all ubuntu nvidia-smi -L
docker: Error response from daemon: failed to create task for container: failed to create shim task: OCI runtime create failed: could not apply required modification to OCI specification: error modifying OCI spec: failed to inject CDI devices: unresolvable CDI devices nvidia.com/gpu=all: unknown.

I checked where both Docker daemons look for the CDI specifications:

rootless:

Client:
 Version:    26.0.0
 Context:    rootless
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.13.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.5.0
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 4
  Running: 0
  Paused: 0
  Stopped: 4
 Images: 3
 Server Version: 26.0.0
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: false
  userxattr: true
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 CDI spec directories:
  /etc/cdi
  /var/run/cdi
 Swarm: inactive
 Runtimes: nvidia runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 7c3aca7a610df76212171d200ca3811ff6096eb8
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  seccomp
   Profile: builtin
  rootless
  cgroupns
 Kernel Version: 5.15.0-1047-nvidia
 Operating System: Ubuntu 22.04.4 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 128
 Total Memory: 503.5GiB
 Name: DGX-Station-A100-920-23487-2530-0R0
 ID: 48ae789a-3d2d-43d8-841a-9a34c9bdc46e
 Docker Root Dir: /home/ver23371/.local/share/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine

WARNING: No cpu cfs quota support
WARNING: No cpu cfs period support
WARNING: No cpu shares support
WARNING: No cpuset support
WARNING: No io.weight support
WARNING: No io.weight (per device) support
WARNING: No io.max (rbps) support
WARNING: No io.max (wbps) support
WARNING: No io.max (riops) support
WARNING: No io.max (wiops) support
rootful:

Client: Docker Engine - Community
 Version:    26.0.0
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.13.1
    Path:     /usr/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.5.0
    Path:     /usr/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 8
  Running: 0
  Paused: 0
  Stopped: 8
 Images: 52
 Server Version: 26.0.0
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: systemd
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local splunk syslog
 CDI spec directories:
  /etc/cdi
  /var/run/cdi
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 nvidia runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: ae07eda36dd25f8a1b98dfbf587313b99c0190bb
 runc version: v1.1.12-0-g51d5e94
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.15.0-1047-nvidia
 Operating System: Ubuntu 22.04.4 LTS
 OSType: linux
 Architecture: x86_64
 CPUs: 128
 Total Memory: 503.5GiB
 Name: DGX-Station-A100-920-23487-2530-0R0
 ID: a59ada2d-f489-4072-9c54-4d7a3efa0906
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

Both point to:

 CDI spec directories:
  /etc/cdi
  /var/run/cdi

However, it looks like nothing was created under /var/run/cdi. Permissions for nvidia.yaml:

/etc/cdi$ ls -la
total 32
drwxr-xr-x   2 root root  4096 ožu  29 23:22 .
drwxr-xr-x 167 root root 12288 ožu  29 23:22 ..
-rw-r--r--   1 root root 13203 ožu  29 23:22 nvidia.yaml

The Docker docs for enabling CDI devices suggest manually setting the spec location, but it does not seem to make a difference in this case.

{
    "features": {
        "cdi": true
    },
    "cdi-spec-dirs": ["/etc/cdi/", "/var/run/cdi"],
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}

elezar commented 5 months ago

Could you try generating (or copying) a CDI spec to /var/run/cdi in addition to /etc/cdi and see if this fixes the rootless case?
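
A minimal sketch of either option, assuming the existing spec lives at /etc/cdi/nvidia.yaml:

# regenerate the spec directly into /var/run/cdi
sudo nvidia-ctk cdi generate --output=/var/run/cdi/nvidia.yaml

# or simply copy the existing one
sudo cp /etc/cdi/nvidia.yaml /var/run/cdi/nvidia.yaml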

LukasIAO commented 5 months ago

I copied the YAML to /var/run/cdi, restarted both Docker daemons, and tested again. Unfortunately, there was no change in behavior.

/var/run/cdi$ ls -la
total 16
drwxr-xr-x  2 root root    60 tra   3 10:02 .
drwxr-xr-x 51 root root  1580 tra   3 10:02 ..
-rw-r--r--  1 root root 13203 tra   3 10:02 nvidia.yaml

elezar commented 5 months ago

I think the key is the following: https://github.com/moby/moby/blob/8599f2a3fb884afcbbf1471ec793fbcbc327cd35/cmd/dockerd/docker.go#L65C1-L72C1

I would assume that for the Docker daemon running with RootlessKit, the path where it is trying to resolve the CDI device specifications is not /var/run/cdi or /etc/cdi. It may be good to create an issue (or transfer this one) to https://github.com/moby/moby so that we can get input from the developers there as to where these paths map to.

It may be sufficient to copy the spec file to a location that is readable by the daemon to confirm.

Note that plugins are also handled differently for rootless mode: https://github.com/moby/moby/blob/8599f2a3fb884afcbbf1471ec793fbcbc327cd35/pkg/plugins/discovery_unix.go#L11

klueska commented 5 months ago

I wonder if this implies that the "correct" location for rootless is $HOME/.docker/cdi or $HOME/.docker/run/cdi?

LukasIAO commented 5 months ago

I just tested @klueska's idea by copying the YAML to $HOME/.docker/cdi and $HOME/.docker/run/cdi respectively, and specifying the custom locations in the daemon config.

{
    "features": {
        "cdi": true
    },
    "cdi-spec-dirs": ["/home/username/.docker/cdi/", "/home/username/.docker/run/cdi/"],
    "runtimes": {
        "nvidia": {
            "args": [],
            "path": "nvidia-container-runtime"
        }
    }
}
CDI spec directories:
  /home/username/.docker/cdi/
  /home/username/.docker/run/cdi/
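
(For reference, the copy step behind this was presumably along these lines; a sketch, assuming the spec generated earlier at /etc/cdi/nvidia.yaml:)

mkdir -p $HOME/.docker/cdi $HOME/.docker/run/cdi
cp /etc/cdi/nvidia.yaml $HOME/.docker/cdi/
cp /etc/cdi/nvidia.yaml $HOME/.docker/run/cdi/
systemctl --user restart docker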

With this change, the native CDI injection does indeed run on rootless.

/.config/docker$ docker run --rm -ti --device=nvidia.com/gpu=all ubuntu nvidia-smi -L
GPU 0: NVIDIA A100-SXM4-40GB (UUID: GPU-b6022b4d-71db-8f15-15de-26a719f6b3e1)
GPU 1: NVIDIA A100-SXM4-40GB (UUID: GPU-22420f7d-6edb-e44a-c322-4ce539cade19)
GPU 2: NVIDIA A100-SXM4-40GB (UUID: GPU-5e3444e2-8577-0e99-c6ee-72f6eb2bd28c)
GPU 3: NVIDIA A100-SXM4-40GB (UUID: GPU-dd1f811d-a280-7e2e-bf7e-b84f7a977cc1)

klueska commented 5 months ago

It's good to know there is a path to making this work. I'd be interested to know if these are the "default" locations if you remove cdi-spec-dirs entirely.

elezar commented 5 months ago

It's good to know there is a path to making this work. I'd be interested to know if these are the "default" locations if you remove cdi-spec-dirs entirely.

I would be surprised if this were the case, since IIRC we explicitly set /etc/cdi and /var/run/cdi in the daemon.

LukasIAO commented 5 months ago

You can see the Docker info of the rootless client in my original reply to @elezar. Before specifying it explicitly, I wanted to check where the client was looking for the specs. Once CDI is enabled, both rootless and rootful seem to default to:

CDI spec directories:
  /etc/cdi
  /var/run/cdi

The choice of ~/.docker/cdi seemed fitting, however.

klueska commented 5 months ago

That seems like a bug that should be filed against moby/docker then.

LukasIAO commented 5 months ago

It might also be worth noting in the CDI documentation that a rootless Docker setup requires the YAML to be generated or moved to a location the daemon can access, wherever that may end up being.

Milor123 commented 2 weeks ago

I had a similar bug and kept reading and trying things here, because I couldn't find more info.

I'm on Manjaro, and this bug was very weird: yesterday my Docker was working fine with my GPU, but after an update something broke it, and when I try to use the GPU in the Ollama container it shows failed to stat CDI host device "/dev/nvidia-modeset":

Error: setting up CDI devices: failed to inject devices: failed to stat CDI host device "/dev/nvidia-modeset": no such file or directory

My solution may look silly or strange, and you might think it wouldn't work, but I reinstalled nvidia-container-toolkit using pacman and, to my surprise, it worked. I never thought something so simple would help. My solution for my strange case:

sudo pacman -S nvidia-container-toolkit

PS: I use Podman
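
(If the underlying cause was a stale CDI spec still referencing device nodes from the previous driver, which is not confirmed here, regenerating the spec after a driver or toolkit update should also bring it back in sync. A sketch using the documented generate command:)

sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml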

kirovtome commented 1 week ago

Anyone figured this one out on GCP COS VM?