NVIDIA / nvidia-container-toolkit

Build and run containers leveraging NVIDIA GPUs
Apache License 2.0

Failed to initialize NVML: Insufficient Permissions #210

Closed forestofrain closed 9 months ago

forestofrain commented 9 months ago

I only had to change one setting in config.toml for my system.

user = "root:video"
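For context, this setting sits in the [nvidia-container-cli] section of the toolkit's config.toml (commonly /etc/nvidia-container-runtime/config.toml; the exact path may vary by install), so the change looks roughly like:

[nvidia-container-cli]
# other options left at their defaults
user = "root:video"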

I compared the relevant files to a fresh Ubuntu 23 install, where rootless worked. The only difference was the permissions on /dev/nvidia*. My distro, Gentoo, installs a config that changes the defaults for the device file parameters; this was introduced with a commit on 2021-07-21. The NVIDIA driver documentation describes these parameters in its FAQ:

How and when are the NVIDIA device files created?

Whether a user-space NVIDIA driver component does so itself, or invokes nvidia-modprobe, it will default to creating the device files with the following attributes:

      UID:  0     - 'root'
      GID:  0     - 'root'
      Mode: 0666  - 'rw-rw-rw-'

For example, the NVIDIA driver can be instructed to create device files with UID=0 (root), GID=44 (video) and Mode=0660 by passing the following module parameters to the NVIDIA Linux kernel module:

      NVreg_DeviceFileUID=0
      NVreg_DeviceFileGID=44
      NVreg_DeviceFileMode=0660

This looks reasonable to me.

Is this a bug in the container toolkit, or is it expected? I would assume that with ModifyDeviceFiles = 1 I should not have to change my distro config.

Possible relevant information below.

Rootless Error

podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi -L
Failed to initialize NVML: Insufficient Permissions
Octal Permissions    Size User Group Date Modified Name
0660  crw-rw----  195,254 root video 21 Jan 13:39  /dev/nvidia-modeset
0666  crw-rw-rw-    509,0 root root  21 Jan 13:39  /dev/nvidia-uvm
0666  crw-rw-rw-    509,1 root root  21 Jan 13:39  /dev/nvidia-uvm-tools
0660  crw-rw----    195,0 root video 21 Jan 13:39  /dev/nvidia0
0660  crw-rw----  195,255 root video 21 Jan 13:39  /dev/nvidiactl
cat /proc/driver/nvidia/params
ModifyDeviceFiles: 1
DeviceFileUID: 0
DeviceFileGID: 27
DeviceFileMode: 432

Rootless Success

After adding a modprobe config override setting the device file mode to 0666, rootless podman works as expected.

options nvidia NVreg_DeviceFileMode=0666
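(For reference, a line like this belongs in a modprobe configuration file, e.g. something like /etc/modprobe.d/nvidia.conf; the file name here is only illustrative, and the module has to be reloaded, or the system rebooted, for the new mode to take effect.)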
podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3080 (UUID: GPU-...)
Octal Permissions    Size User Group Date Modified Name
0666  crw-rw-rw-  195,254 root video 20 Jan 16:27  /dev/nvidia-modeset
0666  crw-rw-rw-    509,0 root root  20 Jan 16:27  /dev/nvidia-uvm
0666  crw-rw-rw-    509,1 root root  20 Jan 16:27  /dev/nvidia-uvm-tools
0666  crw-rw-rw-    195,0 root video 20 Jan 16:27  /dev/nvidia0
0666  crw-rw-rw-  195,255 root video 20 Jan 16:27  /dev/nvidiactl
cat /proc/driver/nvidia/params
ModifyDeviceFiles: 1
DeviceFileUID: 0
DeviceFileGID: 27
DeviceFileMode: 438
elezar commented 9 months ago

@forestofrain this is something that we're aware of, but haven't really found the correct solution for. On most systems it's limited to DRM device nodes that have root:video ownership. As you point out, however, this is dependent on the driver parameters.

One thing we're considering is https://github.com/cncf-tags/container-device-interface/issues/175, where the spec would include the additional GIDs required in the container to access device nodes that are created with 0660 permissions instead of 0666.

In our testing, it was also not quite clear whether setting the GID in the CDI specification would have the same effect.

Would you be able to confirm this on your end? That is to say:

  1. Generate a CDI specification using nvidia-ctk cdi generate
  2. Add the relevant gid field to the device nodes that require it (/dev/nvidia-modeset, /dev/nvidia0, /dev/nvidiactl). For example:
    deviceNodes:
    - path: /dev/nvidia-modeset
      gid: 27
    - path: /dev/nvidiactl
      gid: 27

Then repeat your experiments.
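For concreteness, the flow being suggested looks roughly like this (the output path below is the usual CDI location; adjust it to wherever podman is configured to look for CDI specs):

nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
# edit /etc/cdi/nvidia.yaml to add the gid entries shown above, then:
podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi -L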

If this works as expected, then the spec extension is not required and we can work on updating our generated spec to include the required GID information.

If this still fails, then we would have to confirm that running podman with --group-add=27 (which updates the additionalGids field in the OCI runtime spec) works as desired.

forestofrain commented 9 months ago

Same permission error. Any other information you need?

podman run --rm --device nvidia.com/gpu=all ubuntu nvidia-smi -L
Failed to initialize NVML: Insufficient Permissions
podman run --rm --group-add=27 --device nvidia.com/gpu=all ubuntu nvidia-smi -L
Failed to initialize NVML: Insufficient Permissions

My /etc/cdi/nvidia.yaml with your suggested changes is below. Note that my 3080 is used only for compute; my graphics card is an Arc 770.

---
cdiVersion: 0.5.0
containerEdits:
  deviceNodes:
  - path: /dev/nvidia-modeset
    gid: 27
  - path: /dev/nvidia-uvm
  - path: /dev/nvidia-uvm-tools
  - path: /dev/nvidiactl
    gid: 27
  hooks:
  - args:
    - nvidia-ctk
    - hook
    - update-ldcache
    - --folder
    - /usr/lib64
    hookName: createContainer
    path: /usr/sbin/nvidia-ctk
  mounts:
  - containerPath: /opt/bin/nvidia-cuda-mps-control
    hostPath: /opt/bin/nvidia-cuda-mps-control
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /opt/bin/nvidia-cuda-mps-server
    hostPath: /opt/bin/nvidia-cuda-mps-server
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /opt/bin/nvidia-debugdump
    hostPath: /opt/bin/nvidia-debugdump
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /run/nvidia-persistenced/socket
    hostPath: /run/nvidia-persistenced/socket
    options:
    - ro
    - nosuid
    - nodev
    - bind
    - noexec
  - containerPath: /usr/lib64/libEGL_nvidia.so.545.29.06
    hostPath: /usr/lib64/libEGL_nvidia.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libGLESv1_CM_nvidia.so.545.29.06
    hostPath: /usr/lib64/libGLESv1_CM_nvidia.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libGLESv2_nvidia.so.545.29.06
    hostPath: /usr/lib64/libGLESv2_nvidia.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libGLX_nvidia.so.545.29.06
    hostPath: /usr/lib64/libGLX_nvidia.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libcuda.so.545.29.06
    hostPath: /usr/lib64/libcuda.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libcudadebugger.so.545.29.06
    hostPath: /usr/lib64/libcudadebugger.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libnvcuvid.so.545.29.06
    hostPath: /usr/lib64/libnvcuvid.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libnvidia-allocator.so.545.29.06
    hostPath: /usr/lib64/libnvidia-allocator.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libnvidia-cfg.so.545.29.06
    hostPath: /usr/lib64/libnvidia-cfg.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libnvidia-egl-gbm.so.1.1.1
    hostPath: /usr/lib64/libnvidia-egl-gbm.so.1.1.1
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libnvidia-eglcore.so.545.29.06
    hostPath: /usr/lib64/libnvidia-eglcore.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libnvidia-encode.so.545.29.06
    hostPath: /usr/lib64/libnvidia-encode.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libnvidia-fbc.so.545.29.06
    hostPath: /usr/lib64/libnvidia-fbc.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libnvidia-glcore.so.545.29.06
    hostPath: /usr/lib64/libnvidia-glcore.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libnvidia-glsi.so.545.29.06
    hostPath: /usr/lib64/libnvidia-glsi.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libnvidia-glvkspirv.so.545.29.06
    hostPath: /usr/lib64/libnvidia-glvkspirv.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libnvidia-gpucomp.so.545.29.06
    hostPath: /usr/lib64/libnvidia-gpucomp.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libnvidia-gtk3.so.545.29.06
    hostPath: /usr/lib64/libnvidia-gtk3.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libnvidia-ml.so.545.29.06
    hostPath: /usr/lib64/libnvidia-ml.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libnvidia-ngx.so.545.29.06
    hostPath: /usr/lib64/libnvidia-ngx.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libnvidia-nvvm.so.545.29.06
    hostPath: /usr/lib64/libnvidia-nvvm.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libnvidia-opencl.so.545.29.06
    hostPath: /usr/lib64/libnvidia-opencl.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libnvidia-opticalflow.so.545.29.06
    hostPath: /usr/lib64/libnvidia-opticalflow.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libnvidia-pkcs11-openssl3.so.545.29.06
    hostPath: /usr/lib64/libnvidia-pkcs11-openssl3.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libnvidia-ptxjitcompiler.so.545.29.06
    hostPath: /usr/lib64/libnvidia-ptxjitcompiler.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libnvidia-rtcore.so.545.29.06
    hostPath: /usr/lib64/libnvidia-rtcore.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libnvidia-tls.so.545.29.06
    hostPath: /usr/lib64/libnvidia-tls.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libnvidia-wayland-client.so.545.29.06
    hostPath: /usr/lib64/libnvidia-wayland-client.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/lib64/libnvoptix.so.545.29.06
    hostPath: /usr/lib64/libnvoptix.so.545.29.06
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/sbin/nvidia-persistenced
    hostPath: /usr/sbin/nvidia-persistenced
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/sbin/nvidia-smi
    hostPath: /usr/sbin/nvidia-smi
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/firmware/nvidia/545.29.06/gsp_ga10x.bin
    hostPath: /lib/firmware/nvidia/545.29.06/gsp_ga10x.bin
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /lib/firmware/nvidia/545.29.06/gsp_tu10x.bin
    hostPath: /lib/firmware/nvidia/545.29.06/gsp_tu10x.bin
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/share/egl/egl_external_platform.d/10_nvidia_wayland.json
    hostPath: /usr/share/egl/egl_external_platform.d/10_nvidia_wayland.json
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json
    hostPath: /usr/share/egl/egl_external_platform.d/15_nvidia_gbm.json
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/share/glvnd/egl_vendor.d/10_nvidia.json
    hostPath: /usr/share/glvnd/egl_vendor.d/10_nvidia.json
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/share/vulkan/icd.d/nvidia_icd.json
    hostPath: /usr/share/vulkan/icd.d/nvidia_icd.json
    options:
    - ro
    - nosuid
    - nodev
    - bind
  - containerPath: /usr/share/vulkan/implicit_layer.d/nvidia_layers.json
    hostPath: /usr/share/vulkan/implicit_layer.d/nvidia_layers.json
    options:
    - ro
    - nosuid
    - nodev
    - bind
devices:
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia0
      gid: 27
    - path: /dev/dri/card1
    - path: /dev/dri/renderD129
    hooks:
    - args:
      - nvidia-ctk
      - hook
      - create-symlinks
      - --link
      - ../card1::/dev/dri/by-path/pci-0000:01:00.0-card
      - --link
      - ../renderD129::/dev/dri/by-path/pci-0000:01:00.0-render
      hookName: createContainer
      path: /usr/sbin/nvidia-ctk
    - args:
      - nvidia-ctk
      - hook
      - chmod
      - --mode
      - "755"
      - --path
      - /dev/dri
      hookName: createContainer
      path: /usr/sbin/nvidia-ctk
  name: "0"
- containerEdits:
    deviceNodes:
    - path: /dev/nvidia0
      gid: 27
    - path: /dev/dri/card1
    - path: /dev/dri/renderD129
    hooks:
    - args:
      - nvidia-ctk
      - hook
      - create-symlinks
      - --link
      - ../card1::/dev/dri/by-path/pci-0000:01:00.0-card
      - --link
      - ../renderD129::/dev/dri/by-path/pci-0000:01:00.0-render
      hookName: createContainer
      path: /usr/sbin/nvidia-ctk
    - args:
      - nvidia-ctk
      - hook
      - chmod
      - --mode
      - "755"
      - --path
      - /dev/dri
      hookName: createContainer
      path: /usr/sbin/nvidia-ctk
  name: all
kind: nvidia.com/gpu
elezar commented 9 months ago

Just to confirm, 27 in the above example is the numeric ID of the video group?

forestofrain commented 9 months ago

Correct, 27 is the video group on Gentoo.
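(For anyone checking their own system, getent group video prints the group entry along with its numeric GID.)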

elezar commented 9 months ago

@forestofrain I was trying to set up an instance to test this locally, but was running into some issues with the driver installation to get this going. Do you have a link to some docs on getting a working Gentoo GPU-based system? (This would be terminal only).

elezar commented 9 months ago

I have been able to dig a bit further on an openSUSE system with a similar device node configuration (the GID is different, but that should not affect the findings).

One thing to note when running "rootless" podman is that the root:video user-group combination on the host is mapped to nobody:nogroup in the container, meaning that the device nodes show up as:

$ ls -al /dev/nvi*
crw-rw-rw- 1 nobody nogroup 236,   0 Jan 24 15:34 /dev/nvidia-uvm
crw-rw-rw- 1 nobody nogroup 236,   1 Jan 24 15:34 /dev/nvidia-uvm-tools
crw-rw---- 1 nobody nogroup 195,   0 Jan 24 15:34 /dev/nvidia0
crw-rw---- 1 nobody nogroup 195, 255 Jan 24 15:34 /dev/nvidiactl

Also note that when the container is created in a user namespace, the low-level runtime (runc) does not mknod the devices with the properties from the OCI Runtime Spec, but instead bind mounts them into the container. The mode bitmask is not modified by this operation, so the device nodes keep the same 0660 permissions they have on the host, and their user and group are mapped to nobody:nogroup. I am not familiar enough with podman's uid and gid mappings to provide a solution off the top of my head.
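As a quick check, the uid and gid mappings that rootless podman applies can be inspected with, for example:

podman unshare cat /proc/self/uid_map /proc/self/gid_map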

elezar commented 9 months ago

Another update.

Looking at the following entry in the troubleshooting guide: https://github.com/containers/podman/blob/main/troubleshooting.md#20-passed-in-devices-or-files-cant-be-accessed-in-rootless-container

I confirmed that in my setup, with crun available, running:

podman run --rm -ti --device nvidia.com/gpu=all --group-add keep-groups --runtime=crun ubuntu nvidia-smi -L
GPU 0: Tesla T4 (UUID: GPU-cdd5cfb4-69a9-a04b-4c87-070d09c51772)

gives the desired output.

Whereas with runc it still fails:

podman run --rm -ti --device nvidia.com/gpu=all --group-add keep-groups --runtime=runc ubuntu nvidia-smi -L
Failed to initialize NVML: Insufficient Permissions

Note that there is also an entry that describes using uid and gid maps to achieve similar results: https://github.com/containers/podman/blob/main/troubleshooting.md#35-passed-in-devices-or-files-cant-be-accessed-in-rootless-container-uidgid-mapping-problem
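As an aside, if crun is the runtime that works, it can also be made podman's default in containers.conf so that --runtime=crun does not have to be passed on every invocation, roughly:

[engine]
runtime = "crun"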

forestofrain commented 9 months ago

@elezar thanks for the quick solution! Running your last commands, I get the same results.

forestofrain commented 9 months ago

Your solution also led me to a Red Hat article that provided a nice config snippet that works.

[containers]
annotations=["run.oci.keep_original_groups=1",]
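(This goes in a containers.conf file, e.g. ~/.config/containers/containers.conf for a rootless user.)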

Now I can run this older TensorFlow container with fewer options on the command line :)

podman run --userns keep-id --rm -it --device nvidia.com/gpu=all tensorflow/tensorflow:2.11.0-gpu nvidia-smi -L
GPU 0: NVIDIA GeForce RTX 3080 (UUID: GPU-...)
forestofrain commented 9 months ago

Thanks for all the help.

elezar commented 9 months ago

Note that this solution may also work for https://github.com/NVIDIA/nvidia-container-runtime/issues/145

cc @qhaas