ROCm / ROCm-docker

Dockerfiles for the various software layers defined in the ROCm software platform
MIT License

Create `render` group for Ubuntu >= 20, as per ROCm documentation #90


romintomasetti commented 2 years ago

Initial issue

As stated in https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation_new.html#setting-permissions-for-groups, for Ubuntu 20 and above, the user needs to be part of the render group.

Therefore, we need to create the render group in the docker image. The following would work:

RUN groupadd render

We might also want to update the documentation because the docker run command should contain --group-add render for Ubuntu 20 and above.
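
For illustration, the combined change might look like the following (the image tag is just an example, and as the experiments below show, --group-add render only helps once the group actually exists inside the image):

docker run -it --device=/dev/kfd --device=/dev/dri --group-add render rocm/dev-ubuntu-20.04:5.1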

Update - 10th June 2022

I ran the following experiments. The user I'm logged in as on the host is part of the render group, and my user ID is 1002.

  1. docker run --rm --device=/dev/kfd rocm/dev-ubuntu-20.04:5.1 rocminfo

    works, because it runs as root (user ID 0 on the host) and:

    ll /dev/kfd 
    crw-rw---- 1 root render 510, 0 Jun  9 04:11 /dev/kfd
  2. docker run --rm --user=1002 --device=/dev/kfd rocm/dev-ubuntu-20.04:5.1 rocminfo

    does not work: Unable to open /dev/kfd read-write: Permission denied.

  3. docker run --rm --user=1002 --group-add render --device=/dev/kfd rocm/dev-ubuntu-20.04:5.1 rocminfo

    does not work either, because there is no render group inside rocm/dev-ubuntu-20.04:5.1.

  4. docker run --rm --user=1002 --group-add $(getent group render | cut -d':' -f 3) --device=/dev/kfd rocm/dev-ubuntu-20.04:5.1 rocminfo

    works, because the host's render GID is added to the container user's supplementary groups.

Therefore, I see 2 ways of fixing this.

  1. Add a render group in the Docker image with GID 109 by default. This would be a "build time" fix and would break as soon as the host render group GID is not 109. The group ID could be passed as a build argument (ARG), but the image would not be portable (see the sketch after this list).

    FROM rocm/dev-ubuntu-20.04:5.1
    
    RUN  groupadd -g 109 render && useradd -g 109 -ms /bin/bash newuser
    USER newuser
  2. The "run time" fix is to pass --group-add $(getent group render | cut -d':' -f 3) to every docker run command.
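
For completeness, a minimal sketch of the ARG variant mentioned in option 1 (RENDER_GID is a hypothetical build argument; it has to be set to the host's render GID at build time, which is exactly why the resulting image would not be portable):

FROM rocm/dev-ubuntu-20.04:5.1

ARG RENDER_GID=109
RUN groupadd -g $RENDER_GID render && useradd -g $RENDER_GID -ms /bin/bash newuser
USER newuser

built with, for example:

docker build --build-arg RENDER_GID=$(getent group render | cut -d':' -f 3) .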
sergejcodes commented 1 year ago

Had a similar issue when I was building a Docker image with ROCm support.

The Problem

A non-root user inside the container can't access the GPU resources and has to run commands with sudo to get GPU access.

Groups

A user inside the docker container has to be a member of the video and render groups to access the GPU without sudo.

Solution

Use the Docker ENTRYPOINT to dynamically create the render group with the host system's render group ID and add the user to it at container start.

Bash Script

Create an entrypoint.sh script and add it to the image during the build. The script creates the render group with the host's group ID and adds the user to the video and render groups.

#!/bin/bash

# Create the render group with the GID passed in from the host (RENDER_GID)
# and add the container user to the render and video groups.
sudo groupadd --gid $RENDER_GID render
sudo usermod -aG render $USERNAME
sudo usermod -aG video $USERNAME

# Hand control over to the container's command.
exec "$@"

Dockerfile

Inside the Dockerfile we create a new user and copy the entrypoint.sh script to the image. A basic example:

FROM ubuntu

ENV USERNAME=rocm-user
ARG USER_UID=1000
ARG USER_GID=$USER_UID

# sudo is not part of the base ubuntu image but is needed by entrypoint.sh
RUN apt-get update \
    && apt-get install -y sudo \
    && rm -rf /var/lib/apt/lists/*

RUN groupadd --gid $USER_GID $USERNAME \
    && useradd --uid $USER_UID --gid $USER_GID -m $USERNAME \
    && echo $USERNAME ALL=\(root\) NOPASSWD:ALL > /etc/sudoers.d/$USERNAME \
    && chmod 0440 /etc/sudoers.d/$USERNAME

COPY entrypoint.sh /tmp
RUN chmod 755 /tmp/entrypoint.sh

USER $USERNAME

ENTRYPOINT ["/tmp/entrypoint.sh"]

CMD ["/bin/bash"]
Build the image with:

docker build -t rocm-image .

Terminal

When starting the container, pass the RENDER_GID environment variable. Let's assume the Docker image is called rocm-image.

export RENDER_GID=$(getent group render | cut -d: -f3) && docker run -it --device=/dev/kfd --device=/dev/dri -e RENDER_GID --group-add $RENDER_GID rocm-image /bin/bash

VS Code Devcontainer

Just add the following code to the .devcontainer/devcontainer.json file and you're good to go: a VS Code devcontainer with GPU access.

{
  "build": { "dockerfile": "./Dockerfile" }
  "overrideCommand": false,
  "initializeCommand": "echo \"RENDER_GID=$(getent group render | cut -d: -f3)\" > .devcontainer/devcontainer.env",
  "containerEnv": { "HSA_OVERRIDE_GFX_VERSION": "10.3.0" },
  "runArgs": [
    "--env-file=.devcontainer/devcontainer.env",
    "--device=/dev/kfd",
    "--device=/dev/dri"
  ]
}
pawkubik commented 2 months ago

On one of our machines, the GID of the render group on the host overlapped with the ssh group in the image, so the groupadd in the init script failed. It's better to use the group ID directly in the subsequent usermod call, so you still get an acceptable result in such a scenario (see the sketch below).
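
A minimal sketch of that adjustment, assuming the same RENDER_GID and USERNAME variables as in the entrypoint above: only create the group if the GID is free, and add the user by GID so it also works when the GID already belongs to another group (e.g. ssh).

#!/bin/bash

# Only create the render group if no group with this GID exists yet.
if ! getent group "$RENDER_GID" > /dev/null; then
    sudo groupadd --gid "$RENDER_GID" render
fi

# Add the user by GID rather than by name, so the collision case still works.
sudo usermod -aG "$RENDER_GID" "$USERNAME"
sudo usermod -aG video "$USERNAME"

exec "$@"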

harkgill-amd commented 1 month ago

Hi @romintomasetti @sergejcodes, thank you for both reporting this issue and providing a detailed solution to the problem. This has been addressed in our newer images by defaulting to a root user in order to maintain access to GPU resources. Please let me know if we can close out this issue.

thesuperzapper commented 1 month ago

@harkgill-amd in many cases, clusters (Kubernetes) have security policies that prevent containers from running as root; this limitation will prevent MANY companies from being able to use AMD GPUs for their AI workloads.

In Kubernetes, this is likely something that your https://github.com/ROCm/k8s-device-plugin can resolve by checking the host's render group and adding it to supplementalGroups in the Pod's securityContext (see the sketch below), but it's problematic if the cluster has multiple nodes which don't have the same GID for their render group.
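
As a rough illustration only (the deployment name is hypothetical, and this assumes the render GID is known and identical on every node, which is exactly the caveat above), the idea could be wired up like this:

RENDER_GID=$(getent group render | cut -d: -f3)
kubectl patch deployment my-gpu-workload --type merge -p \
  "{\"spec\":{\"template\":{\"spec\":{\"securityContext\":{\"supplementalGroups\":[${RENDER_GID}]}}}}}"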

However, I feel there must be a cleaner solution, because Nvidia GPUs have no such problems, either on local Docker or with their Kubernetes device plugin. I would check what they are doing, but it might be something like having every device mount owned by a constant GID (e.g. 0, or something the user configures) and ensuring the docker container runs as a user who has this group.

Here is the related issue on the AMD Device Plugin repo: https://github.com/ROCm/k8s-device-plugin/issues/39

thesuperzapper commented 1 month ago

Also, for context, when using Nvidia GPUs, you don't mount them with the --device parameter, but instead use the --gpus parameter, so perhaps this is part of their workaround.

For reference, here is information about the --device arg of docker run; perhaps we need to explicitly allow read/write with the :rwm suffix (which is the default), or set something with --device-cgroup-rule (sketched below):
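
For what it's worth, a sketch of that idea (226 is the DRM major number; this only adjusts the cgroup device allow-list and is not a confirmed fix for the permission problem discussed here):

docker run --rm \
  --device /dev/kfd --device /dev/dri \
  --device-cgroup-rule='c 226:* rwm' \
  rocm/dev-ubuntu-20.04:5.1 rocminfo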

thesuperzapper commented 1 month ago

Although, I guess the real question is why AMD ever thought it was a good idea to not have a static GID for the render group. Perhaps the solution is to deprecate the render group and always use video or make a new group.

gigabyte132 commented 4 weeks ago

Hi @harkgill-amd, I don't think this should be closed as the inherent problem with using a non-root user is still prevalent, and there isn't a clean solution for this.

harkgill-amd commented 4 weeks ago

@thesuperzapper and @gigabyte132, thank you for the feedback. We are currently exploring the possibility of using udev rules to access GPU resources in place of render groups. The steps would be the following:

  1. Create a new file /etc/udev/rules.d/70-amdgpu.rules with the following content:
      KERNEL=="kfd", MODE="0666"
      SUBSYSTEM=="drm", KERNEL=="renderD*", MODE="0666"
  2. Reload the udev rules with:
    sudo udevadm control --reload-rules && sudo udevadm trigger

    This configuration grants users read and write access to AMD GPU resources. From there, you can pass access to these devices into a container by specifying --device /dev/kfd --device /dev/dri in your docker run command. To restrict access to a subset of GPUs, please see the following documentation.

I ran this setup with the rocm/rocm-terminal image and am able to access GPU resources without any render group mapping or root privileges. Could you please give this a try on your end and let me know what you think?
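
A minimal way to reproduce that check might be the following (the --user value is just an arbitrary non-root UID; with the 0666 udev rules in place on the host, no group mapping is needed):

docker run --rm --user=1002 --device=/dev/kfd --device=/dev/dri rocm/rocm-terminal rocminfo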

thesuperzapper commented 4 weeks ago

@harkgill-amd while changing the permissions on the host might work, I will note that this does not seem to be required for Nvidia GPUs.

I imagine this is because they mount specific device paths: /dev/dri itself is not the actual device node, so docker's --device mount (which claims to give the container read/write permissions) does not correctly change the permissions of the nodes beneath it.

Because specifying each device is obviously a pain for end users, they added a custom --gpus feature (also see these docs) which requires users to install the nvidia-container-toolkit.
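
For comparison, the Nvidia path looks roughly like this (requires the nvidia-container-toolkit to be installed and configured on the host):

docker run --rm --gpus all ubuntu nvidia-smi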


I also want to highlight the differences between the Kubernetes device plugins for AMD and Nvidia, as this is where most people are using lots of GPUs, and the permission issues also occur on AMD but not on Nvidia.

thesuperzapper commented 3 weeks ago

@harkgill-amd after a lot of testing, it seems like the major container runtimes (including docker and containerd) don't actually change the permissions of devices mounted with --device like they claim to.

For example, you would expect the following command to mount /dev/dri/card1 with everybody having rw, but it does not:

docker run --device /dev/dri/card1 ubuntu ls -la /dev/dri

# OUTPUT:
# total 0
# drwxr-xr-x 2 root root     60 Oct 24 18:52 .
# drwxr-xr-x 6 root root    360 Oct 24 18:52 ..
# crw-rw---- 1 root  110 226, 1 Oct 24 18:52 card1

This also seemingly happens on Kubernetes, despite the AMD device plugin requesting that the container be given rw on the device.

thesuperzapper commented 3 weeks ago

@harkgill-amd We need to find a generic solution which allows a non-root container to be run on any server (with a default install of the AMD drivers).

This is problematic because there is no standard GID for the render group, and the container runtimes don't respect requests to change the permissions of mounted devices.

Note, it seems like ubuntu has a default udev rule under /usr/lib/udev/rules.d/50-udev-default.rules which makes render the owner of /dev/dri/renderD* and video the owner of everything else in /dev/dri/.
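
A quick way to confirm what a given host is actually doing (the glob matches whatever render nodes the host exposes, typically starting at renderD128):

stat -c '%n %U:%G (gid %g) %a' /dev/dri/renderD*
getent group render video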

Possible solutions

  1. Give everyone read/write /dev/dri/renderD* on the host (like you proposed above):

    • PROBLEM: Some users aren't going to want to make all their /dev/dri/renderD* devices have 0666 permissions.
  2. Create a new standard GID to add as an owner of /dev/dri/renderD* (or use video=44).

  3. Do what Nvidia does, and don't mount anything under /dev/dri/ in the container, and instead mount something like the /dev/nvidia0 devices which have crw-rw-rw- and seemingly are how CUDA apps interact with the GPUs.

  4. Mount the devices as bind volumes rather than as actual devices:

    • PROBLEM: would not work in Kubernetes, because the device plugin requires a list of device mounts be returned for a container that requests an amd.com/gpu: 1 limit, not volumes.
  5. Automatically add the detected GID of the render group to the user as the container starts (because we don't know what the GID is before we start running on a specific server):

    • PROBLEM: this would require the non-root container user to be able to edit /etc/group, which would obviously allow root escalation.
  6. Figure out why all the container runtimes are not respecting the request to change file permissions on device mounts.