romintomasetti opened this issue 2 years ago
Had a similar issue when I was building a Docker image with ROCm support: a non-root user can't access the GPU resources and has to run commands with sudo to get GPU access.

A user inside the Docker container has to be a member of the `video` and `render` groups to access the GPU without sudo. The `video` group exists by default on Debian-based systems and has the fixed id 44, so there's no need to do anything as long as the group on the host system and inside the container have the same name and id. The `render` group, on the other hand, is created by the `amdgpu-install` script on the host system, and its id is assigned dynamically; it can end up as, for example, 104, 109 or 110.
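To see which ids apply on a given host, you can query the groups directly (a quick check; the `render` id shown below is just an example):

```bash
# Look up the video and render group ids on the host.
getent group video render
# Typical output (the render id varies per machine):
# video:x:44:alice
# render:x:110:alice
```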
The workaround is to use a Docker `ENTRYPOINT` to dynamically create the `render` group with the host system's render group id.
Create an `entrypoint.sh` script and add it to the image during the build. The script creates the `render` group with the host's group id and adds the user to the `video` and `render` groups:
```bash
#!/bin/bash
# RENDER_GID is passed in at "docker run" time (see below).
sudo groupadd --gid $RENDER_GID render
sudo usermod -aG render $USERNAME
sudo usermod -aG video $USERNAME
# Hand control over to the container's command (CMD or docker run args).
exec "$@"
```
Inside the Dockerfile we create a new user and copy the `entrypoint.sh` script to the image. A basic example:
```dockerfile
FROM ubuntu

ENV USERNAME=rocm-user
ARG USER_UID=1000
ARG USER_GID=$USER_UID

# sudo is needed by entrypoint.sh and is not in the base ubuntu image.
RUN apt-get update && apt-get install -y sudo && rm -rf /var/lib/apt/lists/*

# Create the user and grant it passwordless sudo.
RUN groupadd --gid $USER_GID $USERNAME \
    && useradd --uid $USER_UID --gid $USER_GID -m $USERNAME \
    && echo $USERNAME ALL=\(root\) NOPASSWD:ALL > /etc/sudoers.d/$USERNAME \
    && chmod 0440 /etc/sudoers.d/$USERNAME

COPY entrypoint.sh /tmp
RUN chmod 755 /tmp/entrypoint.sh

USER $USERNAME
ENTRYPOINT ["/tmp/entrypoint.sh"]
CMD ["/bin/bash"]
```
Build the image:

```bash
docker build -t rocm-image .
```
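If the host user's uid/gid differ from the default of 1000, they can be passed through the build arguments the Dockerfile above already declares (a small example; adjust the tag to taste):

```bash
# Match the container user's uid/gid to the current host user.
docker build \
  --build-arg USER_UID=$(id -u) \
  --build-arg USER_GID=$(id -g) \
  -t rocm-image .
```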
When starting the container, pass the `RENDER_GID` environment variable (the Docker image here is the `rocm-image` built above):
```bash
export RENDER_GID=$(getent group render | cut -d: -f3) \
  && docker run -it \
       --device=/dev/kfd --device=/dev/dri \
       -e RENDER_GID --group-add $RENDER_GID \
       rocm-image /bin/bash
```
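Inside the container you can then check that the GPU is actually reachable. `rocminfo` is one way to do that, assuming the ROCm user-space packages are installed in the image (the plain `ubuntu` example above would need them added):

```bash
# Should list the CPU and GPU agents without a permission error.
rocminfo
# Quick permission check on the compute device node:
ls -l /dev/kfd
```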
A VS Code devcontainer with GPU access: just add the following to the `.devcontainer/devcontainer.json` file and you're good to go.
```json
{
    "build": { "dockerfile": "./Dockerfile" },
    "overrideCommand": false,
    "initializeCommand": "echo \"RENDER_GID=$(getent group render | cut -d: -f3)\" > .devcontainer/devcontainer.env",
    "containerEnv": { "HSA_OVERRIDE_GFX_VERSION": "10.3.0" },
    "runArgs": [
        "--env-file=.devcontainer/devcontainer.env",
        "--device=/dev/kfd",
        "--device=/dev/dri"
    ]
}
```
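The `initializeCommand` runs on the host before the container starts, so the generated env file carries the host's render gid into the container. It should look something like this (the id 110 is just an example):

```bash
# What initializeCommand writes on the host (id varies per machine):
cat .devcontainer/devcontainer.env
# RENDER_GID=110
```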
On one of our machines, the GID of the `render` group on the host overlapped with the `ssh` group in the image, so the `groupadd` from the entrypoint script failed. It's best to use the group id, rather than the group name, in the following `usermod` call to still get an acceptable result in such a scenario, as in the sketch below.
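A minimal sketch of an entrypoint that tolerates such a collision, assuming the same `RENDER_GID` and `USERNAME` variables as above (if the gid is already taken, the existing group simply keeps its name):

```bash
#!/bin/bash
# Create the render group only if no group already owns that gid.
if ! getent group $RENDER_GID > /dev/null; then
    sudo groupadd --gid $RENDER_GID render
fi
# Add the user by gid, so this works even when the gid belongs
# to a differently named group (e.g. ssh) inside the image.
sudo usermod -aG $RENDER_GID $USERNAME
sudo usermod -aG video $USERNAME
exec "$@"
```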
Initial issue
As stated in https://rocmdocs.amd.com/en/latest/Installation_Guide/Installation_new.html#setting-permissions-for-groups, for Ubuntu 20 and above, the user needs to be part of the `render` group. Therefore, we need to create the `render` group in the Docker image; creating it in the Dockerfile would work. We might also want to update the documentation, because the `docker run` command should contain `--group-add render` for Ubuntu 20 and above.

Update - 10th June 2022
I made the following experiments. The user I'm logged in as on the host is part of the `render` group, and my user ID is 1002.

- Running the container as `root` (user ID 0 on the host) works.
- Running it as my own user will not work, failing with `Unable to open /dev/kfd read-write: Permission denied`.
- Adding `--group-add render` will not work, because inside of `rocm/dev-ubuntu-20.04:5.1` there is no render group.
- Passing the render group by numeric id instead will work again.
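For reference, the run commands for these cases look roughly like this (a sketch; the image tag and device flags are the ones used elsewhere in this thread):

```bash
# Works: the container runs as root.
docker run -it --device=/dev/kfd --device=/dev/dri rocm/dev-ubuntu-20.04:5.1

# Fails with "Unable to open /dev/kfd read-write": non-root user.
docker run -it --user $(id -u) --device=/dev/kfd --device=/dev/dri rocm/dev-ubuntu-20.04:5.1

# Fails: the image has no "render" group for Docker to resolve.
docker run -it --user $(id -u) --group-add render --device=/dev/kfd --device=/dev/dri rocm/dev-ubuntu-20.04:5.1

# Works again: pass the numeric gid straight through.
docker run -it --user $(id -u) --group-add $(getent group render | cut -d':' -f 3) --device=/dev/kfd --device=/dev/dri rocm/dev-ubuntu-20.04:5.1
```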
Therefore, I see 2 ways of fixing this:

1. Add a `render` group in the Docker image with ID 109 by default. This would be a "build time" fix and would break as soon as the host render group ID is not 109. The group ID could be passed as a build argument (`ARG`), but the image would not be portable.
2. Pass `--group-add $(getent group render | cut -d':' -f 3)` to `docker run`, resolving the id on the host at run time, as in the example below.
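Putting the second option together with the device flags from above gives a command along these lines (a sketch; swap in your own image tag):

```bash
docker run -it \
  --device=/dev/kfd --device=/dev/dri \
  --group-add $(getent group render | cut -d':' -f 3) \
  rocm/dev-ubuntu-20.04:5.1
```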