danpetreamd opened this issue 8 months ago
Using the instructions in the README.md:
$ sudo docker run -it --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --group-add video rocm/rocm-terminal
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.
# rocminfo
ROCk module is loaded
Unable to open /dev/kfd read-write: Permission denied
Failed to get user name to check for video group membership
I'm wondering if this image is still in use and/or if we can deprecate it.
Same here.
Also, one does not need to run sudo docker ...
As long as the user is in the docker group, one can run just docker ... instead. Using docker with sudo should not be promoted like this (not that it matters much).
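Docker's Linux post-installation steps add a user to the docker group with `sudo usermod -aG docker "$USER"`, and a new login session is needed before it takes effect. Below is a small sketch for checking membership before deciding whether sudo is needed. The helper name in_group is hypothetical, and it assumes `id -nG` lists the current session's group names separated by spaces.

```shell
#!/bin/sh
# Hypothetical helper: exit 0 if the current user belongs to the group named
# in $1 (e.g. "docker"), non-zero otherwise.
# Assumes `id -nG` prints the session's group names separated by spaces.
in_group() {
    id -nG | tr ' ' '\n' | grep -qx -- "$1"
}
```

Because `id -nG` reflects the current session, this returns false right after `usermod` until the user logs in again, which matches when the docker socket actually becomes accessible without sudo.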
$ sudo docker run -it --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --group-add video rocm/rocm-terminal
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.
rocm-user@015b5fcf64bf:~$ rocminfo
ROCk module is loaded
Unable to open /dev/kfd read-write: Permission denied
Failed to get user name to check for video group membership
rocm-user@015b5fcf64bf:~$ logout
$
The reason is that video is the wrong group; it should be render:
$ ls -l /dev/kfd
crw-rw---- 1 root render 243, 0 Dec 14 04:54 /dev/kfd
$
$ grep render /etc/group
render:x:993:user
$
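To confirm which group owns the GPU device nodes on a given host (render, GID 993, in the listing above, though the name and number vary between distributions), here is a minimal sketch. It assumes GNU coreutils stat, whose -c option takes a format string (%n is the file name, %G the group name, %g the numeric GID); the function name show_device_groups is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print "<path> <group name> <numeric GID>" for each
# device node given as an argument.
# Assumes GNU coreutils stat (BSD stat uses -f with different format letters).
show_device_groups() {
    for dev in "$@"; do
        stat -c '%n %G %g' "$dev"
    done
}
```

Run on the host as `show_device_groups /dev/kfd /dev/dri/renderD*`. Given the ls and grep output above, the /dev/kfd line on that machine should read `/dev/kfd render 993`.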
For some reason it does not work, though:
$ sudo docker run -it --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --group-add video --group-add render rocm/rocm-terminal
docker: Error response from daemon: Unable to find group render: no matching entries in group file.
ERRO[0000] error waiting for container: context canceled
Probably because /etc/group in the container is different.
Running docker run with --user=root is an option, which is not too bad (the file system, processes, etc. are still isolated and safe), but it would be nice to find a better solution.
This looks related - https://github.com/RadeonOpenCompute/ROCm-docker/issues/90
Still broken when following the instructions in the current README.md.
If I pass the render GID by number, it complains but works:
$ sudo docker run -it --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined --group-add $(getent group render | cut -d: -f3) rocm/rocm-terminal
groups: cannot find name for group ID 993
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.
rocm-user@3e3292ebfa5c:~$
and /dev/kfd works inside the container (i.e., rocminfo has no issues accessing it).
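The numeric-GID workaround can be generalized so the GIDs are read from the device nodes themselves instead of from the host's render entry in /etc/group. This is a minimal sketch. It assumes GNU coreutils stat and that every GID owning one of the given device nodes should be added. gpu_group_add_flags is a hypothetical helper name, not something provided by the image or the README.

```shell
#!/bin/sh
# Hypothetical helper: emit one "--group-add <GID>" pair per distinct GID
# owning the given device nodes. Numeric GIDs are used because the
# container's /etc/group may have no "render" entry (the
# "Unable to find group render" error seen earlier in this thread).
# Missing paths are skipped. Assumes GNU coreutils stat.
gpu_group_add_flags() {
    for dev in "$@"; do
        [ -e "$dev" ] && stat -c '%g' "$dev"
    done | sort -un | while read -r gid; do
        printf '%s %s ' --group-add "$gid"
    done
}
```

For example: `docker run -it --device=/dev/kfd --device=/dev/dri --security-opt seccomp=unconfined $(gpu_group_add_flags /dev/kfd /dev/dri/renderD*) rocm/rocm-terminal`. The command substitution is deliberately left unquoted so that it splits into separate arguments. The `groups: cannot find name for group ID` warning would still appear, since the container has no name for that GID.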
rocm-smi works fine. The following was run on a 4x GPU system:
rocminfo works fine in rocm/dev-ubuntu-22.04 and rocm/pytorch: