ROCm / ROCm-docker

Dockerfiles for the various software layers defined in the ROCm software platform
MIT License

ROCm docker on kvm guest #34

Open dannysemi opened 6 years ago

dannysemi commented 6 years ago

Is it possible to run ROCm-docker from a kvm guest? I haven't had any success so far. ROCm works fine on the kvm guest itself, but I get the following errors in the container:

rocm-user@409e97fd5bce:/opt/rocm/hsa/sample$ ./vector_copy
Initializing the hsa runtime failed.

rocm-user@409e97fd5bce:~$ ./HelloWorld
Failed to find any OpenCL platforms.
Failed to create OpenCL context.

I'm running this on an Intel i7 5960x and Vega 64. I'm using Ubuntu 16.04 with kernel upgraded to 4.13.0-36 from the hwe-16.04-edge packages on the kvm guest. I modified the Dockerfile for my rocm-terminal container to add the rocm-user to the 'video' group (I also had to reinstall make in the container).

All drivers and docker software on the kvm guest were installed according to the install guide. I tried both the --device="/dev/kfd" and --privileged flags, with no success.

settle commented 6 years ago

I could be wrong, but I think what you're experiencing is similar to #33. Were you ever able to run the rocm stack from within rocm-docker? If not, then it may not be an issue of the source of docker.io or docker-ce (me) or kvm (you) since we're both seeing similar behavior. Curious if you saw similar missing dependencies like python3 and libnuma-dev if you run rocm-smi or rocminfo?

dannysemi commented 6 years ago

I can run the examples from the guide successfully on my kvm guest but not from within a docker container on the kvm guest. python3 and libnuma-dev were missing from the container as well, but I edited the Dockerfile to include them. rocm-smi provides identical output to my kvm guest. rocminfo results in an error.

fxkamd commented 6 years ago

Are /dev/kfd and /dev/dri/renderD* visible inside docker with the right permissions?
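A minimal sketch of the check being asked for here, meant to be run inside the container as the non-root user. The device paths come from this thread; the function names `describe_device` and `describe_all` are made up for illustration, and the script only reports what it finds rather than fixing anything.

```python
import glob
import os
import stat

# Paths mentioned in this thread; the glob pattern covers renderD128 and similar.
DEVICE_PATTERNS = ["/dev/kfd", "/dev/dri/renderD*"]


def describe_device(path):
    """Return existence, mode, owner IDs, and read/write access for one path."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return {"path": path, "exists": False}
    return {
        "path": path,
        "exists": True,
        "mode": oct(stat.S_IMODE(st.st_mode)),  # permission bits only
        "uid": st.st_uid,
        "gid": st.st_gid,
        # os.access checks against the real uid/gid of the current process
        "read_write": os.access(path, os.R_OK | os.W_OK),
    }


def describe_all(patterns=DEVICE_PATTERNS):
    """Expand each pattern; if nothing matches, report the bare pattern as missing."""
    reports = []
    for pattern in patterns:
        matches = sorted(glob.glob(pattern)) or [pattern]
        reports.extend(describe_device(p) for p in matches)
    return reports


if __name__ == "__main__":
    for report in describe_all():
        print(report)
```

If a device exists but `read_write` is false, the numeric `gid` in the report is the value to compare against the groups of the user running the ROCm samples.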

dannysemi commented 6 years ago

With the --privileged flag enabled they are visible in the docker container:

rocm-user@1ea5a25a0df1:$ ls /dev/kfd
/dev/kfd
rocm-user@1ea5a25a0df1:$ ls /dev/dri
card0 renderD128

This is the same output I get on the kvm guest from which I ran the container.

settle commented 6 years ago

What should the permissions look like in "ls -l", or what chmod value would you use if they need changing, e.g. 755 or 777?

dannysemi commented 6 years ago

@fxkamd Looks like it is a permissions thing. If I run the container as root then I get the expected result. Not as secure as it could be, but it will work for my dev environment.

fxkamd commented 6 years ago

@dannysemi On bare metal, udevd may have a rule to add the local console user to the access control lists of /dev/kfd and /dev/dri/renderD*, so you don't need to mess with permissions yourself. That probably doesn't work in the container. You may need to change the permissions of /dev/kfd and /dev/dri/* in the container. Usually the permissions are set to 0660, group=video. Make sure your user account is in the video group, and you should be fine.
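The 0660, group=video rule can be sketched as a small check: the group permission bits must allow read and write, and the device's numeric group ID must be among the groups of the running process. This is an illustration only; `group_grants_rw`, `current_process_gids`, and `check` are hypothetical names, and the sketch deliberately ignores owner bits, "other" bits, ACLs, and the fact that root bypasses these checks.

```python
import os
import stat


def group_grants_rw(mode, device_gid, process_gids):
    """True if the group bits allow read+write and the process is in that group."""
    group_rw = stat.S_IRGRP | stat.S_IWGRP
    has_group_rw = (mode & group_rw) == group_rw
    return has_group_rw and device_gid in process_gids


def current_process_gids():
    """Effective GID plus supplementary groups of the running process."""
    return set(os.getgroups()) | {os.getegid()}


def check(path):
    """Apply the group rule to one device node, e.g. /dev/kfd."""
    st = os.stat(path)
    return group_grants_rw(st.st_mode, st.st_gid, current_process_gids())
```

Calling `check("/dev/kfd")` inside the container as the non-root user gives a quick yes/no on whether group membership alone should be enough to open the device.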

dannysemi commented 6 years ago

@fxkamd I tried that before with no success. Maybe the group mappings don't translate properly between host and container? If I create a user on the host with all the proper permissions and then pass that user directly to the container with the -u flag, it works. Otherwise I have to run as root.
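One possible explanation, not confirmed in this thread: device nodes are owned by a numeric group ID, and the name "video" inside the container may map to a different number than on the guest. A sketch of a workaround under that assumption is to read the numeric GIDs from the device nodes and pass them with Docker's `--group-add` option. The helper names and the image name `rocm-terminal` below are placeholders for illustration.

```python
import glob
import os


def device_gids(paths, stat_fn=os.stat):
    """Collect the numeric group IDs that own the given device nodes."""
    return sorted({stat_fn(p).st_gid for p in paths})


def docker_run_args(image, device_paths, gids):
    """Build a `docker run` argument list exposing the devices and adding the GIDs."""
    args = ["docker", "run", "-it"]
    for path in device_paths:
        args.append(f"--device={path}")
    for gid in gids:
        # Numeric GID, so the result does not depend on the container's /etc/group
        args.extend(["--group-add", str(gid)])
    args.append(image)
    return args


if __name__ == "__main__":
    # Stat the actual device nodes, not the /dev/dri directory itself.
    nodes = [p for p in ["/dev/kfd"] + sorted(glob.glob("/dev/dri/renderD*"))
             if os.path.exists(p)]
    exposed = [p for p in ["/dev/kfd", "/dev/dri"] if os.path.exists(p)]
    # "rocm-terminal" is a placeholder image name
    print(" ".join(docker_run_args("rocm-terminal", exposed, device_gids(nodes))))
```

Run on the kvm guest, this prints a command line rather than executing it, so the GIDs can be inspected before starting the container.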

settle commented 6 years ago

@dannysemi I managed to get rocm-enabled docker (from Ubuntu's repo, not the Docker CE repo) working on kvm guests by following the instructions in #33. If you need the KVM + PCIe passthrough instructions I could provide those as well (my host is Fedora 27, guest is Ubuntu 16.04.3), but since you already have rocm running on your guest I assume that's not the root cause of this issue.

sunway513 commented 6 years ago

Could you try upgrading to docker.ce 18.04 as instructed in quick-start.md?