ROCm / ROCm-docker

Dockerfiles for the various software layers defined in the ROCm software platform
MIT License
405 stars 67 forks

Compatibility with Ubuntu repo's Docker version 1.13.1 #33

Open settle opened 6 years ago

settle commented 6 years ago

From a fresh install of Ubuntu 16.04.3:

sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get install libnuma-dev
reboot
wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
sudo sh -c 'echo deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main > /etc/apt/sources.list.d/rocm.list'
sudo apt-get update
sudo apt-get install rocm-dkms
sudo usermod -a -G video $LOGNAME
reboot
sudo apt-get install docker.io
reboot

At this point "uname -r" returns "4.13.0-36-generic", not anything containing kfd as in quick-start.md. Perhaps only Docker CE is supported, since that is what the documentation links to, but the documentation does not say that docker.io from Ubuntu's default repos is unsupported, so I crossed my fingers and gave it a try.

I built and ran rocm-terminal with no problem, but got two errors about missing python3 and libnuma-dev when trying to run rocm-smi from within the container. After I installed those two packages, rocm-smi seemed to work, but rocminfo failed with "hsa api call failure at line 900, file /rocmdata/jedwards/git/compute/rocrinfo/rocminfo.cc. Call returned 4104". I compiled the HelloWorld and vector_copy examples, but neither could locate my device when I ran them. I repeated the test with "--privileged --device=/dev/kfd" added to "sudo docker run", with the same results.
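One way to narrow down a failure like this is to check whether the device nodes are visible at all, first on the host and then inside the container. Below is a minimal sketch, assuming /dev/kfd and /dev/dri are the nodes that matter (they are the ones passed to "docker run" later in this thread). The helper name check_rocm_nodes and its optional root-prefix argument are hypothetical and exist only for illustration and testing.

```shell
# Hypothetical helper: report which ROCm-related device nodes are visible.
# Optional $1 is a root prefix (default: empty, meaning the real /dev).
check_rocm_nodes() {
    root="${1:-}"
    missing=0
    # Assumption: these are the two nodes a ROCm container needs.
    for node in /dev/kfd /dev/dri; do
        if [ -e "${root}${node}" ]; then
            echo "found: ${node}"
        else
            echo "missing: ${node}"
            missing=1
        fi
    done
    # Non-zero exit status if any node is absent.
    return "$missing"
}
```

Running check_rocm_nodes on the host and again inside the container would show whether the problem is the host driver install or the flags passed to "docker run".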

Summary:
1) Please confirm the support status for Docker from Ubuntu's (and Fedora's) repos, and if there is none at the moment, please add support.
2) The Dockerfile for rocm-terminal may need "apt-get install python3 libnuma-dev".

settle commented 6 years ago

@kknox, sorry, I'm not sure whether anyone gets notified when issues are posted here, or who is best to contact. Is this the preferred location to post issues? A quick look at other ROCm repos showed replies within just one or two days, so I thought I'd check.

bragadeesh commented 6 years ago

@settle, @kknox is no longer the one to provide support. @jedwards-AMD, could you help?

settle commented 6 years ago

I repeated the above (fresh install, etc., this time with Ubuntu 16.04.4), but installed docker-ce per the readme instead. Unfortunately I ended up with the same results: an error about missing python3 when running rocm-smi the first time, and a rocminfo error because of missing libnuma-dev. Even after installing that, rocminfo errored out with "Ill-formed call, no flag or invalid flags passed", and the samples were unable to find my device after I compiled and ran them. So something more seems to be wrong than just Ubuntu's docker.io versus Docker's docker-ce.

sunway513 commented 6 years ago

Hi @settle, thanks for checking out the rocm-docker repo. I'm currently trying to clean things up. Please take a look at my PR and see if it helps with your issue: https://github.com/RadeonOpenCompute/ROCm-docker/pull/35

settle commented 6 years ago

Thank you for looking into this. I managed to get rocm working within an Ubuntu 16.04 container using docker from Ubuntu's default repositories. Below are the bare minimum instructions at the moment. I say "at the moment" because I had to add "RUN mkdir /etc/udev/rules.d" to the Dockerfile before installing rocm-dkms: "/etc/udev/rules.d" did not exist, which caused the docker build to fail. If you could, please add "mkdir /etc/udev/rules.d" before the "tee /etc/udev/rules.d/kfd.rules" command in the relevant rocm-dkms install script.
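The requested change to the install script could look roughly like the sketch below. This is not the actual rocm-dkms script, which is not shown in this thread. The helper name write_udev_rule and its arguments are hypothetical, and the rule file name and rule text would have to come from the real script.

```shell
# Hypothetical helper: write a udev rule, creating the rules directory first.
# $1 = rules directory, $2 = rule file name, $3 = rule text
write_udev_rule() {
    rules_dir="$1"
    rule_file="$2"
    rule_text="$3"
    # -p is a no-op when the directory already exists and creates it
    # (including parents) when it does not, e.g. in a minimal container.
    mkdir -p "$rules_dir" || return 1
    # tee writes the rule; its echo to stdout is discarded.
    printf '%s\n' "$rule_text" | tee "$rules_dir/$rule_file" > /dev/null
}
```

Using "mkdir -p" rather than plain "mkdir" means the same script would work both on a full Ubuntu install, where the directory already exists, and in a container image where it does not.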

From a fresh install of Ubuntu 16.04.3:

sudo apt-get update
sudo apt-get dist-upgrade
sudo apt-get install libnuma-dev
reboot
wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | sudo apt-key add -
echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | sudo tee /etc/apt/sources.list.d/rocm.list
sudo apt-get update
sudo apt-get install rocm-dkms
sudo usermod -a -G video $LOGNAME
reboot
sudo apt-get install docker.io

Dockerfile:

FROM ubuntu:16.04

RUN apt-get update
RUN apt-get dist-upgrade -y
RUN apt-get install -y --no-install-recommends \
  wget \
  libnuma-dev

RUN wget -qO - http://repo.radeon.com/rocm/apt/debian/rocm.gpg.key | apt-key add -
RUN echo 'deb [arch=amd64] http://repo.radeon.com/rocm/apt/debian/ xenial main' | tee /etc/apt/sources.list.d/rocm.list

RUN apt-get update
# Workaround: the rocm-dkms install fails if this directory does not exist
RUN mkdir -p /etc/udev/rules.d
RUN apt-get install -y --no-install-recommends \
  rocm-dkms

RUN rm -rf /var/lib/apt/lists/*

Finally, building and running:

sudo docker build -t ubuntu-rocm .
sudo docker run -it --rm --device=/dev/kfd --device=/dev/dri/renderD128 ubuntu-rocm

settle commented 6 years ago

@sunway513 Just wanted to follow up and ask whether any maintainers here have reproduced the above and seen that it works. It greatly simplifies the instructions. Honestly, I'm no longer sure what the rest of the rocm-docker repo is for, but that's a good thing, because it means AMD now supports rocm-enabled docker with little effort. The only difference from vanilla docker seems to be the reminder to pass "--device=/dev/kfd --device=/dev/dri/renderD128" or similar when invoking "docker run".

The only outstanding issue in this thread is how to remove the "mkdir /etc/udev/rules.d" currently required in the Dockerfile. Once that can be removed, I'd say we can go ahead and close this issue.

sunway513 commented 6 years ago

Hi @settle, rocm-dkms contains both user-space bits and Linux kernel DKMS modules. For a Dockerfile, you only need to install rocm-dev, not rocm-dkms. Please take a look at the rocm/rocm-terminal Dockerfile, which I recently updated. We don't typically add "mkdir /etc/udev/rules.d" to a Dockerfile, and I don't know why it's required.

I've also updated the documentation. For ROCm 1.7, the command to run a docker image is:

sudo docker run -it --device=/dev/kfd --device=/dev/dri --group-add video rocm/rocm-terminal
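Since the device and group flags are easy to forget, they could be wrapped in a small shell function. The sketch below only prints the command with the flags quoted above for ROCm 1.7. The function name rocm_docker_cmd is hypothetical, and the default image is simply the one used in this comment.

```shell
# Hypothetical helper: print the docker run command for a given image,
# using the device and group flags quoted above for ROCm 1.7.
# Optional $1 is the image name (default: rocm/rocm-terminal).
rocm_docker_cmd() {
    image="${1:-rocm/rocm-terminal}"
    echo "sudo docker run -it --device=/dev/kfd --device=/dev/dri --group-add video ${image}"
}
```

For example, rocm_docker_cmd ubuntu-rocm would print the same command for the locally built image from earlier in the thread.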