Open x1y2z3456 opened 5 years ago
Any luck on the solution here please?
You can filter the selected device at the OpenCL level inside the container. I'm on mobile, so I can't look it up right now, but there is a way. Also, k8s GPU filter support is evolving.
Edit:
export GPU_DEVICE_ORDINAL=1
Source: https://github.com/codeplaysoftware/computecpp-sdk/issues/107
K8s GPU selection support for AMD: https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/ Note: Rancher 2.2+ has elaborate cluster support for these k8s clusters making these and other tasks more user friendly.
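As a rough sketch of the GPU_DEVICE_ORDINAL suggestion, the variable could also be set when the container is started instead of being exported by hand. The helper name ordinal_run_cmd is made up for illustration, and it only prints the command rather than running it. Passing the whole /dev/dri directory is an assumption here, as is the runtime inside the image honoring GPU_DEVICE_ORDINAL. This does not change which device nodes the container can see.

```shell
# Hypothetical helper: print (do not run) a docker command that sets
# GPU_DEVICE_ORDINAL inside the container.
#   $1 = GPU ordinal to expose to the compute runtime (e.g. 1)
#   $2 = image name (e.g. rocm/tensorflow)
ordinal_run_cmd() {
    local ordinal="$1" image="$2"
    # Assumption: all GPUs are passed through via /dev/dri and the
    # filtering is left to the runtime via the environment variable.
    printf 'docker run -it --device=/dev/kfd --device=/dev/dri --group-add video -e GPU_DEVICE_ORDINAL=%s %s bash\n' \
        "$ordinal" "$image"
}

# Example: ordinal_run_cmd 1 rocm/tensorflow
```

The printed command can be reviewed and then pasted into a shell.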
Assigning a GPU with "export GPU_DEVICE_ORDINAL=1" works, but only "virtually": I can still see both GPUs with the "rocm-smi" command. I've also tried the newer ROCm driver, version 3.0, and it still cannot separate the GPUs "physically". What I really want is physical isolation when I start up a new container with the following command:
docker run -it --network=host --device=/dev/kfd --device=/dev/dri/card0 --device=/dev/dri/renderD128 --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined rocm/tensorflow bash
As expected, listing the devices under /dev shows only one GPU:
ls /dev
card0 renderD128
but rocm-smi still shows 2 GPUs:
rocm-smi
GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
0 47.0c 9.0W 800Mhz 100Mhz 0.0% auto 162.0W 2% 0%
1 49.0c 11.0W 800Mhz 100Mhz 0.0% auto 162.0W 1% 0%
Thanks for the reply anyway.
With the recent ROCm 3.9, I am able to see 1 GPU being reported in rocm-smi inside the docker container. Maybe something was fixed since this was last reported.
docker run -it --cap-add=SYS_PTRACE --security-opt seccomp=unconfined --device=/dev/kfd --device=/dev/dri/card8 --device=/dev/dri/renderD135 --cap-add=SYS_RAWIO --device=/dev/mem --group-add video --network host rocm/dev-ubuntu-18.04
root@login:/# /opt/rocm/bin/rocm-smi
======================= ROCm System Management Interface =======================
================================= Concise Info =================================
GPU Temp AvgPwr SCLK MCLK Fan Perf PwrCap VRAM% GPU%
0 41.0c 32.0W 930Mhz 1000Mhz 0.0% auto 225.0W 0% 0%
================================================================================
============================= End of ROCm SMI Log ==============================
root@login:/# ls -al /dev/dri/
card8 renderD135
root@login:/# ls -al /dev/dri/*
crw-rw---- 1 root video 226, 8 Dec 12 00:32 /dev/dri/card8
crw-rw---- 1 root video 226, 135 Dec 12 00:32 /dev/dri/renderD135
root@login:/#
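The card and render node passed with --device have to belong to the same GPU (card8 and renderD135 in the run above). A minimal sketch of looking that pairing up, assuming the usual Linux sysfs layout in which /sys/class/drm/<card>/device/drm/ lists both nodes of one GPU. The function names and the optional sysfs-root argument are hypothetical and exist only for illustration and testing.

```shell
# Hypothetical helper: print the render node (e.g. renderD135) that
# belongs to the given card (e.g. card8).
#   $1 = card name
#   $2 = sysfs DRM root (optional, defaults to /sys/class/drm)
render_node_for_card() {
    local card="$1" sysfs_root="${2:-/sys/class/drm}"
    local node
    for node in "$sysfs_root/$card"/device/drm/renderD*; do
        # An unmatched glob stays literal, so skip anything that does not exist.
        [ -e "$node" ] || continue
        basename "$node"
        return 0
    done
    return 1
}

# Hypothetical helper: print the --device flags for exactly one GPU.
device_flags_for_card() {
    local card="$1" sysfs_root="${2:-/sys/class/drm}" render
    render="$(render_node_for_card "$card" "$sysfs_root")" || return 1
    printf '%s\n' "--device=/dev/kfd --device=/dev/dri/$card --device=/dev/dri/$render"
}

# Example: device_flags_for_card card8
```

The output can be spliced into a docker run command in place of the hand-written --device flags.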
Yes, with recent ROCm it shows only one GPU, but when I try to build libtorch inside the container, it gives an error, something like "readkfd permission denied", and says it is not allowed.
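One thing that may be worth ruling out for a permission error like this is whether the user inside the container can actually read and write /dev/kfd. The sketch below is only a diagnostic under that assumption, not a confirmed cause or fix; the function name can_access_device is hypothetical.

```shell
# Hypothetical diagnostic: report whether the current user can read and
# write a device node (defaults to /dev/kfd).
# Returns 0 if accessible, 1 if access is denied, 2 if the node is missing.
can_access_device() {
    local dev="${1:-/dev/kfd}"
    if [ ! -e "$dev" ]; then
        echo "missing: $dev"
        return 2
    fi
    if [ -r "$dev" ] && [ -w "$dev" ]; then
        echo "ok: $dev"
        return 0
    fi
    # Print the numeric uid and group ids so they can be compared with
    # the owner and group shown by ls -l on the device node.
    echo "no read/write access: $dev (uid=$(id -u), groups=$(id -G))"
    return 1
}

# Example: can_access_device /dev/kfd
```

If access is denied, comparing the printed ids with the output of ls -l /dev/kfd shows whether the group passed with --group-add matches the group that owns the node.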
Hi, I was wondering whether it is possible to assign a single AMD GPU to a container. I have tried the following command (trying to assign GPU 0 to the container):
docker run -it --network=host --device=/dev/kfd --device=/dev/dri/card0 --device=/dev/dri/renderD128 --group-add video --cap-add=SYS_PTRACE --security-opt seccomp=unconfined rocm/tensorflow bash
But inside the container, using the command rocm-smi still shows two AMD GPUs:
root@ryan-desktop:/root# rocm-smi
==================== ROCm System Management Interface ================
================================================================
GPU Temp AvgPwr SCLK MCLK Fan Perf SCLK OD MCLK OD
0 31c N/A 300Mhz 300Mhz 23.92% manual 0% 0%
1 29c N/A 300Mhz 300Mhz 23.92% auto 0% 0%
================================================================
==================== End of ROCm SMI Log ===========================
Linux distribution version:
ryan@ryan-desktop:~$ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.4 LTS
Release: 16.04
Codename: xenial
Docker version:
ryan@ryan-desktop:~$ docker --version
Docker version 17.03.2-ce, build f5ec1e2
Docker image: rocm/tensorflow
Kernel version:
ryan@ryan-desktop:~$ uname -a
Linux ryan-desktop 4.15.0-38-generic #41~16.04.1-Ubuntu SMP Wed Oct 10 20:16:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
ROCm version:
ryan@ryan-desktop:~$ apt show rocm-libs -a
Package: rocm-libs
Version: 1.9.211
Priority: optional
Section: devel
Maintainer: Advanced Micro Devices Inc.
Installed-Size: 1024 B
Depends: rocfft, rocrand, hipblas, rocblas
Homepage: https://github.com/RadeonOpenCompute/ROCm
Download-Size: 772 B
APT-Sources: http://repo.radeon.com/rocm/apt/debian xenial/main amd64 Packages
Description: Radeon Open Compute (ROCm) Runtime software stack
CPU information: model name : Intel(R) Core(TM) i3-8100 CPU @ 3.60GHz
GPU information: RX 580 4G *2
Thanks for the help anyway.