Hi @Madeeks, I haven't tested this with multiple GPUs, but in principle it should work. Every GPU should be listed as /dev/dri/card{n} for n = 0, 1, ..., and this PR mounts /dev/dri entirely.
I'll think about autodetection like we have for NVIDIA GPUs, but I didn't immediately know what to check for. AMD likes to install /opt/rocm/bin/hipconfig, which reports the version of the ROCm libraries, but its presence doesn't imply that actual GPUs are available. The best option is probably to check whether vendor data is available from /dev/dri/card* and/or /dev/kfd.
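As a rough sketch of what such a vendor-data check could look like (purely illustrative; it assumes the standard sysfs DRM layout, where each /dev/dri/cardN has a corresponding /sys/class/drm/cardN/device/vendor file, and uses AMD's PCI vendor ID 0x1002):

```bash
# Sketch: detect AMD GPUs via the PCI vendor ID exposed in sysfs.
# Assumes /sys/class/drm/card*/device/vendor exists; 0x1002 is AMD's PCI vendor ID.
found=0
for vendor_file in /sys/class/drm/card*/device/vendor; do
    [ -r "$vendor_file" ] || continue
    if [ "$(cat "$vendor_file")" = "0x1002" ]; then
        found=1
        break
    fi
done
if [ "$found" -eq 1 ] && [ -e /dev/kfd ]; then
    echo "AMD GPU(s) present"
fi
```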
Ok, so the way rocm_agent_enumerator detects AMD GPUs is by calling hsa_iterate_agents, which is available from a Spack package (https://github.com/spack/spack/blob/develop/var/spack/repos/builtin/packages/hsa-rocr-dev/package.py) but depends on AMD's fork of LLVM :D so it's not a great dependency to just add to Sarus.
Another idea is to check whether rocminfo is in the PATH or /opt/rocm/bin/rocminfo exists, and if so execute it and grep the output for some string. That's a bit ugly, but probably the easiest.
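A minimal sketch of that check, assuming the "Device Type: GPU" lines from the rocminfo output shown further down are a stable enough string to grep for:

```bash
# Sketch: detect AMD GPUs by running rocminfo (from PATH or /opt/rocm) and grepping its output.
# The "Device Type: GPU" pattern is an assumption based on the output shown below.
rocminfo_bin=$(command -v rocminfo || echo /opt/rocm/bin/rocminfo)
if [ -x "$rocminfo_bin" ] && "$rocminfo_bin" 2>/dev/null | grep -q 'Device Type:[[:space:]]*GPU'; then
    echo "AMD GPU(s) detected"
fi
```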
Let me elaborate a bit more on my question about the hook interface and device selection.
The CUDA runtime uses the CUDA_VISIBLE_DEVICES environment variable to determine which GPU devices applications have access to. The NVIDIA Container Toolkit uses NVIDIA_VISIBLE_DEVICES to determine which GPUs to mount inside the container. By checking for the presence of such variables, Sarus does not need an explicit CLI option to know whether the host process is requesting GPU devices (and which ones).
I was wondering if there are analogous variables in the ROCm environment.
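For illustration, the NVIDIA convention described above looks roughly like this with Docker and the NVIDIA runtime (image tag and device indices are arbitrary examples):

```bash
# NVIDIA_VISIBLE_DEVICES tells the NVIDIA Container Toolkit which GPUs to expose in the container;
# CUDA_VISIBLE_DEVICES then restricts which of those the CUDA runtime uses inside it.
docker run --rm --runtime=nvidia -e NVIDIA_VISIBLE_DEVICES=0,1 nvidia/cuda:11.0-base nvidia-smi
```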
A quick search brought me to the following issues: https://github.com/RadeonOpenCompute/ROCm/issues/841, https://github.com/RadeonOpenCompute/ROCm/issues/994
From what I understand, there are two variables which cover similar roles: HIP_VISIBLE_DEVICES and ROCR_VISIBLE_DEVICES.
I don't have experience with ROCm, so in your view, can either of those be used to control hook activation? If so, which one is the most appropriate? And how do the numerical IDs in those variables relate to the /dev/dri/* files?
As an additional reference, the GRES plugin of Slurm sets CUDA_VISIBLE_DEVICES to the GPUs allocated by the workload manager. What's the mechanism implemented by Slurm (or other workload managers) to signal allocation of AMD GPUs?
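To make the question concrete: a hypothetical activation check mirroring the NVIDIA one could look like the sketch below; which of the two variables (if either) is the right trigger is exactly what's being asked here.

```bash
# Hypothetical sketch: activate the AMD hook when either ROCm visibility variable is set,
# analogous to how NVIDIA_VISIBLE_DEVICES drives the NVIDIA hook.
if [ -n "${ROCR_VISIBLE_DEVICES:-}" ] || [ -n "${HIP_VISIBLE_DEVICES:-}" ]; then
    echo "AMD GPU(s) requested: ${ROCR_VISIBLE_DEVICES:-$HIP_VISIBLE_DEVICES}"
fi
```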
Ah, Ault is configured such that by default you get all GPUs.
$ srun -p amdvega /bin/bash -c 'echo "ROCM_VISIBLE_DEVICES: $ROCR_VISIBLE_DEVICES"; /opt/rocm/bin/rocm_agent_enumerator; ls /dev/dri/card*'
ROCM_VISIBLE_DEVICES:
gfx000
gfx906
gfx906
gfx906
/dev/dri/card0
/dev/dri/card1
/dev/dri/card2
/dev/dri/card3
$ srun -p amdvega --gres=gpu:1 /bin/bash -c 'echo "ROCM_VISIBLE_DEVICES: $ROCR_VISIBLE_DEVICES"; /opt/rocm/bin/rocm_agent_enumerator; ls /dev/dri/card*'
ROCM_VISIBLE_DEVICES: 0
gfx000
gfx906
/dev/dri/card0
/dev/dri/card1
/dev/dri/card2
/dev/dri/card3
$ srun -p amdvega --gres=gpu:3 /bin/bash -c 'echo "ROCM_VISIBLE_DEVICES: $ROCR_VISIBLE_DEVICES"; /opt/rocm/bin/rocm_agent_enumerator; ls /dev/dri/card*'
ROCM_VISIBLE_DEVICES: 0,1,2
gfx000
gfx906
gfx906
gfx906
/dev/dri/card0
/dev/dri/card1
/dev/dri/card2
/dev/dri/card3
$ srun -p amdvega --gres=gpu:2 /bin/bash -c '/opt/rocm/bin/rocminfo | grep GPU'
Uuid: GPU-3f50506172fc1a63
Device Type: GPU
Uuid: GPU-3f4478c172fc1a63
Device Type: GPU
$ srun -p amdvega --gres=gpu:2 /bin/bash -c '/opt/rocm/opencl/bin/clinfo | grep Number'
Number of platforms: 1
Number of devices: 2
So, ROCR_VISIBLE_DEVICES is only set when --gres=gpu[:n] is provided. When it is set, I think it's handled at the software level by the ROCm stack, so we might not want to bother with the bookkeeping of mounting exactly those specific GPUs from /dev/dri, but rather leave that to ROCm. For instance:
$ ROCR_VISIBLE_DEVICES=1,2 sarus run -t --mount=type=bind,src=/dev/kfd,dst=/dev/kfd --mount=type=bind,src=/dev/dri,dst=/dev/dri stabbles/sirius-rocm /opt/spack/opt/spack/linux-ubuntu20.04-x86_64/gcc-9.3.0/rocminfo-4.0.0-lruzhymnjm4hez3jeuyf3kyhmjjloqyp/bin/rocm_agent_enumerator
gfx000
gfx906
gfx906
How about we just unconditionally mount /dev/kfd and /dev/dri when they exist?
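In shell terms, the idea amounts to something like this sketch (CONTAINER_ROOTFS is a hypothetical placeholder; the actual hook would operate on the container's OCI bundle):

```bash
# Sketch: bind-mount /dev/kfd (a device node) and /dev/dri (a directory) into the container
# whenever they exist on the host. CONTAINER_ROOTFS is a hypothetical placeholder path.
for dev in /dev/kfd /dev/dri; do
    [ -e "$dev" ] || continue
    target="${CONTAINER_ROOTFS:?}${dev}"
    if [ -d "$dev" ]; then
        mkdir -p "$target"                      # directory mount point for /dev/dri
    else
        mkdir -p "$(dirname "$target")"
        touch "$target"                         # file mount point for the /dev/kfd device node
    fi
    mount --bind "$dev" "$target"
done
```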
Edit: in fact I find it only confusing to mount just a few specific GPUs, because ROCR_VISIBLE_DEVICES=1,2 would then have to be unset or relabeled to ROCR_VISIBLE_DEVICES=0,1 inside the container:
$ ls /dev/dri/
by-path card0 card1 card2 card3 renderD128 renderD129 renderD130
$ ROCR_VISIBLE_DEVICES=1,2 sarus run \
--mount=type=bind,src=/dev/kfd,dst=/dev/kfd \
--mount=type=bind,src=/dev/dri/renderD129,dst=/dev/dri/renderD129 \
--mount=type=bind,src=/dev/dri/renderD130,dst=/dev/dri/renderD130 \
stabbles/sirius-rocm /bin/bash -c '/opt/spack/opt/spack/linux-ubuntu20.04-x86_64/gcc-9.3.0/rocminfo-4.0.0-lruzhymnjm4hez3jeuyf3kyhmjjloqyp/bin/rocminfo'
.. only shows 1 gpu because ROCR_VISIBLE_DEVICES is still 1,2 and the GPUs are labeled 0,1 now ...
$ ROCR_VISIBLE_DEVICES=1,2 sarus run \
--mount=type=bind,src=/dev/kfd,dst=/dev/kfd \
--mount=type=bind,src=/dev/dri/renderD129,dst=/dev/dri/renderD129 \
--mount=type=bind,src=/dev/dri/renderD130,dst=/dev/dri/renderD130 \
stabbles/sirius-rocm /bin/bash -c 'unset ROCR_VISIBLE_DEVICES && /opt/spack/opt/spack/linux-ubuntu20.04-x86_64/gcc-9.3.0/rocminfo-4.0.0-lruzhymnjm4hez3jeuyf3kyhmjjloqyp/bin/rocminfo'
... shows 2 gpus correctly ...
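If the hook did mount only selected GPUs, it would also have to rewrite the variable; a hypothetical relabeling step (not part of this PR) could look like:

```bash
# Hypothetical sketch: relabel ROCR_VISIBLE_DEVICES for the container, assuming the selected
# devices get renumbered 0..n-1 in the order they were mounted.
host_ids="${ROCR_VISIBLE_DEVICES:-}"             # e.g. "1,2" on the host
if [ -n "$host_ids" ]; then
    n=$(echo "$host_ids" | tr ',' '\n' | wc -l)  # number of selected GPUs
    export ROCR_VISIBLE_DEVICES="$(seq -s, 0 $((n - 1)))"  # e.g. "0,1" inside the container
fi
```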
Adds a hook for AMD GPUs, which currently just mounts /dev/dri and /dev/kfd as advocated by AMD.
The hook can be enabled through the following flag:
It will simply fail when /dev/dri or /dev/kfd does not exist or cannot be mounted.