Closed JanLJL closed 12 hours ago
I am also interested in being able to support CPU pinning in combination with GPU usage. What is the current best practice w.r.t. likwid-pin?
@JanLJL you mentioned likwid-topology, but what is the proper flow a user should follow? I am also interested in whether likwid-pin supports a hierarchy: if a parent process uses a GPU, make sure its children are also pinned to cores in the same NUMA domain.
A very recent issue we had was people running torchrun with Python code doing dataloader+train, plus dataloaders. The dataload+train process is what nvidia-smi reports as using the GPU; the remaining dataloaders are child processes of the train+dataload process. torchrun is really bad at pinning correctly, so we are looking for a way to "help" it. likwid-pin would be a good candidate for this, but it's unclear how one would invoke it.
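For context, a workaround we are experimenting with (hypothetical sketch, Linux-only: restrict each DataLoader worker's CPU affinity via os.sched_setaffinity from a worker_init_fn, so the children inherit the GPU-local cores; the core set itself would still have to come from likwid-topology):

```python
# Hypothetical helper: pin PyTorch DataLoader worker processes to the
# same NUMA-local hardware threads as the GPU-using parent, so children
# spawned under torchrun stay in the GPU's NUMA domain.
import os

def make_worker_init_fn(allowed_cpus):
    """Return a worker_init_fn restricting each DataLoader worker
    to the given set of hardware threads (Linux only)."""
    def worker_init_fn(worker_id):
        # PID 0 means "the calling process", i.e. the freshly forked worker.
        os.sched_setaffinity(0, allowed_cpus)
    return worker_init_fn

# Usage (cores 0-9 assumed to be in the GPU's NUMA domain):
# DataLoader(dataset, num_workers=4,
#            worker_init_fn=make_worker_init_fn(range(10)))
```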
Hello,
Thanks for increasing priority on this feature request.
The current workflow would be to run likwid-topology to get the NUMA node the GPU is attached to. Then you use likwid-pin -c Mx:y-z (x = NUMA domain ID, y-z = the range of HW threads within that domain).
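Concretely, assuming likwid-topology reports the GPU as attached to NUMA domain 3 (the GPU section requires a CUDA-enabled LIKWID build), the two steps might look like:

```sh
# Step 1: inspect the topology and note the GPU's NUMA domain
likwid-topology

# Step 2: pin the application to HW threads 0-9 of NUMA domain 3
likwid-pin -c M3:0-9 ./app
```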
One big question for this feature request is whether likwid-pin should also force the application to run on the selected GPU(s). I have not found a portable solution for that yet. The CUDA_VISIBLE_DEVICES environment variable is fine on exclusive systems, but inside e.g. shared-node SLURM jobs, each with its own GPU, this approach does not work anymore. Each SLURM job gets CUDA_VISIBLE_DEVICES=0, but under the hood they are using different GPUs. My guess is that this is enforced through cgroups, but I haven't found out how yet.
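On an exclusive node, the env-var approach can simply be combined with pinning, e.g. (sketch; M0 assumed to be GPU 1's NUMA domain):

```sh
# Select GPU 1 and pin to HW threads 0-9 of its NUMA domain
CUDA_VISIBLE_DEVICES=1 likwid-pin -c M0:0-9 ./gpu_app
```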
I never tried likwid-pin with PyTorch. There might be some other difficulties coming up (e.g. shepherd processes).
Hierarchies are currently not supported, but also not needed: likwid-pin works on single processes, so either that process is using a GPU or it is not. Hierarchies would be more interesting for likwid-mpirun, where one MPI process could use a GPU while the others do not. There is currently no way to do that because likwid-mpirun does not yet support the (I call it) colon syntax: mpirun <global opts> <local opts> <exec> <args1> : <local opts> <exec> <args2> : .... With the colon syntax, hierarchies should be doable.
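For illustration, such a heterogeneous launch in the colon syntax (plain mpirun, not likwid-mpirun; rank counts and binaries are made up) could look like:

```sh
# One GPU-using rank, seven CPU-only ranks, each part with its own options
mpirun -np 1 ./gpu_rank : -np 7 ./cpu_rank
```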
Is your feature request related to a problem? Please describe.
Often, GPUs are not closest to the NUMA domain a human might think (e.g., GPU 3 is closest to NUMA domain 0, etc.). Not every user remembers to run likwid-topology first to get the corresponding NUMA domains for their GPU(s).

Describe the solution you'd like
Add an affinity domain for likwid-pin and likwid-perfctr, e.g., G for placing HW threads close to the GPU. For example, pinning 10 HW threads closest to GPU 1: