NCAR / mpibind

MPI binding utilities
MIT License

mpibind with oversubscribe (e.g. 256 mpi tasks on 128 cpus) appears to not use all 256 cores #7

Closed · hkershaw-brown closed this issue 1 week ago

hkershaw-brown commented 2 weeks ago

Running 256 MPI tasks on 128 CPUs:

PBS -l select=1:ncpus=128:mpiprocs=256

mpibind ./get_cpu
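
For context, a minimal PBS batch script reproducing this layout might look like the sketch below. Only the select line and the mpibind invocation come from this post; the job name, project code, queue, and walltime are placeholders.

    #!/bin/bash
    ### Hypothetical reproduction script; -N/-A/-q/walltime values are placeholders.
    #PBS -N oversubscribe_test
    #PBS -A <project_code>
    #PBS -q main
    #PBS -l walltime=00:10:00
    #PBS -l select=1:ncpus=128:mpiprocs=256

    # mpibind constructs and runs the mpiexec line shown in the log output below.
    mpibind ./get_cpu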

Chunk info
  1:ncpus=128:mpiprocs=256:ompthreads=1:mem=235gb:Qlist=cpu:ngpus=0
-- -- -- --
MPI exec line:
  mpiexec -n 256 -ppn 256 --cpu-bind none -env OMP_NUM_THREADS=1 /glade/u/apps/opt/mpitools/mpibind/cpu_bind ./get_cpu
-- -- -- --
Binding Report:
rank: 0, cores: 0-0
rank: 1, cores: 1-1
rank: 2, cores: 2-2
...
rank: 126, cores: 126-126
rank: 127, cores: 127-127
rank: 128, cores: 64-64
rank: 129, cores: 65-65
...
rank: 253, cores: 189-189
rank: 254, cores: 190-190
rank: 255, cores: 191-191

Cores 64 to 127 get two tasks each, while cores 192 to 255 get no tasks.
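
(For anyone checking binding without get_cpu, a stand-in that just prints each process's allowed CPU list is sketched below, assuming mpibind will launch a shell script the same way it launches get_cpu; the PMI_RANK variable name is an assumption and depends on the MPI launcher.)

    # Hypothetical stand-in for ./get_cpu: prints the rank and its allowed CPUs.
    cat > show_cpus.sh <<'EOF'
    #!/bin/bash
    # PMI_RANK is launcher-dependent (assumption); Cpus_allowed_list is standard /proc.
    echo "rank: ${PMI_RANK:-?}, cores: $(awk '/Cpus_allowed_list/ {print $2}' /proc/self/status)"
    EOF
    chmod +x show_cpus.sh
    mpibind ./show_cpus.sh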

Full mpibind log: mpibind.6296034.log

roryck commented 2 weeks ago

Hi Helen,

This isn't something that I plan to support with mpibind, at least not on Derecho, as the underlying hardware does not support running 256 ranks on a node. From the man page (tucked conveniently toward the very end):

Processes Per Node Limitations: If you are using HPE Slingshot NICs (SS-11), there is a limitation on the maximum number of MPI ranks per NIC. The NIC has the resources to support a maximum of 254 ranks. If you try to use more than this many ranks, MPI will fail to initialize. We recommend running with multiple threads per process as an alternative to very high process counts.
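
(On the PBS side, the threads-per-process alternative the man page recommends would look roughly like the chunk below; the 128 x 2 split is just one example layout, not something prescribed by the man page.)

    # One possible layout under the 254-rank NIC limit:
    # 128 MPI ranks per node, 2 OpenMP threads per rank.
    #PBS -l select=1:ncpus=128:mpiprocs=128:ompthreads=2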

I can detect this case and issue an error message, though. That would at least be an improvement over the current state of things.
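
(A minimal sketch of that kind of guard, assuming the ranks-per-node count has already been pulled from the PBS chunk; the variable names and message text are placeholders.)

    # Hypothetical pre-flight check inside a launcher script.
    max_ranks_per_nic=254                    # SS-11 limit quoted from the man page
    ranks_per_node=${RANKS_PER_NODE:-256}    # assumed to come from the chunk's mpiprocs value
    if [ "$ranks_per_node" -gt "$max_ranks_per_nic" ]; then
        echo "ERROR: $ranks_per_node ranks per node requested, but SS-11 NICs support at most $max_ranks_per_nic." >&2
        echo "       Consider fewer ranks per node with more OpenMP threads per rank." >&2
        exit 1
    fi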

hkershaw-brown commented 2 weeks ago

Fair enough!