Closed hkershaw-brown closed 1 week ago
Hi Helen,
This isn't something that I plan to support with mpibind, at least not on Derecho, as the underlying hardware does not support running 256 ranks on a node. From the man page (tucked conveniently toward the very end):
Processes Per Node Limitations
If you are using HPE Slingshot NICs (SS-11), there is a limitation on the maximum number of MPI ranks per NIC. The NIC has the resources to support a maximum of 254 ranks. If you try to use more than this many ranks, MPI will fail to initialize. We recommend running with multiple threads per process as an alternative to very high process counts.
I can detect and issue an error message though. That would at least be an improvement over the current state of things.
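As a rough illustration of what such a check could look like (a sketch only, not mpibind's actual code): the function name `check_ranks_per_node` is hypothetical, the 254 limit is taken from the man page text quoted above, and it assumes a single Slingshot NIC per node.

```shell
# Hypothetical pre-launch check; not taken from mpibind itself.
# Limit quoted from the man page: at most 254 MPI ranks per NIC.
# Assumption: one NIC per node, so this is also the per-node limit.
MAX_RANKS_PER_NIC=254

check_ranks_per_node() {
    # $1: MPI ranks requested per node (e.g. the mpiprocs value)
    if [ "$1" -gt "$MAX_RANKS_PER_NIC" ]; then
        echo "ERROR: $1 MPI ranks per node requested; the NIC supports at most $MAX_RANKS_PER_NIC." >&2
        echo "Consider fewer ranks per node with multiple threads per rank." >&2
        return 1
    fi
    return 0
}
```

A launcher could call it as `check_ranks_per_node 256 || exit 1`, so the job stops with a readable message instead of MPI failing to initialize.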
Fair enough!
Running 256 MPI tasks on 128 CPUs:
#PBS -l select=1:ncpus=128:mpiprocs=256
mpibind ./get_cpu
Cores 64 to 128 get double tasks; cores 192 to 255 get no tasks.
Full mpibind log: mpibind.6296034.log