Azure / cyclecloud-slurm

Azure CycleCloud project to enable users to create, configure, and use Slurm HPC clusters.
MIT License

Slurm nodenames not matching CycleCloud hostnames cause some MPI variants to fail #65

Open anhoward opened 3 years ago

anhoward commented 3 years ago

Depending on the MPI version or ISV code in use, jobs sometimes rely on the Slurm nodenames, which are not resolvable hostnames. When that happens, the jobs fail.

It would be good if the actual hostnames on the nodes and in Azure DNS matched the nodename used in Slurm.
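
A quick way to confirm the mismatch on a given cluster is to test whether each Slurm nodename resolves. The following is a minimal sketch, not part of cyclecloud-slurm: the function name `unresolvable_nodes` and the injectable `resolve` parameter are illustrative assumptions, and the node names would have to be supplied by the caller (for example from `scontrol show hostnames "$SLURM_JOB_NODELIST"` inside a job).

```python
import socket
from typing import Callable, Iterable, List


def unresolvable_nodes(
    node_names: Iterable[str],
    resolve: Callable[[str], str] = socket.gethostbyname,
) -> List[str]:
    """Return the node names that `resolve` cannot map to an address.

    `resolve` defaults to the system resolver; it is a parameter only so
    the check can be exercised without real DNS (hypothetical helper).
    """
    missing = []
    for name in node_names:
        try:
            resolve(name)
        except OSError:  # socket.gaierror is a subclass of OSError
            missing.append(name)
    return missing
```

An empty result means every nodename resolved from the host where the check ran. Any names returned are the ones an MPI launcher would likely fail to reach if it uses Slurm nodenames as hostnames.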

gjhw commented 3 years ago

We are seeing this with Abaqus. It's worth noting that we are confined to running in UK South, where only H Series is available. H Series does not have SR-IOV support, which limits us to Intel MPI. When HC Series lands later this year (with SR-IOV support), we expect to be able to use the MPI that ships with Abaqus, and we will see whether this allows multi-node jobs to run when Slurm node names do not match hostnames.

tbugfinder commented 2 years ago

It looks like this is now supported with v2.5.0 / v2.5.1

tbugfinder commented 4 weeks ago

close?