Azure / azhpc-images

Azure HPC/AI VM Images
MIT License

IntelMPI bug with Slurm jobs #375

Open anhoward opened 2 months ago

anhoward commented 2 months ago

For reasons I haven't pinned down yet, Intel MPI jobs running under Slurm don't work properly unless you set

I_MPI_HYDRA_IFACE=eth0

The first job on a cluster works, but all subsequent jobs fail. I'm still trying to track down the actual cause, but it would be great if we could set this in the impi module.
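
Until the module sets it, a per-job workaround could look like the sketch below. It is only an illustration: the `eth0` value is the one from this report, the application name in the commented launch line is hypothetical, and the script assumes the variable should not override a value the user has already exported.

```shell
#!/bin/bash
# Workaround sketch for a Slurm batch script (not the module fix itself).
# Pin Intel MPI's Hydra launcher to a single network interface.
# "eth0" is the value reported in this issue; keep any value that is
# already exported in the environment.
export I_MPI_HYDRA_IFACE="${I_MPI_HYDRA_IFACE:-eth0}"

# Record the setting in the job output to make failed runs easier to triage.
echo "I_MPI_HYDRA_IFACE=${I_MPI_HYDRA_IFACE}"

# Launch line left commented out because the application is hypothetical:
# mpirun -np "${SLURM_NTASKS}" ./my_mpi_app
```

Using the `:-` default rather than a plain assignment means a cluster whose nodes name the interface differently can still override the value before submitting.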