For some setups, we would get MPI_Allreduce truncation errors (oddly, not for the weak-scaling setup).
Here is a fix along with a few more sanity checks.
PS: errors like
Abort(203042319) on node 4092 (rank 4092 in comm 0): Fatal error in PMPI_Wait: Other MPI error, error stack:
PMPI_Wait(205)..................: MPI_Wait(request=0x7ffd5b0155cc, status=0x1) failed
MPIR_Wait(105)..................:
MPIDU_Sched_progress_state(1036): Invalid communicator
can be attributed to a "wrong" default of I_MPI_ADJUST_IBCAST that does not handle non-power-of-two numbers of groups. Parameters that worked for us (on Intel MPI 2019.12) were 1 and 4.
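As a workaround, the MPI_Ibcast algorithm can be pinned explicitly via the environment before launching the job. A minimal sketch (the launcher invocation and rank count are illustrative, not from our actual job script):

```shell
# Pin the MPI_Ibcast algorithm; 1 and 4 worked for us on Intel MPI 2019.12,
# while the default selection failed for non-power-of-two group counts.
export I_MPI_ADJUST_IBCAST=1
echo "I_MPI_ADJUST_IBCAST=$I_MPI_ADJUST_IBCAST"

# Then launch as usual, e.g. (illustrative):
#   mpirun -n 4096 ./app
```

The same variable can also be passed per-invocation, e.g. `I_MPI_ADJUST_IBCAST=4 mpirun ...`, which avoids polluting the environment of other jobs.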