Closed by ndryden 1 year ago
This is actually a segfault, which appears to be a bug in SpectrumMPI. The hang comes from our signal handler, which handles the crash badly.
A simple reproducer results in a segfault on Lassen:
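#include <vector>
#include <mpi.h>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    // One-element send buffer owned by a std::vector.
    std::vector<float> buf(1, 0.0f);
    // In-place scatter at the root (rank 0); with one process this should move no data.
    MPI_Scatter(buf.data(), 1, MPI_FLOAT, MPI_IN_PLACE, 1, MPI_FLOAT, 0, MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}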
This appears to occur only when using in-place MPI_Scatter with a buffer provided by a std::vector and a single process. It works fine if the buffer is allocated some other way, e.g. with new. It also works on Pascal with MVAPICH.
This has been fixed in the latest SpectrumMPI release.
Scatterv also hangs.
Regular (non-in-place) scatter works fine, and the in-place version also works with more than one process. Likewise, other rooted collectives (bcast, gather, reduce) work in this configuration. This is a bit strange, since an in-place scatter on a single process should be a no-op.
We also need to verify that this is not an SMPI bug.
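To spell out why the in-place, single-process case should move no data, here is a rough plain C++ model of what a rooted scatter is specified to do. It makes no MPI calls, and `ScatterModelResult` and `scatter_model` are hypothetical names made up for this sketch, not anything from MPI or SpectrumMPI. It only models the data movement and says nothing about where the actual crash comes from.

```cpp
#include <cstddef>
#include <vector>

// Result of the model below (hypothetical type, not part of MPI).
struct ScatterModelResult {
  // recv[r] holds what rank r's receive buffer would contain afterwards.
  std::vector<std::vector<float>> recv;
  // Total number of elements copied into any receive buffer.
  std::size_t elements_moved = 0;
};

// Models the data movement of a scatter rooted at `root`, without MPI.
// Assumes sendbuf holds nranks * count elements. When in_place is true,
// the root keeps its chunk inside sendbuf and receives nothing, which
// mirrors passing MPI_IN_PLACE as the receive buffer on the root.
ScatterModelResult scatter_model(const std::vector<float>& sendbuf,
                                 std::size_t count, std::size_t nranks,
                                 std::size_t root, bool in_place) {
  ScatterModelResult out;
  out.recv.resize(nranks);
  for (std::size_t r = 0; r < nranks; ++r) {
    if (in_place && r == root) {
      continue;  // the root's chunk stays where it already is
    }
    auto first = sendbuf.begin() + static_cast<std::ptrdiff_t>(r * count);
    out.recv[r].assign(first, first + static_cast<std::ptrdiff_t>(count));
    out.elements_moved += count;
  }
  return out;
}
```

For the reproducer's configuration (one rank, count 1, in place), `scatter_model(buf, 1, 1, 0, true).elements_moved` is 0 in this model: the only rank is the root, and the root's chunk is never copied. With `in_place` set to false the same call copies one element, which matches the regular scatter case that works.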