ryanday36 opened this issue 2 years ago
I sympathize, and we really want to get to the point where the out-of-the-box experience is good for all the MPIs seen in the wild. However, it is challenging.
A couple of notes:
- There is a `pmi.clique` shell option, e.g. `flux mini run -o pmi.clique=value ...`, where value is `none`, `pershell`, or `single`.
- It would probably be good to open issues on each version of each vendor MPI that you have problems with.
Edit: open issues here, I mean, not in the MPI projects, unless it's their bug :-)
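For reference, a minimal sketch of how that option might be passed (the node/task counts and application name are placeholders, not from this report):

```sh
# Sketch: run an MPI job with a per-shell PMI clique; valid values for
# pmi.clique are none, pershell, or single.  -N/-n and ./my_mpi_app are
# placeholders for the real job size and binary.
flux mini run -N2 -n4 -o pmi.clique=pershell ./my_mpi_app
```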
Eric Illescas (SNL) reports issues when using MPI implementations other than the one present at build time:
Flux version w/ TOSS3: The Flux version that ships with TOSS3 is MVAPICH-centric, and OpenMPI applications didn't work. I did try `srun ... --mpi=none`, but I had inconsistent behavior and gave up.
Master branch: I downloaded the master branch and built it locally with our default OMPI/1.x environment. OMPI/1.x worked (the default); IntelMPI/2018, MVAPICH2, OMPI/2.0, and OMPI/4.0 did not.
He also reported a problem running multiple MPI applications on the same node:
MPI applications on different nodes worked. MPI applications sharing the same node hung. I suspect a common communicator (MPI_COMM).
I'll see if I can reproduce this on LC clusters / environment.
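As a rough reproduction sketch (the binary name, node, and task counts are placeholders, not from the report), one could co-locate two small MPI jobs on one node under the same Flux instance and check whether both exit:

```sh
# Sketch only: launch two independent MPI jobs on the same node and see
# whether both complete or hang.  ./mpi_hello is a placeholder for any
# simple MPI program (e.g. an MPI hello-world).
flux mini run -N1 -n2 ./mpi_hello &
flux mini run -N1 -n2 ./mpi_hello &
wait
```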