Open ardangelo opened 2 hours ago
Thanks! I played around with this a bit.

One thing I tried was running with -o pmi=libpmi2 (which coerces Cray MPI to use "simple PMI").

Note that we fixed this bug in 0.68, where two copies of the stdio file descriptors were being passed to spawned user processes:

Maybe that explains why this wasn't seen in prior releases?
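As an aside, a quick Linux-only way to see which descriptors a spawned task actually inherits is to walk /proc/self/fd. The following is just a diagnostic sketch, not part of the fix or of the original report:

/* Diagnostic sketch (assumed, not from flux-core): print every open file
 * descriptor in this process and what it points to, using Linux /proc.
 * Note that the fd used by opendir() itself will also appear in the list. */
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

int main (void)
{
    DIR *dir = opendir ("/proc/self/fd");
    struct dirent *d;

    if (!dir)
        return 1;
    while ((d = readdir (dir))) {
        char path[PATH_MAX];
        char target[PATH_MAX];
        ssize_t n;

        if (d->d_name[0] == '.')
            continue;
        snprintf (path, sizeof (path), "/proc/self/fd/%s", d->d_name);
        if ((n = readlink (path, target, sizeof (target) - 1)) < 0)
            continue;
        target[n] = '\0';
        printf ("fd %s -> %s\n", d->d_name, target);
    }
    closedir (dir);
    return 0;
}

Run under flux like the cat reproducer below, duplicate stdio descriptors would show up as several fds resolving to the same pipe or tty.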
This is pretty strange:
$ flux run -l -n2 ./testexec /usr/bin/cat /proc/self/fdinfo/0|dshbak -c
1: wait status=0
0: wait status=0
----------------
0
----------------
waiting for 3390300
pos: 0
flags: 02
mnt_id: 10
----------------
1
----------------
waiting for 3390301
pos: 0
flags: 0100000
mnt_id: 24
On rank 0 (the "good" stdin), flags are 02 (O_RDWR) and mnt_id is 10. On rank 1 (the "bad" stdin), flags are 0100000 (O_LARGEFILE) and mnt_id is 24.
When I run a non-failing case, all the ranks look like rank 0.
If I grab the flags with fcntl(0, F_GETFL) before MPI_Init(), they read 02 for both ranks. So something inside MPI_Init() appears to be doing something to that file descriptor. Could it be in Cray PMI?
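For what it's worth, here is a minimal sketch of that kind of check (assumed, not the exact test that produced the numbers above): read the stdin status flags with fcntl(0, F_GETFL) before and after MPI_Init() and print them per rank.

/* Sketch only: compare stdin's status flags before and after MPI_Init()
 * to see whether the MPI library (or Cray PMI underneath it) has replaced
 * or reopened the descriptor. Flags are printed in octal to match
 * /proc/self/fdinfo/0. */
#include <fcntl.h>
#include <stdio.h>
#include <mpi.h>

int main (int argc, char **argv)
{
    int before = fcntl (0, F_GETFL);
    int rank;

    MPI_Init (&argc, &argv);
    MPI_Comm_rank (MPI_COMM_WORLD, &rank);

    int after = fcntl (0, F_GETFL);
    const char *mode = (after & O_ACCMODE) == O_RDWR ? "O_RDWR"
                     : (after & O_ACCMODE) == O_RDONLY ? "O_RDONLY"
                     : "other";

    printf ("rank=%d stdin flags before=0%o after=0%o (%s)\n",
            rank, (unsigned)before, (unsigned)after, mode);

    MPI_Finalize ();
    return 0;
}

On the failing rank above, 0100000 decodes as O_LARGEFILE with an O_RDONLY access mode, which together with the different mnt_id suggests stdin was reopened on a different file rather than merely having its flags changed.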
Testing the new release of Flux 0.68.0, I'm seeing a failure with MPI subprocesses that we didn't run into on Flux 0.67.0. The test is a simple MPI wrapper utility that calls MPI_Init, forks a subprocess, then waits for it to finish before calling MPI_Finalize. On multi-node jobs, some ranks fail with a "broken pipe" error. The launched subprocesses on those ranks seem to exit immediately due to a closed stdin. Running the subprocess directly, or running the wrapper without MPI enabled, does not hit this issue.
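For readers without the attachment, here is a hypothetical minimal sketch of the pattern just described (it is not the attached wrapper source; the "waiting for" / "wait status" prints only mirror the labelled output quoted earlier in this thread):

/* Illustrative sketch only -- not the attached testexec source. It follows
 * the described pattern: MPI_Init, fork/exec the requested command, wait
 * for it to finish, then MPI_Finalize. */
#include <mpi.h>
#include <stdio.h>
#include <sys/wait.h>
#include <unistd.h>

int main (int argc, char **argv)
{
    int status = 0;
    pid_t pid;

    if (argc < 2) {
        fprintf (stderr, "Usage: %s COMMAND [ARGS...]\n", argv[0]);
        return 1;
    }

    MPI_Init (&argc, &argv);

    pid = fork ();
    if (pid < 0) {
        perror ("fork");
        MPI_Abort (MPI_COMM_WORLD, 1);
    }
    if (pid == 0) {
        /* child: inherits whatever stdin MPI_Init() left behind */
        execvp (argv[1], &argv[1]);
        perror ("execvp");
        _exit (127);
    }
    printf ("waiting for %d\n", (int)pid);
    if (waitpid (pid, &status, 0) < 0)
        perror ("waitpid");
    printf ("wait status=%d\n", status);

    MPI_Finalize ();
    return 0;
}

Since the fork happens after MPI_Init(), the child inherits whatever stdin the MPI library leaves behind.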
Output:

Eventlog:
Wrapper source: