JuliaParallel / MPI.jl

MPI wrappers for Julia
https://juliaparallel.org/MPI.jl/
The Unlicense

[mpiexecjl] Return exit code of the mpiexec process #834

Closed: giordano closed this pull request 5 months ago

giordano commented 5 months ago

Should fix #833. I don't have time to write tests right now, so I'm opening this as a draft. @cohensbw could you please test this? With this PR I get:

% mpiexecjl -np 2 --project julia --color=yes -e 'exit(2)'; echo $?
2

cohensbw commented 5 months ago

A quick test does seem to indicate that this fixes the issue. I can now recover exit codes other than 1 when they are passed to exit().

giordano commented 5 months ago

@eschnett could you please have a look at the mpitrampoline errors?

Failed to precompile MPI [da04e1cc-30fd-572f-bb4f-1f8673147195] to "/home/runner/.julia/compiled/v1.10/MPI/jl_FfNYtV".
MPItrampoline: MPI ABI version mismatch:
This version of MPItrampoline requires MPI ABI version 2.10.0, but the loaded MPIwrapper only provides MPI ABI version 2.9.0.
This is MPItrampoline version 5.4.0.
You loaded MPIwrapper version 2.10.3 from file "/usr/local/lib/libmpiwrapper.so"

I presume we need to update something in the CI setup (unrelated to this PR), but the error message looks a bit contradictory: first it says we have MPIwrapper 2.9, and then it says it's 2.10.

eschnett commented 5 months ago

The error is

MPItrampoline: MPI ABI version mismatch:
This version of MPItrampoline requires MPI ABI version 2.10.0, but the loaded MPIwrapper only provides MPI ABI version 2.9.0.
This is MPItrampoline version 5.4.0.
You loaded MPIwrapper version 2.10.3 from file "/usr/local/lib/libmpiwrapper.so".

We need to use MPIwrapper 2.11 instead.

I think I got the semver semantics wrong. The recent change to MPItrampoline (supporting oneAPI) was supposed to be backward compatible, hence the minor version change only. Sorry about this!
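The confusion above comes from two different version numbers: the MPIwrapper package version (2.10.3) and the MPI ABI version it provides (2.9.0). Under semver, a consumer that requires ABI minor version 10 cannot use a provider that only offers minor version 9, even when the major versions match. The sketch below illustrates that rule. It is a hypothetical helper (`abi_compatible` is an illustrative name), assuming "same major, provided minor >= required minor, patch ignored"; it is not MPItrampoline's actual check.

```shell
# abi_compatible REQUIRED PROVIDED
# Hypothetical helper illustrating a semver-style ABI compatibility rule:
# major versions must match, and the provided minor version must be at
# least the required one (a newer minor may add features an older
# provider lacks). Patch versions are ignored.
# NOT MPItrampoline's real implementation; illustration only.
abi_compatible() {
    required=$1
    provided=$2

    # Take the MAJOR and MINOR components of "MAJOR.MINOR.PATCH".
    req_major=${required%%.*}
    req_rest=${required#*.}
    req_minor=${req_rest%%.*}

    prov_major=${provided%%.*}
    prov_rest=${provided#*.}
    prov_minor=${prov_rest%%.*}

    # Exit status 0 means compatible, nonzero means mismatch.
    [ "$req_major" -eq "$prov_major" ] && [ "$prov_minor" -ge "$req_minor" ]
}
```

With this rule, `abi_compatible 2.10.0 2.9.0` returns a nonzero status (the mismatch reported in CI), while a provider at ABI 2.10.0 or any later 2.x minor version passes.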

giordano commented 5 months ago

Now we get:

% mpiexecjl -np 2 --project=/tmp julia --color=yes -e 'exit(2)'; echo $?
┌ Error: The MPI process failed
│   proc = Process(setenv(`/home/mose/.julia/artifacts/b7a943fb6a811908b073b8af69d955f16703ca2b/bin/mpiexec -np 2 julia --color=yes -e 'exit(2)'`,[...]), ProcessExited(2))
└ @ Main none:7
2

which, as before, prints the failed process to the screen, but now also returns its exit code (2) instead of a generic 1.
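The behaviour described above (report the failed process, then exit with that process's own status) can be illustrated with a small shell function. This is only an analogy: the output quoted above comes from Julia (a `Process` object and an `@ Main` source location), so the PR's change lives in Julia code, not in a function like this. `run_and_propagate` is a hypothetical name used for illustration.

```shell
# run_and_propagate CMD [ARGS...]
# Hypothetical sketch: run a command, report a failure on stderr, and
# return the command's own exit status rather than a generic 1.
# Illustration only; not the code from this PR.
run_and_propagate() {
    # Capture the status in a way that also works under `set -e`.
    cmd_status=0
    "$@" || cmd_status=$?
    if [ "$cmd_status" -ne 0 ]; then
        echo "Error: the process failed: $* (exit code $cmd_status)" >&2
    fi
    return "$cmd_status"
}
```

For example, `run_and_propagate sh -c 'exit 2'; echo $?` prints the error line on stderr and then `2`, mirroring the `exit(2)` test used in this thread.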

eschnett commented 5 months ago

@giordano I assume your comment above isn't meant for me any more, and that MPItrampoline is now working correctly.

giordano commented 5 months ago

Yes, I was back to the topic of this PR 🙂