Closed by rupertnash 5 years ago
I think the second approach of aborting with a helpful error is probably the safest option. The situation doesn't make much sense, and it is better that a non-MPI-savvy user is told so than for the run to appear to work while the logic is wrong. There are a few places in the code where this sort of thing crops up for specific use-cases...
As MUI progresses, decisions like this will need to be unified into an ethos, documentation written, CI added to the repository, etc.
Cheers Stephen. I'll do that and send a PR. I had a very confused 30 mins... Debugging MPMD runs is often non-trivial...
Agreed! If you aren't already, you can make it a fraction easier by tunnelling each rank's console output to individual files using the --output-filename flag with mpirun (e.g. mpirun --output-filename Logs/outputfiles -np 2 solver1 : -np 4 solver2)
In lib_mpi_split.h, the function mpi_split_by_app will crash with an unhelpful error (SIGSEGV) if not run in MPMD mode. I implemented a version (see my branch https://github.com/rupertnash/MUI/tree/fix-MPI_APPNUM-missing) that will return MPI_COMM_WORLD in the case of being run as an SPMD program. I'm not sure if this is in fact the correct approach. It may be better to abort with a sensible message to the user. E.g.:
Can you please let me know what fits better with the "MUI philosophy"! Cheers!