FEniCS / performance-test

Mini App for FEniCSx performance testing
MIT License

MPI finalization error #66

Closed: drew-parsons closed this issue 3 years ago

drew-parsons commented 3 years ago

The FEniCS-X performance-test runs on Debian CI are giving an MPI finalization error. The tests themselves seem to run fine; something appears to go wrong during process shutdown. This is with OpenMPI 4.0.5 and PETSc 3.14.1.

Test logs are collected at https://ci.debian.net/packages/f/fenicsx-performance-tests/ and one example is https://ci.debian.net/data/autopkgtest/unstable/amd64/f/fenicsx-performance-tests/8161116/log.gz

The error message is:

mpirun has exited due to process rank 1 with PID 0 on
node ci-319-775e620e exiting improperly. There are three reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

3. this process called "MPI_Abort" or "orte_abort" and the mca parameter
orte_create_session_dirs is set to false. In this case, the run-time cannot
detect that the abort call was an abnormal termination. Hence, the only
error message you will receive is this one.

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

You can avoid this message by specifying -quiet on the mpirun command line.
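The three cases in the mpirun message boil down to a rule about each rank's MPI lifecycle: if any rank calls init, every rank must, and every rank that calls init must also call finalize before exiting. As a rough illustration only, the hypothetical Python model below classifies per-rank exit records the way the message describes. The names `RankExit` and `improper_exits` are invented for this sketch and are not part of Open MPI, PETSc or FEniCSx, and real mpirun bookkeeping is more involved than this.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class RankExit:
    """Hypothetical record of what one MPI rank did before it exited."""
    rank: int
    called_init: bool
    called_finalize: bool
    called_abort: bool = False


def improper_exits(ranks: List[RankExit],
                   create_session_dirs: bool = True) -> List[Tuple[int, int]]:
    """Return (rank, reason) pairs, with reasons numbered 1-3 as in the mpirun message.

    Simplified model: it only encodes the three rules quoted in the message.
    """
    any_init = any(r.called_init for r in ranks)
    problems: List[Tuple[int, int]] = []
    for r in ranks:
        if r.called_abort:
            # Reason 3: with orte_create_session_dirs=false the abort is not
            # recognised as abnormal, so only this message is reported.
            # Otherwise the abort gets its own report and is not counted here.
            if not create_session_dirs:
                problems.append((r.rank, 3))
            continue
        if not r.called_init and any_init:
            # Reason 1: other ranks called init, but this one did not.
            problems.append((r.rank, 1))
        elif r.called_init and not r.called_finalize:
            # Reason 2: init without finalize counts as abnormal termination.
            problems.append((r.rank, 2))
    return problems
```

For example, in a hypothetical two-rank job where rank 1 calls init but never reaches finalize, `improper_exits([RankExit(0, True, True), RankExit(1, True, False)])` returns `[(1, 2)]`, i.e. rank 1 is flagged under reason 2.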

garth-wells commented 3 years ago

Fixed in #67.