@4ctrl-alt-del Here's a fix I came up with:
void AbstractManager::finish()
{
    // Add the debug header.
    EDEBUG_FUNC(this);

    // Call the finish interface for this manager's analytic.
    _analytic->finish();

    // Call all this object's output abstract data finish interfaces and then
    // their finalize methods.
    for (auto data : qAsConst(_outputData))
    {
        data->data()->finish();
        data->finalize();
    }

    // Shut down the MPI instance to ensure that MPI_Finalize() is called.
    QMPI::shutdown();

    // Signal that this manager is finished with execution and ready to be deleted.
    emit finished();
}
That way, no matter which runner is used, MPI_Finalize() will always be called if MPI was initialized. This fixed my problem for the above test case, and it doesn't cause problems if you run without mpirun. The way you designed the QMPI class made this really easy :)
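For reference, something like this is what I imagine for an idempotent QMPI::shutdown() (just a sketch on my part, not the actual ACE implementation; MPI_Initialized() and MPI_Finalized() are the standard query calls):

// Sketch: finalize MPI exactly once, and only if it was ever initialized.
void QMPI::shutdown()
{
    int initialized {0};
    int finalized {0};
    MPI_Initialized(&initialized);
    MPI_Finalized(&finalized);
    if ( initialized && !finalized )
    {
        MPI_Finalize();
    }
}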
Actually, it may not be that simple. I was testing KINC with this fix and got this:
$ mpirun -np 5 kinc run similarity --input Yeast.emx --ccm Yeast.emx --cmx Yeast.cmx --clusmethod gmm --corrmethod spearman --preout --postout
1% 24m34s 1d16h32m6s
[...]
99% 15h13m37s 9m13s
-------------------------------------------------------
Primary job terminated normally, but 1 process returned
a non-zero exit code.. Per user-direction, the job has been aborted.
-------------------------------------------------------
*** The MPI_Send() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[(null):1474] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** The MPI_Iprobe() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[(null):1476] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** The MPI_Iprobe() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[(null):1478] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** The MPI_Iprobe() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[(null):1475] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** The MPI_Iprobe() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[(null):1477] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
So the QMPI::shutdown() that I inserted seems to be premature when there are many processes; I bet it's because of the termination signal that the master sends to all of the workers at the end. Also, this error happened only in CPU mode (5 of 5 runs failed) but not in GPU mode (3 of 3 succeeded), so I don't know whether that has anything to do with it.
I must admit this is a very strange bug. Who would run mpirun -np 1? Anyway, I fixed it in commit 21bf42de7e29eb9916356d9f4c6d2d1719ebf730. The problem was that ACE has no way to differentiate between a user doing mpirun -np 1 and running without MPI at all, so mpirun -np 1 makes ACE do a Single analytic run, which has no concept of safely shutting down MPI. I solved it by adding a safety "catch all" shutdown in EApplication that runs when everything is finished.
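To sketch the idea (this is an illustration, not the code from that commit; I'm assuming EApplication wraps QCoreApplication::exec() in a method like this):

// Sketch: a catch-all shutdown after the event loop exits, so MPI gets
// finalized no matter which runner was used. Assumes QMPI::shutdown() is
// a no-op when MPI was never initialized or is already finalized.
int EApplication::exec()
{
    int code {QCoreApplication::exec()};
    QMPI::shutdown();
    return code;
}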
I still have an issue similar to the one above; this time the error is slightly different:
*** The MPI_Comm_free() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[(null):5795] Local abort after MPI_FINALIZE completed successfully; not able to aggregate error messages, and not able to guarantee that all other processes were killed!
But the general cause is the same: ACE is still calling MPI functions somewhere after MPI is shut down. We need some way to verify that no MPI calls are pending before calling MPI_Finalize(). I should note that this error happens for any number of processes, so the develop branch is kinda broken right now. :/
In the first case the culprit was MPI_Iprobe, which is in QMPI::probe(), which is called repeatedly by a timer. Maybe my initial solution would work if we just cancel that timer event in QMPI::shutdown().
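Something along these lines is what I have in mind (a sketch only; _instance and _timerId are my guesses at QMPI's internals, not names from the actual code):

// Sketch: kill the probe timer first so it can't fire another MPI_Iprobe
// after finalization, then shut MPI down.
void QMPI::shutdown()
{
    if ( _instance && _instance->_timerId != -1 )
    {
        _instance->killTimer(_instance->_timerId);
        _instance->_timerId = -1;
    }
    MPI_Finalize();
}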
In the second case the culprit was MPI_Comm_free, which is only called once, to release the local comm object. I guess you could fix that by adding a barrier?
QMPI::~QMPI()
{
    if ( !_failed )
    {
        MPI_Comm_free(&_local);
        MPI_Barrier(MPI_COMM_WORLD);
        MPI_Finalize();
    }
}
I love it when a bug fix exposes another bug. Fixed the QMPI shutdown bug in commit b84e71cf18c3ef7f64f6642394d4ee985081961c. Let me know if this works for you, @bentsherman.
@bentsherman bump
I am closing this myself because the bug is fixed on my end and @bentsherman has not responded in over a week.
@4ctrl-alt-del sorry I never got back to you on this. I think I was waiting to confirm the fix worked (which it did) and then I got caught up in other things.
If I run an ACE application with mpirun -np 1, the program returns exit code 1 even if it finishes successfully. I know the import analytic doesn't actually use MPI; it's just the easiest to showcase, and the same thing happens no matter what analytic you run. This is also a bit of a degenerate case, because you could just run aceex without mpirun, so in practice the user can simply do that. The problem arises with scripts that run ACE applications automatically. For example, take my benchmarking pipeline:
https://github.com/bentsherman/benchmark-nf/blob/master/kinc/main.nf#L345
This script essentially runs KINC many times while varying the number of processes. So in the case of np = 1, KINC will run successfully but return exit code 1, which Nextflow will interpret as a failure.
This is admittedly a minor issue. I might be able to write some bash logic to get around it in my Nextflow pipeline, but I wanted to go ahead and document the problem. @4ctrl-alt-del do you think it would be easy to resolve this? I'm pretty sure the cause is that ACE uses the single runner instead of the MPI runner when there is only one process, so MPI_Init() and MPI_Finalize() are never called.
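To make the suspected behavior concrete, here is a minimal standalone program (my own illustration, not ACE code; the single/MPI split is a guess at what ACE does) showing the pattern and the obvious remedy of finalizing unconditionally:

#include <mpi.h>
#include <iostream>

// Standalone sketch, not ACE code: pick an execution path based on the MPI
// world size, but finalize MPI on both paths so mpirun sees a clean exit.
int main(int argc, char** argv)
{
    MPI_Init(&argc, &argv);
    int size {0};
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    if ( size <= 1 )
    {
        // The suspected ACE behavior: a "single runner" path that skips MPI
        // teardown entirely, which makes mpirun report a failure.
        std::cout << "single runner path" << std::endl;
    }
    else
    {
        std::cout << "MPI runner path with " << size << " processes" << std::endl;
    }
    MPI_Finalize();
    return 0;
}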