ewu63 closed this issue 5 years ago
We created a context manager called multi_proc_exception_check, which you can find in openmdao.utils.mpi, to handle situations where an exception is raised on only a subset of the MPI procs. However, this solution only works when the code wrapped by the context manager doesn't contain any collective MPI calls that could be reached after an exception was raised. In your example, it should work if you wrap the context manager around your call to self.my_assert.
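To illustrate the general idea, here is a minimal sketch of how such a context manager could work. It is not OpenMDAO's actual implementation: the names raise_on_all_ranks and FakeComm are hypothetical, and the only assumption about the communicator is that it has an mpi4py-style allgather method. Each rank catches its own exception, all ranks exchange their error status in one collective call, and every rank raises if any rank failed. This also shows why the wrapped code must not contain collective calls after the failure point: the failing rank skips straight to the allgather while the others would still be waiting in the earlier collective.

```python
import traceback
from contextlib import contextmanager


@contextmanager
def raise_on_all_ranks(comm):
    """Hypothetical helper: re-raise on every rank if any rank raised."""
    try:
        yield
    except Exception:
        # Record the local failure instead of letting it propagate yet.
        local_error = traceback.format_exc()
    else:
        local_error = ""

    # Collective call: every rank must reach this line, failed or not.
    errors = comm.allgather(local_error)
    failed = [(rank, msg) for rank, msg in enumerate(errors) if msg]
    if failed:
        rank, msg = failed[0]
        raise RuntimeError(f"Exception raised on rank {rank}:\n{msg}")


class FakeComm:
    """Hypothetical stand-in for a communicator so the sketch runs without MPI.

    It behaves as rank 0 and pretends the other ranks reported `other_ranks`.
    """

    def __init__(self, other_ranks):
        self.other_ranks = list(other_ranks)

    def allgather(self, value):
        return [value] + self.other_ranks


# Simulate a run where this rank passes but another rank failed its assertion.
try:
    with raise_on_all_ranks(FakeComm(other_ranks=["AssertionError: values differ"])):
        pass  # the local check succeeds
except RuntimeError as err:
    print(err)
```

Under a real MPI run you would pass an mpi4py communicator such as MPI.COMM_WORLD in place of FakeComm, so that the rank that did not fail still receives an exception instead of waiting at a barrier.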
Thanks for that, it seems to work for simpler cases for sure. I'm getting some MPI hangs in the main tests, but that may be an unrelated issue, so I'm closing this.
Does testflo support MPI testing where the assertion is only performed on one processor? Consider the following code:
which will hang since the other processor is not aware of the error and will instead wait at the barrier indefinitely.
I understand this is related to the broader question of error handling under mpi4py, which is discussed here, but the proposed solution cannot be applied in this case. Some further discussions can be found here, but I couldn't get it to work, and I'm not really sure how testflo works with MPI in general. Is there a way to handle this within testflo in a consistent manner?