GRTLCollaboration / GRChombo

An AMR based open-source code for numerical relativity simulations.
BSD 3-Clause "New" or "Revised" License

Deadlocks when using non-blocking collectives with OpenMPI #208

Closed (mirenradia closed this issue 2 years ago)

mirenradia commented 2 years ago

Several users (including myself) have sometimes encountered deadlocks when using OpenMPI. These seem to stem from the non-blocking MPI collectives in the AMRInterpolator and are resolved by the changes in this commit. However, the issue does not always occur, and there may be other factors at play.

In my experience, the deadlock doesn't occur straight away but rather at the next MPI collective call after the first MPI_Waitall in MPIContext::asyncEnd(), whether that is in writing an HDF5 file or in the next use of the AMRInterpolator.

I have experienced this problem with OpenMPI 4.0.5.