esmf-org / esmf

The Earth System Modeling Framework (ESMF) is a suite of software tools for developing high-performance, multi-component Earth science modeling applications.
https://earthsystemmodeling.org/

Remove the VMEpoch and other blocking communication limitation #245

Open: oehmke opened this issue 2 months ago

oehmke commented 2 months ago

Right now you shouldn't use blocking communication calls within a VMEpoch region. Here is what Gerhard says:

I took a look at the ESMF_InfoBroadcast() implementation. Right now it uses a simple blocking collective MPI_Bcast() under the hood for the mpionly case (which I am sure applies here). It is therefore not currently safe to use this call within an active VMEpoch! It will hang.

The underlying reason for this is that within a VMEpoch, all non-blocking send and recv calls are intercepted. On the send side, all messages to the same dst are aggregated and not sent until the VMEpoch is exited. On the recv side, however, the first non-blocking recv blocks while probing for the incoming message to determine its size (the message was aggregated on the src side, so its size is unknown on the dst side). Only after that message has been received does the receiving PET continue and process any of the other receives (potentially many, in the loop over routehandle-based SMM() and GridRedist() calls in this example). Because of this behavior, a blocking collective call like the MPI_Bcast() used by ESMF_InfoBroadcast() inside the VMEpoch will deadlock!
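To make the wait-for cycle concrete, here is a toy, single-process Python model of the behavior described above. It is not ESMF code: the op names (`isend`, `irecv`, `bcast`, `exit_epoch`) and the `simulate()` scheduler are hypothetical. The model treats the blocking collective as fully synchronizing (MPI permits this, although a small broadcast may return early at the root in practice) and treats an intercepted receive as blocking until the aggregated message from its source arrives.

```python
from collections import defaultdict


def simulate(programs, epoch=True):
    """Toy scheduler for PETs running op lists; returns "completed" or "deadlock".

    Hypothetical ops modeling the VMEpoch behavior described in the issue:
      ("isend", dst)   non-blocking send; held back per dst while the epoch is active
      ("irecv", src)   receive that blocks (probing) until src's message has arrived
      ("bcast",)       blocking collective: no PET leaves until all PETs have arrived
      ("exit_epoch",)  flush: each buffered destination now gets its aggregated message
    """
    npets = len(programs)
    pc = [0] * npets                # index of the next op for each PET
    buffered = defaultdict(set)     # src -> destinations with held-back messages
    delivered = set()               # (src, dst) pairs whose messages have arrived
    bcast_done = [0] * npets        # number of bcasts each PET has completed
    arrivals = defaultdict(set)     # bcast index -> PETs that have reached it

    def step(pet):
        """Try to execute one op for this PET; return True if it completed."""
        op = programs[pet][pc[pet]]
        kind = op[0]
        if kind == "isend":
            if epoch:
                buffered[pet].add(op[1])        # aggregated, sent at exit_epoch
            else:
                delivered.add((pet, op[1]))     # no epoch: message goes out now
            return True
        if kind == "irecv":
            return (op[1], pet) in delivered    # stuck until the message arrives
        if kind == "bcast":
            k = bcast_done[pet]
            arrivals[k].add(pet)
            if len(arrivals[k]) < npets:
                return False                    # wait for the other PETs
            bcast_done[pet] += 1
            return True
        if kind == "exit_epoch":
            for dst in buffered.pop(pet, set()):
                delivered.add((pet, dst))
            return True
        raise ValueError(f"unknown op {op!r}")

    while True:
        progress = False
        for pet in range(npets):
            if pc[pet] < len(programs[pet]) and step(pet):
                pc[pet] += 1
                progress = True
        if all(pc[p] == len(programs[p]) for p in range(npets)):
            return "completed"
        if not progress:
            return "deadlock"


# PET 0 is the broadcast root and also sends PET 1 a message (e.g. from an SMM()).
programs = [
    [("isend", 1), ("bcast",), ("exit_epoch",)],
    [("irecv", 0), ("bcast",), ("exit_epoch",)],
]
print(simulate(programs, epoch=True))   # deadlock
print(simulate(programs, epoch=False))  # completed
```

With the epoch active, PET 0 sits in the collective waiting for PET 1, while PET 1 sits in its probing receive waiting for PET 0 to exit the epoch. Moving `("bcast",)` after `("exit_epoch",)` on both PETs lets the same model complete, which matches the current advice to keep blocking calls outside the VMEpoch region.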

It would probably be pretty straightforward to extend the MPI_Bcast()-based implementation to check whether it is being called from within an active VMEpoch and, if so, use non-blocking calls instead. This would make Dusan's code safe, and probably also a bit more efficient. Could be something for 8.8? In fact, we could go through all our VM collectives and make them VMEpoch-safe. I'm not sure about the practical importance of this, though. After all, VMEpoch is meant to optimize very specific communication patterns, which typically means a tight loop of SMM() calls or similar.
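One possible shape for the VMEpoch-safe collective suggested above, again as a hedged Python sketch rather than ESMF's actual VM code: the `ToyVM` class, its method names, and the `("bcast", value)` payload tag are all hypothetical. `bcast()` checks whether an epoch is active. Outside one it keeps the existing blocking path; inside one it never blocks, and the root instead posts non-blocking sends that join the per-destination aggregation.

```python
from collections import defaultdict


class ToyVM:
    """Hypothetical stand-in for one PET's VM; not the ESMF API."""

    def __init__(self, pet, npets):
        self.pet = pet
        self.npets = npets
        self.epoch_active = False
        self.outbox = defaultdict(list)  # dst -> payloads held back during the epoch
        self.sent_now = []               # sends issued immediately (no epoch)
        self.path_log = []               # which code path each bcast() took

    def epoch_enter(self):
        self.epoch_active = True

    def isend(self, dst, payload):
        if self.epoch_active:
            self.outbox[dst].append(payload)      # aggregated per destination
        else:
            self.sent_now.append((dst, payload))  # stands in for an immediate send

    def epoch_exit(self):
        # One aggregated message per destination leaves when the epoch ends.
        flushed = dict(self.outbox)
        self.outbox.clear()
        self.epoch_active = False
        return flushed

    def bcast(self, value, root):
        if not self.epoch_active:
            # Outside an epoch, the blocking collective (MPI_Bcast) is safe.
            self.path_log.append("blocking")
            return value
        # Inside an epoch: never block. The root posts non-blocking sends that
        # ride along in the aggregated messages; non-root PETs return None here
        # and receive the value with the root's aggregated message later.
        self.path_log.append("epoch")
        if self.pet == root:
            for dst in range(self.npets):
                if dst != root:
                    self.isend(dst, ("bcast", value))
            return value
        return None


root = ToyVM(0, 3)
root.epoch_enter()
root.isend(1, "smm-halo")
root.bcast(42, root=0)
print(root.epoch_exit())  # {1: ['smm-halo', ('bcast', 42)], 2: [('bcast', 42)]}
```

The trade-off shows up on the non-root side: in this sketch the broadcast value only arrives with the root's aggregated message when the epoch exits, so code that needs the value before the epoch ends cannot get it from this path.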

The idea of this issue is to remove this limitation.