esmf-org / esmf

The Earth System Modeling Framework (ESMF) is a suite of software tools for developing high-performance, multi-component Earth science modeling applications.
https://earthsystemmodeling.org/

Avoid All-to-All communications in ESMF_StateReconcile() #88

Open theurich opened 1 year ago

theurich commented 1 year ago

Currently the ESMF_StateReconcile() implementation uses various forms of MPI_Alltoall communications. This has been an issue over the years. (Notice that the SMMStore() and SMM() implementations, the basis of all ESMF_RouteHandle comms, have always avoided all-to-all communications!)

The issue has recently come up again with the HPE/Cray Slingshot network fabric as well as with IntelMPI on Linux clusters. On multiple HPE Cray EX systems (Narwhal, WCOSS2), with the default OFI network module loaded, MPI_Alltoall calls were observed to scale badly to high task counts, in both performance and memory usage. Switching to the alternative UCX fabric has been the current work-around. However, the Cray roadmap does not include UCX, and soon OFI may be the only option offered. Under IntelMPI (on Hera) a similar issue was observed, which was not resolved until version 2021.3.0!

On a UFS call with EMC and GFDL participants Rusty Benson (GFDL) pointed out that they always avoid All-to-All communications because "there have always been issues with performance".

The ESMF_StateReconcile() call is so fundamental to ESMF/NUOPC that we need to revisit the communication strategy for increasing PET counts!
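
To make the scaling concern concrete, here is a minimal sketch (pure Python simulation, no MPI) of one well-known way to share per-PET information using only point-to-point exchanges: a dissemination-style allgather that finishes in ceil(log2(P)) rounds. This is an illustration only, not the ESMF_StateReconcile() implementation or a proposed design; the function names `dissemination_allgather` and `naive_alltoall_peers` are hypothetical. Note that it reduces the number of peers each PET talks to, not the total amount of data that has to arrive.

```python
def dissemination_allgather(local_items):
    """Simulate an allgather built only from point-to-point exchanges.

    local_items[r] is the payload owned by rank r (hypothetical stand-in
    for whatever per-PET information has to be shared).
    Returns (known, rounds): known[r] maps source rank -> payload.
    """
    p = len(local_items)
    known = [{r: local_items[r]} for r in range(p)]
    rounds = 0
    step = 1
    while step < p:
        # Snapshot first so that all exchanges in a round use the state
        # from the start of that round, as concurrent messages would.
        snapshot = [dict(k) for k in known]
        for r in range(p):
            src = (r + step) % p  # one receive per rank per round
            known[r].update(snapshot[src])
        step *= 2
        rounds += 1
    return known, rounds


def naive_alltoall_peers(p):
    """Peers each rank talks to in a naive pairwise all-to-all."""
    return p - 1


if __name__ == "__main__":
    for p in (6, 1000):
        known, rounds = dissemination_allgather([f"pet{r}" for r in range(p)])
        complete = all(len(k) == p for k in known)
        print(p, "ranks:", rounds, "rounds vs",
              naive_alltoall_peers(p), "peers; complete =", complete)
```

In this sketch each rank exchanges with one partner per round, so the per-rank peer count grows logarithmically with the PET count instead of linearly. Real MPI libraries may already use similar algorithms inside MPI_Alltoall, so whether this helps on a given fabric would have to be measured.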

Additional issues reported: https://github.com/esmf-org/esmf-support/issues/191

anntsay commented 7 months ago

When this is fixed, notify Brian per https://github.com/esmf-org/esmf-support/issues/400 to see whether the fix resolves their issues at large core counts. Note that, going by gut feeling, this may not completely fix the issues in esmf-support/#400.