jedwards4b opened this issue 1 year ago

Using the CESM model in a coupler test configuration PFS.ne120_t12.2000_XATM_XLND_XICE_XOCN_XROF_SGLC_SWAV.derecho_intel, we are observing very poor performance of mct_rearrange_rearr on the machines perlmutter (NERSC) and derecho (NCAR); both use a Slingshot 11 network and AMD processors.

Using 512 tasks on derecho with GPTL timing we see:

"mct_rearrange_rearr" - 512 512 4.426752e+06 1.391128e+05 277.198 ( 268 0) 263.345 ( 505 0)

Comparing to the NCAR cheyenne system:

"mct_rearrange_rearr" - 512 512 4.426752e+06 3.399975e+04 73.911 ( 414 0) 60.767 ( 384 0)

That is a max wall time of roughly 277 s on derecho versus 74 s on cheyenne, about 3.7x slower for the same timer and task count.
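For reference, a minimal sketch of reproducing this with a stock CIME checkout (the scripts path and test driver behavior are assumptions that vary by model version):

```shell
# Hypothetical reproduction, assuming a standard CIME checkout on derecho;
# exact paths and options depend on the CIME version installed.
cd cime/scripts
./create_test PFS.ne120_t12.2000_XATM_XLND_XICE_XOCN_XROF_SGLC_SWAV.derecho_intel

# After the run completes, the GPTL timing summary containing the
# mct_rearrange_rearr timer quoted above is written under the case's
# timing/ directory.
```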
Noting that a similar performance difference is seen between perlmutter and chrysalis (an AMD machine with an InfiniBand network) for E3SM cases. (I haven't tried the exact case above yet.)
I just tried the X case on derecho with the Cray compiler and I am not seeing the poor performance: rearrange_rearr max 46.8, min 40.4 (Cray compiler 15.0.1) versus max 642.257, min 445.713 (Intel compiler 2023.0.0).
Is the MPI library different?
It's the same MPI library, cray-mpich/8.1.25; however, I note that there is a different build of this library for each compiler flavor.
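(For anyone checking this elsewhere: the compiler-specific build that actually gets linked can usually be seen through the Cray programming environment modules. A sketch only; module and environment names are assumptions that vary by site.)

```shell
# Illustrative only: inspect which cray-mpich build each compiler
# environment selects (module names vary by system).
module load PrgEnv-cray
module show cray-mpich/8.1.25    # note the library path for the Cray (cce) build

module swap PrgEnv-cray PrgEnv-intel
module show cray-mpich/8.1.25    # the path now points at the Intel build
```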
Updating this issue: some hardware updates at NERSC made a lot of the observed behavior go away. @ndkeen can say more.
During the Sep 28th maintenance there were some updates (BIOS, network, software), and indeed I see improvements in several places -- mostly in communication at higher node counts on pm-cpu.
In the accompanying plot (not reproduced here), c1 refers to the normal/default PSTRID of 1, and c8 is the workaround we had been using, CPL_PSTRID=8.
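For context, PSTRID controls the stride at which a component's MPI tasks are placed in the global rank list, so a coupler stride of 8 spreads the coupler ranks out rather than packing them contiguously. A minimal sketch of applying the workaround in a CIME case, assuming the setting is exposed as PSTRID_CPL in env_mach_pes.xml (the exact variable name may differ by model version):

```shell
# In the case directory: place coupler tasks with a stride of 8.
# PSTRID_CPL is an assumption; check env_mach_pes.xml for the exact name.
./xmlchange PSTRID_CPL=8
./case.setup --reset    # regenerate the PE layout with the new stride
./pelayout              # optional: print the resulting task placement
```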