ESCOMP / POP2-CESM

Parallel Ocean Program (POP2) in CESM
http://www.cesm.ucar.edu/models/cesm2/ocean/

CESM 2.1.z: Add derecho support, remove cheyenne support #77

Closed: mnlevy1981 closed this 10 months ago

mnlevy1981 commented 10 months ago

Description of changes:

Added PE layouts for derecho, removed PE layouts for cheyenne, and updated the testlist to move cheyenne_intel tests to derecho_intel and remove all other cheyenne tests (intel is the only supported compiler for 2.1.z). I also updated generate_pop_decomp.xml to include a 128 PE layout for the gx3v7 grid as well as some large PE layouts for tx0.1v3.

Testing:

I ran aux_pop and verified that all tests ran successfully. I also did some performance testing, and can report the following throughput in different configurations (from SMS_Ld20 and SMS_Ld20_D tests using the pop/performance_eval testmod directory):

| Resolution | C compset | G compset | C1850ECO compset | G1850ECO compset |
| --- | --- | --- | --- | --- |
| T62_g37 | 530.80 SYPD | 359.77 SYPD | 184.65 SYPD | 196.13 SYPD |
| T62_g17 | 61.26 SYPD | 61.42 SYPD | 29.81 SYPD | 29.70 SYPD |
| T62_g37 (DEBUG) | 62.07 SYPD | 66.60 SYPD | 12.60 SYPD | 12.90 SYPD |
| T62_g17 (DEBUG) | 5.75 SYPD | 5.79 SYPD | 2.44 SYPD | 2.44 SYPD |

The only odd thing in this table is the drop from 530 SYPD to 360 SYPD on the 3 degree grid when turning CICE on. For some reason, WW3 is 4x slower in the G compset despite running on the same number of cores. This is true at both resolutions and whether MARBL tracers are on or off, but POP is so fast on the 3 degree grid without MARBL that WW3 becomes the bottleneck in the G compset. I'll run a series of tests using a smaller number of cores for WW3 and see if it makes a difference before taking this PR out of draft.
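To put a number on that drop, here is a minimal sketch that computes the G-to-C throughput ratio for the non-DEBUG rows. The SYPD values are copied from the table above; the dictionary layout and the `g_to_c_ratio` helper are hypothetical names used only for illustration and are not part of POP or CIME.

```python
# Throughput in simulated years per day (SYPD), copied from the table above.
# The dictionary layout and helper name are illustrative, not part of POP or CIME.
sypd = {
    "T62_g37": {"C": 530.80, "G": 359.77, "C1850ECO": 184.65, "G1850ECO": 196.13},
    "T62_g17": {"C": 61.26, "G": 61.42, "C1850ECO": 29.81, "G1850ECO": 29.70},
}


def g_to_c_ratio(row, c_name, g_name):
    """Return throughput of the G-type compset relative to the C-type compset."""
    return row[g_name] / row[c_name]


for resolution, row in sypd.items():
    physics_only = g_to_c_ratio(row, "C", "G")
    with_marbl = g_to_c_ratio(row, "C1850ECO", "G1850ECO")
    print(f"{resolution}: G/C = {physics_only:.2f}, G1850ECO/C1850ECO = {with_marbl:.2f}")
```

Only the T62_g37 C-to-G pair falls well below 1 (roughly 0.68); the other three pairs stay close to 1, which matches the observation that the slowdown only shows up where POP itself is very fast.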

Test status: bit-for-bit doesn't make sense in the context of adding a new machine

User interface (namelist or namelist defaults) changes? None

mnlevy1981 commented 10 months ago

Dropping NTASKS_WAV from 128 to 32 didn't help close the performance gap between C and G on the gx3v7 grid, so I'm going to leave this as-is.

klindsay28 commented 10 months ago

Comparing your T62_g37 C and G tests, it looks like ATM_NCPL (and thus WAV_NCPL, which is set to ATM_NCPL) is 4 in the C test and 24 in the G test. So the slowdown makes sense: WW3 is doing more work and becomes the bottleneck.
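As a rough illustration of what those two values mean, the sketch below converts an NCPL value into a coupling interval and counts coupling steps over a 20-day run (the length of the SMS_Ld20 tests). It assumes the coupling base period is one day, which is not stated in this thread; the function names are hypothetical and not part of CIME.

```python
SECONDS_PER_DAY = 86_400  # assumption: NCPL values are counted per model day


def coupling_interval_seconds(ncpl, base_period_seconds=SECONDS_PER_DAY):
    """Return the time between coupling steps for a given NCPL value."""
    if ncpl <= 0:
        raise ValueError("ncpl must be positive")
    return base_period_seconds / ncpl


def coupling_steps(ncpl, run_days):
    """Return the number of coupling steps in a run of run_days days."""
    return ncpl * run_days


# WAV_NCPL follows ATM_NCPL: 4 in the C test, 24 in the G test.
for label, ncpl in (("C", 4), ("G", 24)):
    hours = coupling_interval_seconds(ncpl) / 3600
    steps = coupling_steps(ncpl, run_days=20)
    print(f"{label}: couples every {hours:g} h, {steps} coupling steps in 20 days")
```

Under that assumption WW3 is coupled six times as often in the G test as in the C test. The cost per coupling step need not be constant, so this count only indicates the direction of the slowdown and is not a prediction of the measured 4x.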