CESM 2.1.z: Add derecho support, remove cheyenne support

mnlevy1981 commented 10 months ago

Description of changes:

Added PE layouts for derecho, removed PE layouts for cheyenne, and updated the testlist to move cheyenne_intel tests to derecho_intel and remove all other cheyenne tests (intel is the only supported compiler for 2.1.z). I also updated generate_pop_decomp.xml to include a 128 PE layout for the gx3v7 grid as well as some large PE layouts for tx0.1v3.

Testing:

I ran aux_pop and verified that all tests ran successfully. I also did some performance testing, and can report the following throughput in simulated years per day (SYPD) for different configurations (from SMS_Ld20 and SMS_Ld20_D tests using the pop/performance_eval testmod directory):

Resolution	C compset	G compset	C1850ECO compset	G1850ECO compset
T62_g37	530.80 SYPD	359.77 SYPD	184.65 SYPD	196.13 SYPD
T62_g17	61.26 SYPD	61.42 SYPD	29.81 SYPD	29.70 SYPD
T62_g37 (DEBUG)	62.07 SYPD	66.60 SYPD	12.60 SYPD	12.90 SYPD
T62_g17 (DEBUG)	5.75 SYPD	5.79 SYPD	2.44 SYPD	2.44 SYPD

The only odd thing in this table is the drop from 530 SYPD to 360 SYPD on the 3 degree grid when turning CICE on. For some reason, WW3 is 4x slower in the G compset (despite running on the same number of cores). This is true at both resolutions and whether MARBL tracers are on or off, but POP is so fast on the 3 degree grid without MARBL that WW3 becomes the bottleneck in the G compset. I'll run a series of tests using a smaller number of cores for WW3 and see if it makes a difference before taking this PR out of draft.
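To put the table in relative terms, here is a small sketch that computes the G-to-C throughput ratio for each row. The numbers are copied from the table; the dictionary layout and the `g_over_c` helper are just illustrative names, not anything from POP or CIME.

```python
# Throughput in SYPD copied from the table above: (C, G, C1850ECO, G1850ECO).
throughput = {
    "T62_g37": (530.80, 359.77, 184.65, 196.13),
    "T62_g17": (61.26, 61.42, 29.81, 29.70),
    "T62_g37 (DEBUG)": (62.07, 66.60, 12.60, 12.90),
    "T62_g17 (DEBUG)": (5.75, 5.79, 2.44, 2.44),
}


def g_over_c(row):
    """Return (G / C, G1850ECO / C1850ECO) throughput ratios for one row."""
    c, g, c_eco, g_eco = row
    return g / c, g_eco / c_eco


ratios = {name: g_over_c(row) for name, row in throughput.items()}

for name, (plain, eco) in ratios.items():
    print(f"{name:18s} G/C = {plain:.2f}   G1850ECO/C1850ECO = {eco:.2f}")
```

Only the non-DEBUG T62_g37 pair without MARBL stands out, with G at roughly 0.68 of the C throughput; every other G/C ratio is within about 7% of 1.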

Test status: bit-for-bit doesn't make sense in the context of adding a new machine

User interface (namelist or namelist defaults) changes? None

mnlevy1981 commented 10 months ago

Dropping NTASKS_WAV from 128 to 32 didn't help close the performance gap between C and G on the gx3v7 grid, so I'm going to leave this as-is.

klindsay28 commented 10 months ago

Comparing your T62_g37 C and G tests, it looks like ATM_NCPL (and thus WAV_NCPL, which is set to ATM_NCPL) is 4 in the C test and 24 in the G test. So the slowdown makes sense, because WW3 is doing more work and becomes the bottleneck.
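As a rough illustration of what those two values mean, the sketch below converts each NCPL setting into a coupling interval and compares the number of wave-model coupling steps per day. It assumes the coupling base period is one day (the usual NCPL_BASE_PERIOD setting, not confirmed for these tests), and `coupling_interval_seconds` is a made-up helper name.

```python
SECONDS_PER_DAY = 86400  # assumption: NCPL_BASE_PERIOD is "day"


def coupling_interval_seconds(ncpl, base_period_seconds=SECONDS_PER_DAY):
    """Length of one coupling interval for a component coupled ncpl times per base period."""
    return base_period_seconds / ncpl


c_ncpl = 4   # ATM_NCPL (and WAV_NCPL) reported for the C test
g_ncpl = 24  # ATM_NCPL (and WAV_NCPL) reported for the G test

c_interval = coupling_interval_seconds(c_ncpl)
g_interval = coupling_interval_seconds(g_ncpl)
coupling_ratio = g_ncpl / c_ncpl

print(f"C test: WAV couples every {c_interval / 3600:.0f} h")
print(f"G test: WAV couples every {g_interval / 3600:.0f} h")
print(f"G test has {coupling_ratio:.0f}x as many WAV coupling steps per day")
```

This only counts coupling steps; how WW3's cost scales with coupling frequency is not something the sketch tries to model.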

ESCOMP / POP2-CESM

CESM 2.1.z: Add derecho support, remove cheyenne support #77
