Closed mnlevy1981 closed 10 months ago
Dropping `NTASKS_WAV` from 128 to 32 didn't help close the performance gap between C and G on the `gx3v7` grid, so I'm going to leave this as-is.
Comparing your `T62_g37` C and G tests, it looks like `ATM_NCPL`, and thus `WAV_NCPL` which is set to `ATM_NCPL`, is 4 in the C test and 24 in the G test. So the slowdown makes sense, because WW3 is doing more work and becomes the bottleneck.
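To make the coupling-frequency difference concrete, here is a minimal arithmetic sketch. It assumes that `ATM_NCPL` counts coupling intervals per simulated day (the usual base period, but worth confirming for a given case); the helper name and variables are hypothetical and only illustrate the ratio.

```python
SECONDS_PER_DAY = 86_400


def coupling_interval_seconds(ncpl: int) -> float:
    """Length of one coupling interval, assuming `ncpl` intervals per simulated day."""
    return SECONDS_PER_DAY / ncpl


# Values reported above for the T62_g37 tests (WAV_NCPL follows ATM_NCPL).
ncpl_c = 4    # C test
ncpl_g = 24   # G test

interval_c = coupling_interval_seconds(ncpl_c)  # 21600.0 s, i.e. every 6 hours
interval_g = coupling_interval_seconds(ncpl_g)  # 3600.0 s, i.e. every hour

# How many more times per simulated day the wave model is coupled in G than in C.
calls_ratio = ncpl_g / ncpl_c  # 6.0

print(f"C: every {interval_c:.0f} s, G: every {interval_g:.0f} s, ratio {calls_ratio:.1f}x")
```

Six times as many coupling intervals need not translate into exactly six times the WW3 cost, since the work per interval is not necessarily constant, but it points in the same direction as the observed slowdown.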
Description of changes:
Added PE layouts for derecho, removed PE layouts for cheyenne, and updated testlist to move `cheyenne_intel` tests to `derecho_intel` and remove all other `cheyenne` tests (`intel` is the only supported compiler for 2.1.z). I also updated `generate_pop_decomp.xml` to include a 128 PE layout for the `gx3v7` grid as well as some large PE layouts for `tx0.1v3`.
Testing:
I ran `aux_pop` and verified that all tests ran successfully. I also did some performance testing, and can report the following throughput in different configurations (from `SMS_Ld20` and `SMS_Ld20_D` tests using the `pop/performance_eval` testmod directory).
The only odd thing from this table is the drop from 530 SYPD to 360 SYPD on the 3 degree grid when turning CICE on. For some reason, WW3 is 4x slower in the G compset (despite running on the same number of cores); this is true at both resolutions and whether MARBL tracers are on or off, but POP is so fast on the 3 degree grid without MARBL that WW3 becomes the bottleneck in the G compset. I'll run a series of tests using a smaller number of cores for WW3 and see if it makes a difference before taking this PR out of draft.
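For a sense of scale, the two throughput figures quoted above can be converted into wall-clock cost. This is a rough sketch that assumes SYPD means simulated years per wall-clock day; the function names are hypothetical.

```python
SECONDS_PER_DAY = 86_400


def wallclock_seconds_per_simulated_year(sypd: float) -> float:
    """Wall-clock seconds needed per simulated year at a given SYPD."""
    return SECONDS_PER_DAY / sypd


def relative_drop(before: float, after: float) -> float:
    """Fractional loss in throughput going from `before` to `after`."""
    return (before - after) / before


sypd_without_cice = 530.0  # 3 degree grid, figure quoted above
sypd_with_cice = 360.0     # same grid with CICE turned on

cost_without = wallclock_seconds_per_simulated_year(sypd_without_cice)
cost_with = wallclock_seconds_per_simulated_year(sypd_with_cice)
drop = relative_drop(sypd_without_cice, sypd_with_cice)

print(f"{cost_without:.0f} s/yr -> {cost_with:.0f} s/yr, throughput down {drop:.0%}")
```

Under that assumption the drop is roughly a third of the throughput, or about 77 extra wall-clock seconds per simulated year.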
Test status: bit-for-bit doesn't make sense in context of adding a new machine
User interface (namelist or namelist defaults) changes? None