Closed by ekluzek 11 months ago
I tried running the PEM test with 144x1, and it still fails in the 72-PE run with the following:
Domain Information
Horizontal domain: nx = 144
                   ny = 96
No. of categories: nc = 1
No. of ice layers: ni = 8
No. of snow layers:ns = 3
Processors: total = 72
Processor shape: square-ice
Distribution type: roundrobin
Distribution weight: latitude
ew_boundary_type: cyclic
ns_boundary_type: open
maskhalo_dyn = T
maskhalo_remap = T
maskhalo_bound = T
max_blocks = 4
Number of ghost cells: 1
create_distrb_roundrobin: max_blocks too small
A successful test with a PE count of 72 (SMS) shows:
Domain Information
Horizontal domain: nx = 144
                   ny = 96
No. of categories: nc = 1
No. of ice layers: ni = 8
No. of snow layers:ns = 3
Processors: total = 72
Processor shape: square-ice
Distribution type: roundrobin
Distribution weight: latitude
ew_boundary_type: cyclic
ns_boundary_type: open
maskhalo_dyn = T
maskhalo_remap = T
maskhalo_bound = T
max_blocks = 6
Number of ghost cells: 1
Part of the difference is in the settings of the CICE variables:
Failing PEM case:
CICE_BLCKX: 4
CICE_BLCKY: 6
CICE_CONFIG_OPTS: -phys cice5
CICE_CPPDEFS: -Dncdf -DNUMIN=11 -DNUMAX=99 -DNICECAT=1 -DNXGLOB=144 -DNYGLOB=96 -DNTRAERO=0 -DNTRISO=0 -DNBGCLYR=0 -DNICELYR=8 -DNSNWLYR=3 -DTRAGE=1 -DTRFY=1 -DTRLVL=1 -DTRPND=1 -DTRBRI=0 -DTRBGCS=0 -DBLCKX=4 -DBLCKY=6 -DMXBLCKS=4
CICE_DECOMPSETTING: square-ice
CICE_DECOMPTYPE: roundrobin
CICE_MODE: prescribed
CICE_MXBLCKS: 4
Working case:
(ctsm_pylib) cheyenne3 case2/PEM_Vmct_P144x1.f19_f19_mg16.FLT2000ClimoC6I5Slim.cheyenne_intel.slim-default.GC.slim42_cesm2_3_beta15chlist> ./xmlquery -p CICE
Results in group build_component_cice
CICE_AUTO_DECOMP: TRUE
CICE_BLCKX: 6
CICE_BLCKY: 6
CICE_CONFIG_OPTS: -phys cice5
CICE_CPPDEFS: -Dncdf -DNUMIN=11 -DNUMAX=99 -DNICECAT=1 -DNXGLOB=144 -DNYGLOB=96 -DNTRAERO=0 -DNTRISO=0 -DNBGCLYR=0 -DNICELYR=8 -DNSNWLYR=3 -DTRAGE=1 -DTRFY=1 -DTRLVL=1 -DTRPND=1 -DTRBRI=0 -DTRBGCS=0 -DBLCKX=6 -DBLCKY=6 -DMXBLCKS=6
CICE_DECOMPSETTING: square-ice
CICE_DECOMPTYPE: roundrobin
CICE_MODE: prescribed
CICE_MXBLCKS: 6
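To make the difference between the two sets of settings concrete, here is a rough Python sketch of the block arithmetic. It assumes that the grid is tiled into BLCKX x BLCKY blocks, that every block is kept (CICE may drop some blocks, which this ignores), and that roundrobin deals blocks out evenly, so each task holds at most ceil(total blocks / tasks). The function name `blocks_per_task` and the case labels are made up for illustration; none of this is taken from the CICE source.

```python
import math


def blocks_per_task(nx, ny, block_x, block_y, ntasks):
    """Return (total blocks, most blocks any one task holds).

    Assumes an even round-robin deal with no blocks eliminated,
    which is a simplification of what CICE actually does.
    """
    nblocks_x = math.ceil(nx / block_x)
    nblocks_y = math.ceil(ny / block_y)
    total = nblocks_x * nblocks_y
    return total, math.ceil(total / ntasks)


# Values copied from the two xmlquery listings above; 72 is the task
# count reported under "Processors: total" in both logs.
cases = {
    "failing": {"block_x": 4, "block_y": 6, "max_blocks": 4},
    "working": {"block_x": 6, "block_y": 6, "max_blocks": 6},
}

results = {}
for name, c in cases.items():
    total, needed = blocks_per_task(144, 96, c["block_x"], c["block_y"], 72)
    fits = needed <= c["max_blocks"]
    results[name] = (total, needed, fits)
    print(f"{name}: {total} blocks, {needed} per task, "
          f"max_blocks={c['max_blocks']}, fits={fits}")
```

Under these assumptions the failing settings give 576 blocks, or 8 per task on 72 tasks, which is more than MXBLCKS=4. The working settings give 384 blocks, or 6 per task, which matches MXBLCKS=6. Note also that 576 / 144 = 4, so the failing settings look as though they were sized for 144 tasks rather than 72. That is only an inference from the numbers, not something confirmed in the build scripts.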
Probably the best way around this is to give up on running an F compset test here and just use an I compset. The point of the PEM test is to make sure SLIM answers don't change with processor count, not to ensure that CAM, CICE, and DOM don't change answers (that should be determined separately). So I'll just move the test to an I compset.
In the update to cesm2_3_beta15, a PEM test that ran on 36 processors failed because the number of processors was too low for CICE5. I didn't see this in cesm2_3_beta10, but it does appear with the update. The workaround is just to give the test more processors.
The test that fails is:
PEM_Vmct.f19_f19_mg16.FLT2000ClimoC6I5Slim.cheyenne_intel.slim-default
The error message and traceback in the cesm.log file look like:
29: ERROR: create_distrb_roundrobin: max_blocks too small