ESCOMP / SimpleLand

Simple Land Model for CESM --- *** IN DEVELOPMENT *** --- please contact for more info. See supplemental information of https://journals.ametsoc.org/doi/abs/10.1175/JCLI-D-18-0812.1 for a description of SLIM physics. Implementation of SLIM into the main CESM trunk is ongoing. SLIM currently works with the CESM2.1 release, but must be downloaded from this repository until we finish implementing it properly into the main CESM code.

36 processors too few for CICE #89

Closed ekluzek closed 11 months ago

ekluzek commented 11 months ago

In the update to cesm2_3_beta15 a PEM test that ran on 36 processors failed because the number of processors was too low for CICE5. I didn't see this in cesm2_3_beta10, but we do with the update. The workaround is just to give it more processors.

The test that fails is:

PEM_Vmct.f19_f19_mg16.FLT2000ClimoC6I5Slim.cheyenne_intel.slim-default

The error message and traceback in the cesm.log file look like:

29: ERROR: create_distrb_roundrobin: max_blocks too small

31: create_distrb_roundrobin: max_blocks too small
31: ERROR: create_distrb_roundrobin: max_blocks too small
32: create_distrb_roundrobin: max_blocks too small
32: ERROR: create_distrb_roundrobin: max_blocks too small
33: create_distrb_roundrobin: max_blocks too small
33: ERROR: create_distrb_roundrobin: max_blocks too small
7:Image              PC                Routine            Line        Source
7:cesm.exe           000000000263D876  Unknown               Unknown  Unknown
7:cesm.exe           0000000001DE8360  shr_abort_mod_mp_         114  shr_abort_mod.F90
7:cesm.exe           00000000019FBA04  ice_exit_mp_abort          46  ice_exit.F90
7:cesm.exe           0000000001BA8C08  ice_distribution_        1025  ice_distribution.F90
7:cesm.exe           00000000019F967F  ice_domain_mp_ini         438  ice_domain.F90
7:cesm.exe           0000000001A20E95  ice_grid_mp_init_         209  ice_grid.F90
7:cesm.exe           0000000001B30E9C  cice_initmod_mp_c          96  CICE_InitMod.F90
7:cesm.exe           00000000019F4A0A  ice_comp_mct_mp_i         262  ice_comp_mct.F90
7:cesm.exe           000000000042F914  component_mod_mp_         257  component_mod.F90
7:cesm.exe           000000000041DD66  cime_comp_mod_mp_        1439  cime_comp_mod.F90
7:cesm.exe           000000000042C815  MAIN__                    122  cime_driver.F90
ekluzek commented 11 months ago

I tried running the PEM test with a 144x1 PE layout, and it still fails in the 72-PE part of the test with the following:

Domain Information

Horizontal domain: nx = 144 ny = 96
No. of categories: nc = 1
No. of ice layers: ni = 8
No. of snow layers:ns = 3
Processors: total = 72
Processor shape: square-ice
Distribution type: roundrobin
Distribution weight: latitude
ew_boundary_type: cyclic
ns_boundary_type: open
maskhalo_dyn = T
maskhalo_remap = T
maskhalo_bound = T
max_blocks = 4
Number of ghost cells: 1

create_distrb_roundrobin: max_blocks too small

A successful test at a 72-PE count (SMS) shows:

Domain Information

Horizontal domain: nx = 144 ny = 96
No. of categories: nc = 1
No. of ice layers: ni = 8
No. of snow layers:ns = 3
Processors: total = 72
Processor shape: square-ice
Distribution type: roundrobin
Distribution weight: latitude
ew_boundary_type: cyclic
ns_boundary_type: open
maskhalo_dyn = T
maskhalo_remap = T
maskhalo_bound = T
max_blocks = 6
Number of ghost cells: 1

ekluzek commented 11 months ago

Part of the difference is in the settings of the CICE variables:

Failing PEM case:

        CICE_BLCKX: 4
        CICE_BLCKY: 6
        CICE_CONFIG_OPTS:  -phys cice5
        CICE_CPPDEFS:  -Dncdf -DNUMIN=11 -DNUMAX=99  -DNICECAT=1 -DNXGLOB=144 -DNYGLOB=96 -DNTRAERO=0 -DNTRISO=0 -DNBGCLYR=0 -DNICELYR=8 -DNSNWLYR=3 -DTRAGE=1 -DTRFY=1 -DTRLVL=1 -DTRPND=1 -DTRBRI=0 -DTRBGCS=0 -DBLCKX=4 -DBLCKY=6 -DMXBLCKS=4
        CICE_DECOMPSETTING: square-ice
        CICE_DECOMPTYPE: roundrobin
        CICE_MODE: prescribed
        CICE_MXBLCKS: 4

Working case:

(ctsm_pylib) cheyenne3 case2/PEM_Vmct_P144x1.f19_f19_mg16.FLT2000ClimoC6I5Slim.cheyenne_intel.slim-default.GC.slim42_cesm2_3_beta15chlist> ./xmlquery -p CICE                                  
Results in group build_component_cice
        CICE_AUTO_DECOMP: TRUE
        CICE_BLCKX: 6
        CICE_BLCKY: 6
        CICE_CONFIG_OPTS:  -phys cice5
        CICE_CPPDEFS:  -Dncdf -DNUMIN=11 -DNUMAX=99  -DNICECAT=1 -DNXGLOB=144 -DNYGLOB=96 -DNTRAERO=0 -DNTRISO=0 -DNBGCLYR=0 -DNICELYR=8 -DNSNWLYR=3 -DTRAGE=1 -DTRFY=1 -DTRLVL=1 -DTRPND=1 -DTRBRI=0 -DTRBGCS=0 -DBLCKX=6 -DBLCKY=6 -DMXBLCKS=6
        CICE_DECOMPSETTING: square-ice
        CICE_DECOMPTYPE: roundrobin
        CICE_MODE: prescribed
        CICE_MXBLCKS: 6
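
To make the two sets of values above easier to compare, here is a rough sketch of the block arithmetic. It is not taken from the CICE source: it assumes that no land blocks are eliminated and that a round-robin distribution hands each task at most ceil(nblocks / ntasks) blocks. The function name and the `cases` dictionary are made up for illustration; the numbers come from the xmlquery output and domain information above.

```python
import math


def blocks_needed_per_task(nx, ny, block_x, block_y, ntasks):
    """Return (total blocks, blocks per task) for an even round-robin deal.

    Assumption: every block is kept (no land-block elimination), so the
    busiest task holds ceil(nblocks / ntasks) blocks.
    """
    nblocks = math.ceil(nx / block_x) * math.ceil(ny / block_y)
    return nblocks, math.ceil(nblocks / ntasks)


# Values copied from the failing and working cases (nx=144, ny=96, 72 tasks).
cases = {
    "failing PEM run": {"block_x": 4, "block_y": 6, "max_blocks": 4},
    "working SMS run": {"block_x": 6, "block_y": 6, "max_blocks": 6},
}

results = {}
for name, cfg in cases.items():
    nblocks, needed = blocks_needed_per_task(
        144, 96, cfg["block_x"], cfg["block_y"], ntasks=72
    )
    fits = needed <= cfg["max_blocks"]
    results[name] = (nblocks, needed, fits)
    print(f"{name}: {nblocks} blocks, {needed} per task, "
          f"max_blocks={cfg['max_blocks']}, fits={fits}")
```

Under these assumptions the failing case has 576 blocks, or 8 per task on 72 tasks, which exceeds its max_blocks of 4, while the working case has 384 blocks, or 6 per task, which just fits. Note that 576 / 144 = 4, so the failing settings look like they were sized for 144 tasks and then reused for the 72-task run; that is an inference from the arithmetic, not something confirmed here.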
ekluzek commented 11 months ago

Probably the best way around this is to give up on using an F compset for this test and use an I compset instead. The point of the PEM test is to make sure SLIM answers don't change with processor count, not to ensure that CAM, CICE, and DOM don't change answers (that should be determined separately from this). So I'll just move the test to an I compset.