geoschem / gchp_legacy

Repository for GEOS-Chem High Performance: software that enables running GEOS-Chem on a cubed-sphere grid with MPI parallelization.
http://wiki.geos-chem.org/GEOS-Chem_HP

[BUG/ISSUE] Fullchem run failure in 12.7.0+ at c180+ due to reduced timesteps #62

Closed lizziel closed 4 years ago

lizziel commented 4 years ago

This issue is the same as https://github.com/geoschem/geos-chem/issues/219 in the GEOS-Chem repository. I traced that HEMCO issue to a GCHP problem that was already present, but unnoticed, in versions prior to 12.7.0. An update that went into 12.7.0 brought the problem to light because it causes the run to crash.

The problem is that the logical IsChemTime is false in the warm GEOS-Chem restart phase at the beginning of the run if the default timesteps are reduced. GCHP now reduces the timesteps automatically starting at c180. IsChemTime determines which GEOS-Chem components are turned on at each timestep. If it is false, several components do not run, including emissions. This breaks a new update in 12.7.0 that uses the emissions year from the HcoState%Clock object.

This can be diagnosed by comparing the logs of two c24 runs, one with default timesteps and one with reduced timesteps. For c24 with default timesteps (20 min chem, 10 min dyn):

Doing warm GEOS-Chem restart
GEOS-Chem phase           -1 :
DoConv   :  T
DoDryDep :  T
DoEmis   :  T
DoTend   :  F
DoTurb   :  T
DoChem   :  T
DoWetDep :  T

However, at c24 with reduced timesteps (10 min chem, 5 min dyn) you get this instead. The components that are false here but true above are all controlled by IsChemTime.

Doing warm GEOS-Chem restart
GEOS-Chem phase           -1 :
DoConv   :  T
DoDryDep :  F
DoEmis   :  F
DoTend   :  F
DoTurb   :  T
DoChem   :  F
DoWetDep :  T
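The comparison of the two log excerpts above can be automated. This is a minimal sketch, assuming the flags always appear as `Name : T` or `Name : F` lines as in the excerpts; `parse_flags` and `disabled_by_change` are hypothetical helper names, not part of GCHP.

```python
import re

# Matches lines such as "DoChem   :  T" (optionally indented).
FLAG_RE = re.compile(r"^\s*(Do\w+)\s*:\s*([TF])\s*$")

def parse_flags(log_text):
    """Return {component: bool} for every 'DoXxx : T/F' line in the log."""
    flags = {}
    for line in log_text.splitlines():
        match = FLAG_RE.match(line)
        if match:
            flags[match.group(1)] = match.group(2) == "T"
    return flags

def disabled_by_change(reference_log, test_log):
    """List components that are on in the reference run but off in the test run."""
    ref = parse_flags(reference_log)
    test = parse_flags(test_log)
    return sorted(name for name, on in ref.items() if on and not test.get(name, False))

# Log excerpts copied from the two c24 runs above.
DEFAULT_LOG = """Doing warm GEOS-Chem restart
GEOS-Chem phase           -1 :
DoConv   :  T
DoDryDep :  T
DoEmis   :  T
DoTend   :  F
DoTurb   :  T
DoChem   :  T
DoWetDep :  T
"""

REDUCED_LOG = """Doing warm GEOS-Chem restart
GEOS-Chem phase           -1 :
DoConv   :  T
DoDryDep :  F
DoEmis   :  F
DoTend   :  F
DoTurb   :  T
DoChem   :  F
DoWetDep :  T
"""

print(disabled_by_change(DEFAULT_LOG, REDUCED_LOG))
# ['DoChem', 'DoDryDep', 'DoEmis']
```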

IsChemTime is set from an ESMF alarm. Here is the code (the comments are out of date and need updating; the files they reference are for GEOS):

    ! Query the chemistry alarm.
    ! This checks if it's time to do chemistry, based on the time step
    ! set in AGCM.rc (GEOSCHEMCHEM_DT:). If the GEOS-Chem time step is not
    ! specified in AGCM.rc, the heartbeat will be taken (set in MAPL.rc).
    ! ----------------------------------------------------------------------
    CALL MAPL_Get(STATE, RUNALARM=ALARM, __RC__)
    IsChemTime = ESMF_AlarmIsRinging(ALARM, __RC__)

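The timesteps in the run logs all look right, so unless the new MAPL requires an additional timestep parameter in the config, the timestep settings are probably not the culprit. Here are the differences between the reduced (<) and default (>) timestep configurations: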

GCHP.rc:

<   HEARTBEAT_DT: 300
---
>   HEARTBEAT_DT: 600
25,29c25,29
< SOLAR_DT: 300
< IRRAD_DT: 300
< RUN_DT:   300
< GIGCchem_DT: 600
< DYNAMICS_DT: 300
---
> SOLAR_DT: 600
> IRRAD_DT: 600
> RUN_DT:   600
> GIGCchem_DT: 1200
> DYNAMICS_DT: 600

CAP.rc:

< HEARTBEAT_DT:  300
---
> HEARTBEAT_DT:  600
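A quick consistency check on these settings can rule out a simple configuration mistake. This is a sketch under stated assumptions: it treats "looks right" as meaning that HEARTBEAT_DT matches in both files and that every `*_DT` value is a whole multiple of the heartbeat. That criterion is my assumption, not a documented MAPL rule, and `parse_rc` and `check_timesteps` are hypothetical helpers.

```python
def parse_rc(text):
    """Parse 'KEY: integer' lines from rc-file text into a dict."""
    settings = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip().isdigit():
            settings[key.strip()] = int(value)
    return settings

def check_timesteps(gchp_rc, cap_rc):
    """Return a list of problems found in the timestep settings (empty if none)."""
    gchp = parse_rc(gchp_rc)
    cap = parse_rc(cap_rc)
    heartbeat = gchp["HEARTBEAT_DT"]
    problems = []
    if cap.get("HEARTBEAT_DT") != heartbeat:
        problems.append("HEARTBEAT_DT differs between GCHP.rc and CAP.rc")
    for key, value in gchp.items():
        # Assumed criterion: each timestep is a whole multiple of the heartbeat.
        if key.endswith("_DT") and value % heartbeat != 0:
            problems.append(f"{key} is not a multiple of HEARTBEAT_DT")
    return problems

# Reduced-timestep values taken from the diffs above.
GCHP_RC = """HEARTBEAT_DT: 300
SOLAR_DT: 300
IRRAD_DT: 300
RUN_DT:   300
GIGCchem_DT: 600
DYNAMICS_DT: 300
"""
CAP_RC = "HEARTBEAT_DT:  300\n"

print(check_timesteps(GCHP_RC, CAP_RC))
# []
```

An empty list here agrees with the observation that the settings themselves look fine.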

I am going to continue to look at this. If anyone has ideas, please put your thoughts here.

lizziel commented 4 years ago

I also did runs of GEOS-Chem online in GEOS (old MAPL) with two different timesteps. The warm GEOS-Chem restart settings were the same in both runs, so the issue does not appear there:

 Doing warm GEOS-Chem restart
 GEOS-Chem phase            1 :
 DoConv   :  T
 DoDryDep :  T
 DoEmis   :  T
 DoTend   :  T
 DoTurb   :  F
 DoChem   :  F
 DoWetDep :  F

lizziel commented 4 years ago

To further clarify the issue, here is a breakdown of the current (12.7.1) GCHP behavior in terms of which components run at each timestep. For simplicity I list only chemistry and dynamics, with the understanding that each represents a set of GEOS-Chem subcomponents.

Current (and correct) GCHP behavior with the default low-resolution timesteps (10 min dyn, 20 min chem):

warm restart: dynamics, chemistry
00:10: dynamics
00:20: dynamics, chemistry
00:30: dynamics
00:40: dynamics, chemistry
00:50: dynamics
etc.

Current (and incorrect) GCHP behavior with the default high-resolution timesteps (5 min dyn, 10 min chem):

warm restart: dynamics
00:05: dynamics, chemistry
00:10: dynamics
00:15: dynamics, chemistry
00:20: dynamics
00:25: dynamics, chemistry
etc.

This appears to be a MAPL alarm issue. IsChemTime is set from the alarm, alternates between true and false from one timestep to the next, and directly determines whether chemistry is done in addition to dynamics. Somehow, reducing the timestep shifts the alternating IsChemTime out of phase, when its phase should be independent of the timestep.
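The two schedules listed above can be reproduced with a toy model. This sketch does not use or imitate the MAPL/ESMF alarm logic. It simply assumes chemistry should run whenever elapsed time is a multiple of the chemistry timestep, and models the bug as a phase shift of one dynamics step. `chemistry_schedule` and its `shifted` flag are hypothetical.

```python
def chemistry_schedule(dyn_min, chem_min, n_steps, shifted=False):
    """Return [(label, components)] for the warm restart plus n_steps timesteps.

    shifted=False: chemistry runs when elapsed minutes are a multiple of chem_min.
    shifted=True:  same pattern, but one dynamics step out of phase (the bug).
    """
    schedule = []
    for step in range(n_steps + 1):
        elapsed = step * dyn_min
        phase = elapsed + (dyn_min if shifted else 0)
        components = ["dynamics"]
        if phase % chem_min == 0:
            components.append("chemistry")
        if step == 0:
            label = "warm restart"
        else:
            label = f"{elapsed // 60:02d}:{elapsed % 60:02d}"
        schedule.append((label, components))
    return schedule

# Expected behavior at 10/20 min, then the out-of-phase behavior at 5/10 min.
for label, components in chemistry_schedule(10, 20, 3):
    print(f"{label}: {', '.join(components)}")
for label, components in chemistry_schedule(5, 10, 3, shifted=True):
    print(f"{label}: {', '.join(components)}")
```

With `shifted=True` the warm restart step runs dynamics only, which is the symptom that switches off emissions at the start of the run.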

lizziel commented 4 years ago

I have verified this behavior is also present in GCHP versions prior to the MAPL upgrade in 12.5.

lizziel commented 4 years ago

This bug is fixed in commit https://github.com/geoschem/geos-chem/commit/8d2a8c20e7e034abfef990e83b2443380e60a582. The updates address the problem for the recommended timesteps only. These are set automatically in runConfig.sh: 10/20 min (dyn/chem) for resolutions below c180, and 5/10 min for c180 and above. The fix is a kludge and may not work with other timestep combinations.

The fix is made up of two parts:

  1. Update the seconds of GIGCchem_REFERENCE_TIME in GCHP.rc from 0 to 1 if using reduced timesteps. This prevents MAPL from shifting the alternating IsChemTime logical value by one timestep.
  2. Force all components to be on during the first timestep.
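The timestep rule and the second part of the fix can be illustrated as follows. This is a minimal sketch, not the code in the commit: `recommended_timesteps` mirrors the rule described for runConfig.sh, and `components_to_run` is a hypothetical stand-in for forcing everything on during the first timestep.

```python
def recommended_timesteps(resolution):
    """Return (dyn_seconds, chem_seconds) for a cubed-sphere resolution, e.g. 24 for c24."""
    if resolution >= 180:
        return 300, 600    # 5/10 min (dyn/chem) at c180 and above
    return 600, 1200       # 10/20 min (dyn/chem) below c180

def components_to_run(is_first_step, is_chem_time):
    """Decide what runs this timestep; the first step ignores the alarm value."""
    run_chemistry = is_first_step or is_chem_time
    return {"dynamics": True, "chemistry": run_chemistry}

print(recommended_timesteps(24))
# (600, 1200)
print(components_to_run(is_first_step=True, is_chem_time=False))
# {'dynamics': True, 'chemistry': True}
```

Here "chemistry" stands for the whole set of components gated by IsChemTime, including emissions.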