GEOS-ESM / GOCART

GOCART Aerosol model including process library and framework interfaces (MAPL, NUOPC, and CCPP)
Apache License 2.0

0-increment Replays are not 0-diff #140

Closed sdrabenh closed 2 years ago

sdrabenh commented 2 years ago

@lltakacs and @sdrabenh confirmed that 0-increment replays are not 0-diff when REPLAY_FILE_FREQUENCY is set to anything other than the default of 21600. This matters because OPS has moved to a 2-hourly 4D IAU. Tests with gcm v10.22.0 and later show the bug, while v10.21.1 does not. This suggests that something is amiss in how GOCART-2G interacts with the IAU machinery.

In our C48 out-of-the-box case, a 6-hour AMIP was run as a control. Then, a 0-increment replay was run using the following:

    REPLAY_ANA_EXPID:    x0046a
    REPLAY_ANA_LOCATION: /discover/nobackup/projects/gmao/dadev/dao_it/archive/x0046a
    REPLAY_MODE:         Regular
    REPLAY_FILE:         ana/Y%y4/M%m2/x0046a.ana.eta.%y4%m2%d2_%h200z.nc4
    REPLAY_FILE_FREQUENCY:      21600
    REPLAY_FILE_REFERENCE_TIME: 000000
    REPLAY_P:  NO
    REPLAY_U:  NO
    REPLAY_V:  NO
    REPLAY_T:  NO
    REPLAY_QV: NO
    REPLAY_O3: NO
    REPLAY_TS: NO
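For readers unfamiliar with the template syntax, here is a minimal sketch of how the REPLAY_FILE pattern above expands into file names. It assumes the `%y4`, `%m2`, `%d2`, and `%h2` tokens are GrADS-style date tokens and that analysis times are spaced REPLAY_FILE_FREQUENCY seconds apart, counted from REPLAY_FILE_REFERENCE_TIME. The helper names `expand_template` and `replay_files` and the start date are hypothetical, chosen for illustration only; this is not how the replay code itself is implemented.

```python
from datetime import datetime, timedelta

# Template copied from the REPLAY_FILE setting above.
REPLAY_FILE = "ana/Y%y4/M%m2/x0046a.ana.eta.%y4%m2%d2_%h200z.nc4"


def expand_template(template, when):
    """Expand the GrADS-style date tokens in the template (hypothetical helper)."""
    tokens = {
        "%y4": f"{when.year:04d}",
        "%m2": f"{when.month:02d}",
        "%d2": f"{when.day:02d}",
        "%h2": f"{when.hour:02d}",
    }
    for token, value in tokens.items():
        template = template.replace(token, value)
    return template


def replay_files(start, hours, frequency_s):
    """List the file names whose times fall in [start, start + hours).

    Assumption: analysis times are spaced `frequency_s` seconds apart,
    counted from 00:00:00 (REPLAY_FILE_REFERENCE_TIME) on the start date.
    """
    reference = start.replace(hour=0, minute=0, second=0, microsecond=0)
    step = timedelta(seconds=frequency_s)
    end = start + timedelta(hours=hours)
    when = reference
    names = []
    while when < end:
        if when >= start:
            names.append(expand_template(REPLAY_FILE, when))
        when += step
    return names


# Illustrative start date only; it is not taken from the experiment.
start = datetime(2015, 5, 10, 0)
six_hourly = replay_files(start, hours=6, frequency_s=21600)
two_hourly = replay_files(start, hours=6, frequency_s=7200)
print(len(six_hourly), len(two_hourly))
```

Under these assumptions, a 6-hour window contains one analysis time at the default frequency of 21600 seconds and three at 7200 seconds.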

The results from using the above configuration have 0-diff restarts compared to the AMIP. This is expected. However, if the following is changed ...

    REPLAY_FILE_FREQUENCY: 7200

... 3 restarts become non-0-diff after the first time step. Specifically, there are the following differences:

    cdo diffn scratch.amip/achem_internal_checkpoint scratch.0inc-x46a/achem_internal_checkpoint
                   Date     Time   Level Gridsize    Miss    Diff : S Z  Max_Absdiff Max_Reldiff : Parameter name
        72 : 2015-05-09 21:07:30      72    13824       0    6484 : F T   3.2613e-16  3.6506e-05 : VOC
       144 : 2015-05-09 21:07:30      72    13824       0     477 : F T   4.5103e-17  1.6128e-05 : VOCbiob
      2 of 144 records differ
      0 of 144 records differ more than 0.001
    cdo    diffn: Processed 3981312 values from 4 variables over 2 timesteps [0.12s 18MB].

    cdo diffn scratch.amip/cabr_internal_checkpoint scratch.0inc-x46a/cabr_internal_checkpoint
                   Date     Time   Level Gridsize    Miss    Diff : S Z  Max_Absdiff Max_Reldiff : Parameter name
        72 : 2015-05-09 21:07:30      72    13824       0     418 : F T   2.2204e-16  1.2083e-05 : CAphilicCA.br
      1 of 144 records differ
      0 of 144 records differ more than 0.001
    cdo    diffn: Processed 3981312 values from 4 variables over 2 timesteps [0.10s 18MB].

    cdo diffn scratch.amip/caoc_internal_checkpoint scratch.0inc-x46a/caoc_internal_checkpoint
                   Date     Time   Level Gridsize    Miss    Diff : S Z  Max_Absdiff Max_Reldiff : Parameter name
        72 : 2015-05-09 21:07:30      72    13824       0    6240 : F T   1.7850e-15  0.00031656 : CAphilicCA.oc
      1 of 144 records differ
      0 of 144 records differ more than 0.001
    cdo    diffn: Processed 3981312 values from 4 variables over 2 timesteps [0.12s 18MB].
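When many restart files need checking, the `cdo diffn` reports can be summarized programmatically. The following is a rough sketch that assumes the output layout shown in the comparisons above; the `parse_diffn` helper and its regular expressions are hypothetical and written against that sample only, so other cdo versions may need adjustments.

```python
import re

# Sample copied from the achem_internal_checkpoint comparison above.
SAMPLE = """\
               Date     Time   Level Gridsize    Miss    Diff : S Z  Max_Absdiff Max_Reldiff : Parameter name
    72 : 2015-05-09 21:07:30      72    13824       0    6484 : F T   3.2613e-16  3.6506e-05 : VOC
   144 : 2015-05-09 21:07:30      72    13824       0     477 : F T   4.5103e-17  1.6128e-05 : VOCbiob
  2 of 144 records differ
  0 of 144 records differ more than 0.001
"""

# One differing record: index, date, time, level, gridsize, miss, diff count,
# two flag characters, max absolute and relative difference, parameter name.
RECORD = re.compile(
    r"^\s*\d+\s*:\s*\S+\s+\S+\s+\d+\s+\d+\s+\d+\s+(?P<ndiff>\d+)\s*:"
    r"\s*\S\s+\S\s+(?P<absdiff>\S+)\s+(?P<reldiff>\S+)\s*:\s*(?P<name>\S+)\s*$"
)
# Summary line such as "2 of 144 records differ" (without a threshold suffix).
SUMMARY = re.compile(r"^\s*(?P<n>\d+) of (?P<total>\d+) records differ\s*$")


def parse_diffn(text):
    """Return ([(name, n_diff_points, max_absdiff), ...], n_records_differ)."""
    records = []
    n_differ = None
    for line in text.splitlines():
        match = RECORD.match(line)
        if match:
            records.append(
                (match["name"], int(match["ndiff"]), float(match["absdiff"]))
            )
            continue
        match = SUMMARY.match(line)
        if match:
            n_differ = int(match["n"])
    return records, n_differ


records, n_differ = parse_diffn(SAMPLE)
for name, n_points, max_abs in records:
    print(f"{name}: {n_points} grid points differ, max abs diff {max_abs:g}")
```

A restart pair would count as 0-diff only if no record lines are reported. Even differences at the 1e-16 level, as in the sample, fail that test.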

At the end of a 6-hour window, all restarts become non-0-diff. This should not be the case.

sdrabenh commented 2 years ago

Thanks @bena-nasa @pcolarco @mathomp4 @lltakacs for isolating the problem, tracking it down, and issuing a fix!