ESCOMP / CMEPS

NUOPC Community Mediator for Earth Prediction Systems
https://escomp.github.io/CMEPS/

Possible issue in time units of coupler history file(s) for restart #366

Closed: uturuncoglu closed this issue 1 year ago

uturuncoglu commented 1 year ago

I was testing the restart capability in UFS with a DATM+land configuration. The test restarts the model after 6 hours of simulation, runs another 6 hours, and then compares the results with the baseline. The results are bit-for-bit identical to the baseline, which is good, but the baseline check fails because of the time axis.

In the baseline run, ufs.cpld.cpl.hi.2011-01-01-43200.nc has the attribute time:units = "days since 2011-01-01 00:00:00" (the value of the time variable is 0.5). In the restart run it has time:units = "days since 2011-01-01 06:00:00" (the value is 0.25). So even though both refer to the same date, both cprnc and the UFS RT tool fail when comparing these files, because the time variables (and also time_bnds) differ. The relevant cprnc output:

 time   (time)  t_index =      1     1
          1        1  (     1) (     1) (     1) (     1)
                   1   5.000000000000000E-01   5.000000000000000E-01 2.5E-01  5.000000000000000E-01 5.0E-01  5.000000000000000E-01
                   1   2.500000000000000E-01   2.500000000000000E-01          2.500000000000000E-01          2.500000000000000E-01
                   1  (     1) (     1)
          avg abs field values:    5.000000000000000E-01    rms diff: 2.5E-01   avg rel diff(npos):  5.0E-01
                                   2.500000000000000E-01                        avg decimal digits(ndif):  0.3 worst:  0.3
 RMS time                             2.5000E-01            NORMALIZED  6.6667E-01

 time_bnds   (ntb,time)  t_index =      1     1
          2        2  (     1,     1) (     1,     1) (     1,     1) (     1,     1)
                   2   5.000000000000000E-01   5.000000000000000E-01 2.5E-01  5.000000000000000E-01 5.0E-01  5.000000000000000E-01
                   2   2.500000000000000E-01   2.500000000000000E-01          2.500000000000000E-01          2.500000000000000E-01
                   2  (     1,     1) (     1,     1)
          avg abs field values:    5.000000000000000E-01    rms diff: 2.5E-01   avg rel diff(npos):  5.0E-01
                                   2.500000000000000E-01                        avg decimal digits(ndif):  0.3 worst:  0.3
 RMS time_bnds                        2.5000E-01            NORMALIZED  6.6667E-01
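As a quick worked check, both encodings denote the same instant, 2011-01-01 12:00. Only the reference epoch differs, so the 0.25 difference that cprnc reports is exactly the 6-hour shift between the two epochs:

```latex
\begin{aligned}
\text{baseline:} &\quad \text{2011-01-01 00:00} + 0.5\,\mathrm{d} = \text{2011-01-01 12:00} \\
\text{restart:}  &\quad \text{2011-01-01 06:00} + 0.25\,\mathrm{d} = \text{2011-01-01 12:00} \\
\Delta &= 0.5\,\mathrm{d} - 0.25\,\mathrm{d} = 0.25\,\mathrm{d} = 6\,\mathrm{h}
\end{aligned}
```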

It seems the two runs do not use the same epoch for the time units, and this causes the failure. Mine is the only restart test that checks the .cpl.hi. file, which explains why this was not caught before in UFS. I wonder how this is handled in CESM. Maybe its tests do not check .cpl.hi. files for restart runs, but I am not sure. I could easily work around this by removing the file from the list of checked files. However, I think the restart run needs to use the same epoch as the initial run.

Any ideas or suggestions, @jedwards4b @mvertens? Do you think this is a bug? If so, I could try to fix it.

jedwards4b commented 1 year ago

I tried this with ERS_Lh12.f19_g17.A.cheyenne_intel, and both time_bnds and time:units are the same in the initial and restart runs: ERS_Lh12.f19_g17.A.cheyenne_intel.20230422_073627_mjuxj4.cpl.hi.0001-01-01-43200.nc. Is it possible that the UFS ModelClock is treated differently than in CESM?

DeniseWorthen commented 1 year ago

Yes, I think this might be the issue. Our driver uses an fhrot value to set the time relative to the initial start time for forecasts:

!-----------------------------------------------------------------------
!***  Adjust the currTime of the main clock: CLOCK_MAIN
!***  if the fhrot is > 0
!***  This will correctly set the UFS Driver clocks in case of
!***  Restart-From-History.
!-----------------------------------------------------------------------
      CALL ESMF_ConfigGetAttribute(config   = CF_MAIN          &
                                   ,value   = fhrot            &
                                   ,label   = 'fhrot:'         &
                                   ,default = 0.0_ESMF_KIND_R8 &
                                   ,rc      = RC)
      ESMF_ERR_ABORT(RC)

      if (fhrot > 0) then
        CALL ESMF_TimeIntervalSet(restartOffset, h_r8=fhrot, rc=RC)
        ESMF_ERR_ABORT(RC)
        CURRTIME = STARTTIME + restartOffset
        call ESMF_ClockSet(CLOCK_MAIN, currTime=CURRTIME, &
                                       timeStep=(TIMESTEP-restartOffset), &
                                       rc=RC)
        ESMF_ERR_ABORT(RC)
      endif
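The fhrot adjustment above can be summarized in a minimal, ESMF-free Fortran sketch. It is hypothetical and not UFS or CMEPS code. The module and function names are invented for illustration, and times are plain integer seconds after 2011-01-01 00:00 instead of ESMF_Time objects. The sketch also assumes, without confirming it from the CMEPS source, that the "days since" epoch is taken from whichever start time the driver clock reports. Under that assumption, a history record written after the restart matches the baseline value only if the epoch stays at the original start time.

```fortran
! Hypothetical sketch, not UFS or CMEPS code: times are integer seconds
! after a fixed reference (e.g. 2011-01-01 00:00) rather than ESMF_Time.
module restart_time_axis
  implicit none
  private
  public :: restart_current_time, days_since

  ! Double-precision kind, analogous to ESMF_KIND_R8
  integer, parameter, public :: r8 = selected_real_kind(15, 307)
  real(r8), parameter :: seconds_per_hour = 3600.0_r8
  real(r8), parameter :: seconds_per_day  = 86400.0_r8

contains

  ! Analogue of CURRTIME = STARTTIME + restartOffset, where the
  ! offset is fhrot hours.
  pure function restart_current_time(start_s, fhrot_h) result(curr_s)
    integer,  intent(in) :: start_s   ! start time, seconds after reference
    real(r8), intent(in) :: fhrot_h   ! restart offset in hours (fhrot)
    integer :: curr_s
    curr_s = start_s + nint(fhrot_h * seconds_per_hour)
  end function restart_current_time

  ! Value stored in the time variable when
  ! time:units = "days since <epoch>".
  pure function days_since(time_s, epoch_s) result(days)
    integer, intent(in) :: time_s    ! record time, seconds after reference
    integer, intent(in) :: epoch_s   ! epoch of time:units, same reference
    real(r8) :: days
    days = real(time_s - epoch_s, r8) / seconds_per_day
  end function days_since

end module restart_time_axis
```

With fhrot = 6, the current time moves to 06:00. A 12:00 record then evaluates to 0.5 days against the original start, but to 0.25 days if the epoch is taken from the restart time, the same pair seen in the cprnc output. Keeping one fixed epoch for every segment of a run is what makes the two files directly comparable.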

For UFS, we don't compare any history output files of any component other than the ATM.

uturuncoglu commented 1 year ago

@jedwards4b @DeniseWorthen Thanks. I think this is related to the UFS Weather Model rather than CMEPS. It is not an urgent issue, but it also involves other components and might require creating the UFS baselines from scratch. I have no idea how this could affect the existing DA system. I am closing this for now since I opened an issue on the UFS Weather Model side to track it (https://github.com/ufs-community/ufs-weather-model/issues/1721).