Investigate changing MOM thermodynamic time step (`DT_THERM`)

dougiesquire commented 5 months ago

We need to test the effect of changing DT_THERM on ACCESS-OM3 performance and physical fields.

Once https://github.com/COSIMA/MOM6-CICE6/pull/48 is merged and https://github.com/COSIMA/access-om3/issues/137 is closed, we'll use the MOM6-CICE6/025deg_jra55do_ryf configuration as a baseline for runs with longer DT_THERM:

[ ] DT_THERM = 2700.0
[ ] DT_THERM = 5400.0
[ ] DT_THERM = 8100.0
[ ] DT_THERM = 10800.0

To run with these, the following parameters will also need to be changed:

THERMO_SPANS_COUPLING = True
SINGLE_STEPPING_CALL = False
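
As a quick sanity check on the candidate values, here is a minimal Python sketch (not part of the configuration) that confirms each DT_THERM is a whole multiple of the coupling timestep and prints the parameter lines for each test run. It assumes the 1350 s coupling timestep quoted later in this thread for the 0.25° configuration; `parameter_lines` is a hypothetical helper name.

```python
COUPLING_DT = 1350.0  # s; assumed, taken from the 0.25-degree setup quoted later in this thread
CANDIDATES = [2700.0, 5400.0, 8100.0, 10800.0]  # s; the DT_THERM values in the checklist


def parameter_lines(dt_therm, coupling_dt=COUPLING_DT):
    """Return the parameter lines for one test run (hypothetical helper)."""
    ratio = dt_therm / coupling_dt
    # Reject values that are not a whole number of coupling steps.
    if abs(ratio - round(ratio)) > 1e-9:
        raise ValueError(f"DT_THERM={dt_therm} is not a multiple of {coupling_dt}")
    return [
        f"DT_THERM = {dt_therm}",
        "THERMO_SPANS_COUPLING = True",
        "SINGLE_STEPPING_CALL = False",
    ]


for dt_therm in CANDIDATES:
    print(f"! {int(dt_therm / COUPLING_DT)} x coupling timestep")
    print("\n".join(parameter_lines(dt_therm)))
```

With these inputs the four candidates correspond to 2, 4, 6 and 8 coupling steps.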

@aekiss, @AndyHoggANU do you have any suggestions for what should be looked at in the physical fields?

adele-morrison commented 5 months ago

> @aekiss, @AndyHoggANU do you have any suggestions for what should be looked at in the physical fields?

- Time series of global average temperature and salinity.
- Zonal average temperature and salinity (i.e. depth/latitude maps).
- Zonally integrated overturning in density/latitude space, or time series of max/min overturning at particular latitudes.
- Time series of Drake Passage zonal transport.
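
Two of these diagnostics reduce to simple weighted sums, sketched below in plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, the inputs are flat lists rather than real model output, masked (land) points are represented by `None`, and cell volumes and section face areas are assumed to be supplied by the caller.

```python
def weighted_mean(values, weights):
    """Weighted mean that skips masked (None) points, e.g. land cells.

    For a global average of temperature or salinity, `weights` would be
    the cell volumes.
    """
    pairs = [(v, w) for v, w in zip(values, weights) if v is not None]
    total_weight = sum(w for _, w in pairs)
    return sum(v * w for v, w in pairs) / total_weight


def section_transport_sv(u, face_area):
    """Volume transport through a section in Sverdrups (1 Sv = 1e6 m^3/s).

    `u` is the velocity normal to the section (m/s) and `face_area` the
    matching cell face areas (m^2), both as flat lists.
    """
    return sum(ui * ai for ui, ai in zip(u, face_area)) / 1.0e6


# Toy inputs only; these are not model values.
print(weighted_mean([2.0, 4.0, None], [1.0, 3.0, 5.0]))
print(section_transport_sv([0.1, 0.1, 0.05, 0.05], [1.0e8] * 4))
```

Evaluating either function once per output record gives the time series; comparing those series between the baseline and each DT_THERM run shows any drift.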

dougiesquire commented 5 months ago

Thanks @adele-morrison!

aekiss commented 5 months ago

See comments here: https://github.com/COSIMA/MOM6-CICE6/pull/48#discussion_r1555124959

minghangli-uni commented 3 months ago

This comment covers the parameters related to THERMO_SPANS_COUPLING in the MOM module section of the MOM input parameters.

In MOM6, tracer advection is stepped with the thermodynamic timestep, which can be much longer than the coupling timestep when THERMO_SPANS_COUPLING is enabled. In the following setup, DT_THERM is set to 8100 s, which is 6 times the coupling timestep of 1350 s. Similar tracer timesteps are used in GFDL OM4 0.25deg and GFDL OM5 0.25deg.

THERMO_SPANS_COUPLING = True     !   [Boolean] default = False
                                 ! If true, the MOM will take thermodynamic and tracer timesteps that can be
                                 ! longer than the coupling timestep. The actual thermodynamic timestep that is
                                 ! used in this case is the largest integer multiple of the coupling timestep
                                 ! that is less than or equal to DT_THERM.
DT_THERM = 8100.0                !   [s] default = 1350.0
                                 ! The thermodynamic and tracer advection time step. Ideally DT_THERM should be
                                 ! an integer multiple of DT and less than the forcing or coupling time-step,
                                 ! unless THERMO_SPANS_COUPLING is true, in which case DT_THERM can be an integer
                                 ! multiple of the coupling timestep.  By default DT_THERM is set to DT.
DTBT_RESET_PERIOD = 8100.0       !   [s] default = 1350.0 - (DT_THERM)
                                 ! The period between recalculations of DTBT (if DTBT <= 0). If DTBT_RESET_PERIOD
                                 ! is negative, DTBT is set based only on information available at
                                 ! initialization.  If 0, DTBT will be set every dynamics time step. The default
                                 ! is set by DT_THERM.  This is only used if SPLIT is true.
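
The MOM6 parameter documentation describes the thermodynamic timestep actually used with THERMO_SPANS_COUPLING as the largest integer multiple of the coupling timestep that does not exceed DT_THERM. A minimal Python sketch of that rule follows. The function name, the small rounding tolerance and the clamp to at least one coupling step are assumptions made for illustration, not a copy of the MOM6 code.

```python
import math


def effective_dt_therm(dt_therm, dt_coupling):
    """Largest integer multiple of dt_coupling that is <= dt_therm.

    Assumption: never fewer than one coupling step, and a tiny tolerance
    so that exact multiples are not rounded down by floating-point error.
    """
    n_steps = max(1, math.floor(dt_therm / dt_coupling + 1e-9))
    return n_steps * dt_coupling


print(effective_dt_therm(8100.0, 1350.0))  # exact multiple: 6 coupling steps
print(effective_dt_therm(9000.0, 1350.0))  # not a multiple: rounds down to 6 steps
```

Under this rule a requested DT_THERM of 9000 s with a 1350 s coupling timestep would behave like 8100 s, which is why choosing exact multiples keeps the configuration unambiguous.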

A preliminary test compared two cases for a 10-day run on 1440 CPU cores, with a PE layout of #ocn: 1344, #ice: 96, #cpl: 96, #atm: 48 and #rof: 48.

| Case | dt_dyn | dt_therm_ice | dt_cpl | dt_therm | Run duration (ocn) |
|---|---|---|---|---|---|
| THERMO_SPANS_COUPLING = False | 1350s | 1350s | 1350s | 1350s | 465.23s |
| THERMO_SPANS_COUPLING = True | 1350s | 1350s | 1350s | 8100s | 184.98s |

The results show the ocean run duration dropping from 465.23s to 184.98s, a reduction of about 60% (roughly a 2.5x speedup).
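
For reference, the size of the improvement follows directly from the two ocean run durations in the table. A small Python sketch of the arithmetic (variable names are illustrative):

```python
baseline_s = 465.23  # ocean run duration with THERMO_SPANS_COUPLING = False
spanning_s = 184.98  # ocean run duration with THERMO_SPANS_COUPLING = True, DT_THERM = 8100 s

speedup = baseline_s / spanning_s
reduction_pct = 100.0 * (1.0 - spanning_s / baseline_s)

print(f"speedup: {speedup:.2f}x, run duration reduced by {reduction_pct:.1f}%")
```

These figures cover only the ocean component of a single 10-day run, so they indicate the potential saving rather than a measured total cost.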

However, further scientific testing with longer runs is necessary to confirm that the resulting differences in the physical fields are negligible.

minghangli-uni commented 2 months ago

The above comment changes only the ocn dt_therm, so that it differs from the coupling timestep. In that case DIABATIC_FIRST must be set to False, because applying the diabatic processes before the dynamics step requires the tracer timestep to equal the coupling timestep.

DIABATIC_FIRST = False          !   [Boolean] default = False
                                ! If true, apply diabatic and thermodynamic processes, including buoyancy
                                ! forcing and mass gain or loss, before stepping the dynamics forward.

Otherwise the run aborts with a fatal error raised by this check in step_MOM:

    if (CS%diabatic_first .and. (CS%t_dyn_rel_adv==0.0) .and. do_thermo) then ! do thermodynamics.
...
      elseif (thermo_does_span_coupling) then
        dtdia = dt_therm
        if ((fluxes%dt_buoy_accum > 0.0) .and. (dtdia > time_interval) .and. &
            (abs(fluxes%dt_buoy_accum - dtdia) > 1e-6*dtdia)) then
          call MOM_error(FATAL, "step_MOM: Mismatch between long thermodynamic "//&
            "timestep and time over which buoyancy fluxes have been accumulated.")
        endif
...
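
To make the condition in the excerpt easier to read, here is a Python paraphrase of the mismatch test. It is a sketch that mirrors only the three comparisons shown above; the function name is hypothetical, and the example values (fluxes accumulated over a single 1350 s coupling interval) are assumed for illustration rather than taken from a real run.

```python
def buoyancy_accumulation_mismatch(dt_buoy_accum, dt_therm, time_interval, rel_tol=1e-6):
    """Return True when the check in the excerpt would call MOM_error(FATAL, ...).

    dt_buoy_accum: time over which buoyancy fluxes have been accumulated (s)
    dt_therm:      thermodynamic timestep, used as dtdia in the excerpt (s)
    time_interval: length of the coupling interval being stepped (s)
    """
    dtdia = dt_therm
    return (
        dt_buoy_accum > 0.0
        and dtdia > time_interval
        and abs(dt_buoy_accum - dtdia) > rel_tol * dtdia
    )


# Assumed scenario: fluxes accumulated over one coupling step, long thermodynamic step.
print(buoyancy_accumulation_mismatch(1350.0, 8100.0, 1350.0))  # True
# Thermodynamic step equal to the coupling step: the check cannot trigger.
print(buoyancy_accumulation_mismatch(1350.0, 1350.0, 1350.0))  # False
```

The second case shows why the check never triggers when DT_THERM equals the coupling timestep: `dtdia > time_interval` is false.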

aekiss commented 1 month ago

FYI increasing DT_THERM also gives a significant speedup at 1°.

I just tried out 1deg_jra55do_ryf with

DIABATIC_FIRST = False
DT = 1800.0
DT_THERM = 10800.0      ! 6*DT
THERMO_SPANS_COUPLING = True
DTBT_RESET_PERIOD = 10800.0

The walltime for 1 month was 11:05, compared to 17:33 with the previous value DT_THERM = 3600.0 (2*DT), i.e. about 37% less.

aekiss commented 4 weeks ago

Some testing suggestions as discussed in today's TWG:

1. Save diagnostics that will be sensitive to numerical artefacts due to excessively large DT_THERM. I'm guessing these artefacts will show up as grid-scale noise, so a diagnostic that's sensitive to short length scales could be a good way to detect these, e.g. T_diffx, T_diffy, S_diffx, S_diffy saved as snapshots, not time averages.
2. I expect such numerical issues to be visible quickly, so do some very short runs (one or two thermo timesteps, starting from a control with DT_THERM=DT that's spun up for a decade or more) with ridiculously large DT_THERM, and compare with DT_THERM=DT at the same model time to get a feel for what the artefacts look like. Then do some more short runs with reduced DT_THERM to see how low it needs to be to reduce the artefacts to an acceptable level.
3. Try longer runs (a decade or more) with DT_THERM chosen from the previous step to see if any problems arise.
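
One possible way to put a number on grid-scale noise when comparing snapshots is the RMS of a five-point Laplacian, which responds strongly to checkerboard patterns and hardly at all to smooth gradients. The sketch below is an assumption about how such a metric could be built, not an agreed diagnostic: it uses plain nested lists, ignores land masks and grid metrics, and runs on synthetic fields rather than model output.

```python
import math


def laplacian_rms(field):
    """RMS of the five-point Laplacian over the interior of a 2-D list of lists."""
    ny, nx = len(field), len(field[0])
    total, count = 0.0, 0
    for j in range(1, ny - 1):
        for i in range(1, nx - 1):
            lap = (field[j][i - 1] + field[j][i + 1]
                   + field[j - 1][i] + field[j + 1][i]
                   - 4.0 * field[j][i])
            total += lap * lap
            count += 1
    return math.sqrt(total / count)


# Synthetic test fields: a smooth gradient, and the same field plus a
# checkerboard perturbation of amplitude 0.01.
n = 8
smooth = [[0.1 * (i + j) for i in range(n)] for j in range(n)]
noisy = [[smooth[j][i] + 0.01 * (-1) ** (i + j) for i in range(n)]
         for j in range(n)]

print(laplacian_rms(smooth))  # essentially zero for a linear field
print(laplacian_rms(noisy))   # about 8 times the checkerboard amplitude
```

Applying the same metric to a snapshot from each short run and to the DT_THERM=DT control at the same model time gives a single number per run to compare across DT_THERM values.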

access-hive-bot commented 4 weeks ago

This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:

https://forum.access-hive.org.au/t/cosima-twg-meeting-minutes-2024/1734/16

aekiss commented 1 week ago

Just to preserve @AndyHoggANU's Slack comment: it's thought that DT_THERM can be set to resolve the relevant physics (e.g. around 1-3hr to capture the diurnal cycle, given that JRA55do is 3-hourly). This would be independent of the horizontal grid resolution, so could make things much cheaper at high resolution.

COSIMA / access-om3

Investigate changing MOM thermodynamic time step (`DT_THERM`) #138