Closed minghangli-uni closed 1 month ago
Thanks minghang for the explanation :) That behaviour seems correct? We could add a note to the wiki
We can do 1-timestep in 1-deg configs because the ocean timestep (DT_THERM) equals the coupling timestep ocn_cpl_dt.
As you said, in 0.25 degree, the smallest stop_n that gives a whole number of DT_THERM is 12 (i.e. 12*3600/1350 is a whole number).
For the existing 025 deg config, DT_THERM equals ocn_cpl_dt too, which is 1350s.
As you said, in 0.25 degree, the smallest stop_n that gives a whole number of DT_THERM is 12. (i.e. 12*3600/1350 is a whole number)
I don't follow. The smallest stop_n that gives a whole number is 3, right?
Are you suggesting that since the ocn_cpl_dt is 3600, achieving a whole number with 12 nsteps (12*3600/1350) would indicate correctness?
We might be discussing different things here. The wiki suggests that {model_name}_cpl_dt are unused and that the driver time-step equals the coupling time-step set in nuopc.runseq. But when restart_option is set to nsteps, {model_name}_cpl_dt comes into play and overrides the coupling timestep set in nuopc.runseq.
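The divisibility arithmetic being discussed can be checked with a short sketch. It assumes the per-step unit is 3600s and the coupling timestep is 1350s, as quoted above. The function name `smallest_stop_n` is hypothetical and is not part of CMEPS or NUOPC. It only checks the arithmetic and says nothing about whether a given run would actually succeed.

```python
from math import gcd


def smallest_stop_n(step_unit: int, coupling_dt: int) -> int:
    """Smallest n >= 1 such that n * step_unit is a multiple of coupling_dt."""
    return coupling_dt // gcd(step_unit, coupling_dt)


# Values quoted in the thread: 3600s per step, 1350s coupling timestep.
print(smallest_stop_n(3600, 1350))  # 3

# All stop_n values up to 12 that give a whole number of coupling steps.
print([n for n in range(1, 13) if (n * 3600) % 1350 == 0])  # [3, 6, 9, 12]
```

On this arithmetic alone, both 3 and 12 give a whole number of 1350s steps, while 10 does not.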
Good catch @minghangli-uni. So it looks like we are wrong about the <component>_cpl_dt variables not being used. The question then is where/why/how they are being used. Are they just used for sanity checks like this one or are they doing more than that?
Note I did write a caveat in the wiki 😉:
However, I would feel more comfortable if I understood why {model_name}_cpl_dt are ever needed...
@ezhilsabareesh8, I presume you didn't try to run either of these PRs with restart_option = nsteps?
@dougiesquire, I just tried running the IAF config for 5 steps with restart_option = nsteps. It is working fine and the ice output prints the correct dt.
Calendar
--------------------------------
days_per_year = 365 : number of days in a model year
use_leap_years = T : leap days are included
dt = 3600.00 : model time step
CLOCK_attributes::
atm_cpl_dt = 99999 #not used
calendar = GREGORIAN
end_restart = .false.
glc_avg_period = yearly
glc_cpl_dt = 86400
history_ymd = -999
ice_cpl_dt = 99999 #not used
lnd_cpl_dt = 99999 #not used
ocn_cpl_dt = 99999 #not used
restart_n = 5
restart_option = nsteps
restart_ymd = -999
rof_cpl_dt = 99999 #not used
start_tod = 0
start_ymd = 19580101
stop_n = 5
stop_option = nsteps
stop_tod = 0
stop_ymd = -999
tprof_n = -999
tprof_option = never
tprof_ymd = -999
wav_cpl_dt = 99999 #not used
::
@ezhilsabareesh8 Can you please post the directory for this iaf run?
It does look like the cpl_dt are important for setting at least the mediator timestep:
It's surprising that it just works without it!
@ezhilsabareesh8 Can you please post the directory for this iaf run?
It's in my home directory, let me copy it to a different location. I am just running the MOM6-CICE6 IAF configuration with the nuopc settings mentioned above.
It does look like the cpl_dt are important for setting at least the mediator timestep:
@anton-seaice, this is what I wrote about this in the wiki, but clearly deeper investigation is needed:
The nuopc.runseq file specifies the run sequence of the configuration. The run sequence for current ACCESS-OM3 configurations comprises a single loop, with the coupling time-step specified at the start of the loop (this is the “timeStep” of the loop in NUOPC-speak).
Note, that there are parameters {model_name}_cpl_dt set in the CLOCK_attributes section of nuopc.runconfig. The only place these are used in CMEPS is to set the driver time-step as the minimum of these values. However from the NUOPC documentation and CMEPS codebase:
"Each time loop has its own associated clock object. NUOPC manages these clock objects, i.e. their creation and destruction, as well as startTime, endTime, timeStep adjustments during the execution. The outer most time loop of the run sequence is a special case. It uses the driver clock itself. If a single outer most loop is defined in the run sequence provided by freeFormat, this loop becomes the driver loop level directly. Therefore, setting the timeStep or runDuration for the outer most time loop results modifying the driver clock itself. However, for cases with concatenated loops on the upper level of the run sequence in freeFormat, a single outer loop is added automatically during ingestion, and the driver clock is used for this loop instead."
So I think in our case, {model_name}_cpl_dt are unused and the driver time-step equals the coupling time-step set in nuopc.runseq. Certainly, changing these values seems to have no effect. However, I would feel more comfortable if I understood why {model_name}_cpl_dt are ever needed...
I just tried running the IAF config for 5 steps with restart_option = nsteps it is working fine
Hi @ezhilsabareesh8, it appears that you haven't modified glc_cpl_dt, which remains at 86400s instead of 99999s.
CLOCK_attributes::
atm_cpl_dt = 99999 #not used
calendar = GREGORIAN
end_restart = .false.
glc_avg_period = yearly
glc_cpl_dt = 86400
history_ymd = -999
ice_cpl_dt = 99999 #not used
...
Hence the updated total runlength was calculated as 86400*5/1350=320, resulting in a whole number and allowing the run to proceed without any issues.
the ice output prints the correct dt.
Although the total run length is updated, the timestep for each component remains unchanged at 1350s. That is why your dt_ice_thermo is reported as 1350s.
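The calculation described above can be written out as a small sketch. It assumes, as stated in this thread, that the driver timestep is the minimum of the {model_name}_cpl_dt values and that with nsteps the run length is that minimum multiplied by stop_n. The function name `check_nsteps_run` is hypothetical, and this is a simplification of what CMEPS does, not a copy of it.

```python
def check_nsteps_run(cpl_dts: dict, stop_n: int, coupling_dt: int):
    """Return (run_length_seconds, is_whole_number_of_coupling_steps)."""
    driver_dt = min(cpl_dts.values())  # minimum of the *_cpl_dt values
    run_length = driver_dt * stop_n    # run length implied by nsteps
    return run_length, run_length % coupling_dt == 0


# CLOCK_attributes values from the posted config (glc_cpl_dt left at 86400).
posted = {"atm": 99999, "glc": 86400, "ice": 99999, "lnd": 99999,
          "ocn": 99999, "rof": 99999, "wav": 99999}
print(check_nsteps_run(posted, stop_n=5, coupling_dt=1350))  # (432000, True)

# Same config with glc_cpl_dt also set to 99999.
all_unused = dict(posted, glc=99999)
print(check_nsteps_run(all_unused, stop_n=5, coupling_dt=1350))  # (499995, False)
```

The first case gives 432000/1350 = 320 coupling steps, matching the number above. The second case is not a whole number of 1350s steps.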
it appears that you haven't modified glc_cpl_dt, which remains at 86400s instead of 99999s.
When I set glc_cpl_dt to 99999, I am getting the following error:
20240506 102538.059 ERROR PET11 (ice_comp_nuopc):(ModelAdvance) CICE clock not in sync with ESMF model clock
When I set glc_cpl_dt to 99999, I am getting the following error 20240506 102538.059 ERROR PET11 (ice_comp_nuopc):(ModelAdvance) CICE clock not in sync with ESMF model clock
I did the same run but didn't meet the error you described. However, the error message I received was similar to what I initially reported.
PET00 src/addon/NUOPC/src/NUOPC_Base.F90:956 Invalid argument - setClock timeStep=1350s is not a divisor of runDuration=99999s
Have you previously reported this issue elsewhere?
<component>_cpl_dt is used to set the driver timestep, which is taken as the minimum of these values, as explained by @dougiesquire in the wiki. Since our run sequence uses only a single outer loop, the timestep determined this way is overwritten by the coupling time-step.
The only caveat is that when using stop_option = nsteps, the total run duration is modified by
case (optNSteps,trim(optNSteps)//'s')
call ESMF_ClockGet(clock, TimeStep=AlarmInterval, rc=rc)
if (ChkErr(rc,__LINE__,u_FILE_u)) return
AlarmInterval = AlarmInterval * opt_n
For the other options, such as stop_option = nseconds, the AlarmInterval is hardcoded to be 1 second, similar to nminutes (60 seconds) and nhours (3600 seconds).
case (optNMinutes,trim(optNMinutes)//'s')
call ESMF_TimeIntervalSet(AlarmInterval, s=60, rc=rc)
AlarmInterval = AlarmInterval * opt_n
The AlarmInterval set here can be considered the unit for the stop_option. It is then rescaled by AlarmInterval = AlarmInterval * opt_n, which gives the total run duration for each run.
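The logic of the Fortran snippets above can be mirrored in a minimal Python sketch. It covers only the four options named in this thread, and the function name `alarm_interval` and the dictionary `UNIT_SECONDS` are hypothetical. The clock timestep passed in for nsteps is whatever the driver clock holds at that point, which per the discussion above is derived from the {model_name}_cpl_dt values.

```python
# Fixed unit lengths, in seconds, for the time-based options.
UNIT_SECONDS = {"nseconds": 1, "nminutes": 60, "nhours": 3600}


def alarm_interval(stop_option: str, opt_n: int, clock_timestep: int) -> int:
    """Total run duration in seconds for a given stop_option and opt_n."""
    if stop_option == "nsteps":
        unit = clock_timestep  # unit is read from the clock, not hardcoded
    elif stop_option in UNIT_SECONDS:
        unit = UNIT_SECONDS[stop_option]
    else:
        raise ValueError(f"option not covered by this sketch: {stop_option}")
    return unit * opt_n  # mirrors AlarmInterval = AlarmInterval * opt_n


print(alarm_interval("nsteps", 10, clock_timestep=3600))   # 36000
print(alarm_interval("nhours", 12, clock_timestep=3600))   # 43200
print(alarm_interval("nminutes", 5, clock_timestep=3600))  # 300
```

Only the nsteps branch depends on the clock timestep, which is why the {model_name}_cpl_dt values matter for that option and not for the others.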
Thanks @minghangli-uni. So I think the safest/clearest way forward is to set (at least) ocn_cpl_dt to the coupling timestep (and possibly also ice_cpl_dt and even atm_cpl_dt if we think that adds clarity, though it's obviously not necessary) and the rest to something large and obviously meaningless (e.g. 99999). The wiki should also be updated. Let's discuss this in the TWG meeting today.
This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:
https://forum.access-hive.org.au/t/cosima-twg-meeting-minutes-2024/1734/11
To avoid confusion I think we should add a comment to ocn_cpl_dt in nuopc.runconfig such as
ocn_cpl_dt = 1350 # ignored (coupling timestep set by nuopc.runseq) unless stop_option=nsteps
Thanks @aekiss. We can use the automatic cherry-pick tool to add this comment to all the branches.
It appears that there are issues related to stop_option = nsteps, contrary to what was proposed in the wiki.
The default CLOCK setup in nuopc.runconfig for the current 0.25deg configuration is listed as follows, but changing restart_n=10, stop_n=10 and restart_option = nsteps:
An error occurs with the above setup:
ERROR PET239 src/addon/NUOPC/src/NUOPC_Base.F90:956 Invalid argument - setClock timeStep=1350s is not a divisor of runDuration=36000s
This error suggests that these timesteps are still in use (i.e., 3600*10/1350 is not a whole number). More conclusive evidence can be found by checking the ESMF profiling results. When stop_n is changed to 12, the code runs successfully. This is because 3600*12 is divisible by 1350, giving 32, as evidenced by the count for [OCN] RunPhase1, which is 32 instead of 12.
NB: days and years are functioning properly, so this issue should not impact production runs. However, it's worth noting for anyone interested in conducting short tests.