Open bartgol opened 1 year ago
My guess is that EAM and CIME make different assumptions about where their internal clocks stand at the start of a time step.
These two assumptions conflict and ultimately generate the bug when the two clocks are compared inside the while loop, to decide whether EAM has completed its time step, causing EAM to take 2 steps (instead of 1) to catch up with CIME's clock.
It appears that during the first call to atm_run, EAM performs the cam_run2-3-4-1 sequence twice instead of once. The odd behavior can be seen by running the test SMS_Ln5.ne4pg2_oQU480.F2010.mappy_gnu.eam-thetahy_pg2 (part of the e3sm_developer test suite).
If you look at the test input parameters, you will find that `se_nsplit=2`. This means that homme's `prim_run_subcycle` should be invoked 10 times (5 steps x 2 calls/step). However, looking at `model_timing_stats`, you will notice that the number of calls to the corresponding timer (`a:prim_run_subcycle`) is 12*NTASKS.

Hacking the code and adding some print statements in atm_run and cam_runX, I got confirmation of the "extra" step: there's an extra run2-3-4-1 sequence in the first timestep, and each of the two sequences uses a full timestep (it might be "ok", though still puzzling, if each of the two steps used dt/2). This appears to be caused by the fact that the EAM internal clock is at t=0 upon entry of the first time step, while the CIME clock is already at t=dt. The end condition for the while loop (dosend) is a check on whether the CIME and EAM clocks are in sync, which therefore fails at the first step.
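As a quick sanity check on the timer counts quoted above, the arithmetic can be sketched in a few lines of Python. This is only an illustration: the function and variable names are made up for this example and are not part of EAM or CIME; the numbers (5 steps, `se_nsplit=2`, 12 calls per task) are the ones from this report.

```python
def extra_full_steps(n_steps, se_nsplit, observed_calls_per_task):
    """Compare expected vs. observed prim_run_subcycle calls per task.

    Hypothetical helper for illustration only (not an EAM/CIME routine).
    Returns (expected_calls, extra_steps, leftover_calls).
    """
    # Each atmosphere step should call prim_run_subcycle se_nsplit times.
    expected_calls = n_steps * se_nsplit
    extra_calls = observed_calls_per_task - expected_calls
    # If the extra calls are a whole multiple of se_nsplit, they correspond
    # to that many complete extra steps (leftover_calls == 0).
    extra_steps, leftover_calls = divmod(extra_calls, se_nsplit)
    return expected_calls, extra_steps, leftover_calls


# Numbers from the report: 5 steps, se_nsplit=2, timer count 12*NTASKS.
expected, extra, leftover = extra_full_steps(5, 2, 12)
print(expected, extra, leftover)  # prints "10 1 0"
```

The two surplus calls per task are exactly one additional full step's worth of subcycling, which is consistent with one extra run2-3-4-1 sequence rather than, say, a step that was split in two.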
I don't think this is the expected behavior, so I'm labeling it as a bug. However, I don't understand EAM's internal timestep logic well enough to be sure of this.