aekiss opened 1 month ago
Thanks @aekiss. I might need to check the source for details. In the meantime, I've done some quick tests with various combinations of `dt`, `dt_therm` and `dt_cpl` under `DIABATIC_FIRST = False` and `THERMO_SPANS_COUPLING = True`. All the tests (1deg RYF) ran for 3 model days.
dt (s) | dt_therm (s) | dt_cpl (s) | Result |
---|---|---|---|
1800 | 10800 (6*dt) | 1800 (1*dt) | Worked |
1800 | **10800 (6\*dt)** | **3600 (2\*dt)** | Crashed |
1800 | 10800 (6*dt) | 5400 (3*dt) | Worked |
dt (s) | dt_therm (s) | dt_cpl (s) | Result |
---|---|---|---|
1800 | 3600 (2*dt) | 3600 (2*dt) | Worked |
1800 | 7200 (4*dt) | 3600 (2*dt) | Worked |
1800 | 14400 (8*dt) | 3600 (2*dt) | Worked |
dt (s) | dt_therm (s) | dt_cpl (s) | Result | Ref |
---|---|---|---|---|
1350 | 8100 (6*dt) | 1350 (1*dt) | Worked | Current 0.25deg test |
1350 | **8100 (6\*dt)** | **2700 (2\*dt)** | Crashed | |
900 | 7200 (8*dt) | 3600 (4*dt) | Worked | GFDL OM5&OM4 0.25deg |
It appears that the model only crashes when the ratio dt:dt_therm:dt_cpl is 1:6:2.
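As a quick check of that observation, the sketch below reduces each tested configuration to exact step ratios. The `TESTS` list is the three tables above, transcribed by hand. The helper names are hypothetical and not part of MOM6 or payu. The two crashed runs share `dt_therm/dt_cpl = 3` even though their `dt` differs, and no working run has that ratio.

```python
from fractions import Fraction

# Cold-start results transcribed by hand from the three tables above:
# (dt, dt_therm, dt_cpl, crashed), all times in seconds.
TESTS = [
    (1800, 10800, 1800, False),
    (1800, 10800, 3600, True),
    (1800, 10800, 5400, False),
    (1800, 3600, 3600, False),
    (1800, 7200, 3600, False),
    (1800, 14400, 3600, False),
    (1350, 8100, 1350, False),
    (1350, 8100, 2700, True),
    (900, 7200, 3600, False),
]


def step_ratios(dt, dt_therm, dt_cpl):
    """Return (dt_therm/dt, dt_cpl/dt, dt_therm/dt_cpl) as exact fractions."""
    return Fraction(dt_therm, dt), Fraction(dt_cpl, dt), Fraction(dt_therm, dt_cpl)


def crash_ratios(tests):
    """Set of dt_therm/dt_cpl ratios that occur in crashed runs."""
    return {step_ratios(dt, th, cpl)[2] for dt, th, cpl, crashed in tests if crashed}


def ok_ratios(tests):
    """Set of dt_therm/dt_cpl ratios that occur in runs that worked."""
    return {step_ratios(dt, th, cpl)[2] for dt, th, cpl, crashed in tests if not crashed}


if __name__ == "__main__":
    for dt, dt_therm, dt_cpl, crashed in TESTS:
        n_therm, n_cpl, therm_per_cpl = step_ratios(dt, dt_therm, dt_cpl)
        status = "Crashed" if crashed else "Worked"
        print(f"dt={dt:>4}  1:{n_therm}:{n_cpl}  dt_therm/dt_cpl={therm_per_cpl}  {status}")
    print("crashing ratios:", ", ".join(str(r) for r in sorted(crash_ratios(TESTS))))
```

Exact `Fraction`s are used instead of floats so that ratios such as 8100/2700 compare exactly rather than approximately.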
NB: `ocn_cpl_dt` is a dummy value unless `stop_option = nsteps` (see https://github.com/COSIMA/access-om3/issues/157). For example, with `ocn_cpl_dt = 3600`, `stop_n = 1` and `stop_option = nyears`, the `ocn_cpl_dt` value is not used.
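The rule in the NB can be written as a small check. This is a minimal sketch. It assumes the settings are plain `key = value` lines like the ones quoted, with `#` comments. `parse_settings` and `ocn_cpl_dt_is_used` are hypothetical helpers, not part of payu or the NUOPC tooling. The only logic they encode is the statement from access-om3 issue 157 that `ocn_cpl_dt` matters only when `stop_option = nsteps`.

```python
# Settings quoted in the NB above (assumed "key = value" format).
EXAMPLE = """
ocn_cpl_dt = 3600
stop_n = 1
stop_option = nyears
"""


def parse_settings(text):
    """Parse 'key = value' lines into a dict, skipping blanks, '#' comments and other lines."""
    settings = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        settings[key] = value
    return settings


def ocn_cpl_dt_is_used(settings):
    """Hypothetical check: ocn_cpl_dt only takes effect when stop_option = nsteps."""
    return settings.get("stop_option") == "nsteps"


if __name__ == "__main__":
    settings = parse_settings(EXAMPLE)
    if not ocn_cpl_dt_is_used(settings):
        print(f"ocn_cpl_dt = {settings['ocn_cpl_dt']} is a dummy value "
              f"(stop_option = {settings['stop_option']})")
```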
Great, thanks @minghangli-uni for all the tests! So I got an unlucky combination. My run is here FYI:
`/home/156/aek156/payu/MOM6-CICE6-1deg_jra55do_ryf.iss138`
Perhaps dt_therm/dt_cpl needs to be even?
If you have a moment, could you try `dt:dt_therm:dt_cpl` = 1:10:2 to see if that also crashes with `ocean_model_restart was called with unused buoyancy fluxes`? Or `dt:dt_therm:dt_cpl` = 1:5:1, for that matter.
Thank you for the suggestion, but both 1:10:2 and 1:5:1 worked without crashing. I am looking into this issue.
When the ratio of the tracer timestep to the coupling timestep is 3:1, the model crashes with the error mentioned above.
When the ratio is not 3:1, the log output shows the expected execution flow. When the ratio is 3:1, the sequence of operations is out of order. Some synchronization mechanism or other factor must be causing this, but it is quite unusual to me.
I've submitted an issue to NCAR/MOM6 regarding this problem. https://github.com/NCAR/MOM6/issues/290
Thanks @minghangli-uni!
Hi @minghangli-uni, did all your tests above have a cold start, or did some use a restart?
They all started from a cold start. Following Gustavo's suggestion to start from a restart file, the error disappears.
In addition to the findings noted here, another test with `dt_therm = 16200` s and `dt_cpl = 1800` s (a 9:1 ratio) produced the same error. I am revisiting this problem to explore the reasons for the failure. Interestingly, a 6:1 ratio works fine, but 3:1 and 9:1 do not.
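One way to weigh the "needs to be even" idea against every cold-start result reported so far is to look for counterexamples to each candidate rule. The sketch below does this for `dt_therm/dt_cpl` ratios transcribed by hand from the comments. The first rule is the even-ratio hypothesis, and it fails on the odd ratios 1 and 5, which both worked. The second rule, "odd multiples of 3 crash", is a hypothetical pattern that happens to fit these seven ratios. It is not an explanation of the out-of-order execution seen in the logs.

```python
# dt_therm/dt_cpl ratio -> did the cold-start run crash?
# Transcribed by hand from this thread; several tests share a ratio.
OBSERVED = {
    1: False,  # 1:2:2
    2: False,  # 1:6:3, 1:4:2, 1:8:4
    3: True,   # 1:6:2 with dt = 1800 s and dt = 1350 s
    4: False,  # 1:8:2
    5: False,  # 1:10:2 and 1:5:1
    6: False,  # 1:6:1
    9: True,   # dt_therm = 16200 s, dt_cpl = 1800 s
}


def counterexamples(observed, predicts_crash):
    """Ratios where a candidate rule disagrees with the observed outcome."""
    return sorted(r for r, crashed in observed.items() if predicts_crash(r) != crashed)


def crash_if_odd(ratio):
    """The 'dt_therm/dt_cpl needs to be even' hypothesis: odd ratios crash."""
    return ratio % 2 == 1


def crash_if_odd_multiple_of_3(ratio):
    """Hypothetical rule that merely fits these few data points."""
    return ratio % 3 == 0 and ratio % 2 == 1


if __name__ == "__main__":
    print("odd ratio:", counterexamples(OBSERVED, crash_if_odd))  # [1, 5]
    print("odd multiple of 3:", counterexamples(OBSERVED, crash_if_odd_multiple_of_3))  # []
```

An empty counterexample list only means that a rule is consistent with the handful of runs so far. Any new ratio, such as 15:1, could still break it.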
I tried a 1-year run of `1deg_jra55do_ryf` with … This timestepped through the full 12 months, then crashed with … This occurs in the NUOPC cap.
I'm using