COSIMA / access-om3

ACCESS-OM3 global ocean-sea ice-wave coupled model

crash: unused buoyancy fluxes #195

Open aekiss opened 1 month ago

aekiss commented 1 month ago

I tried a 1-year run of 1deg_jra55do_ryf with

DIABATIC_FIRST = False
DT = 1800.0
DT_THERM = 10800.0      ! 6*DT
THERMO_SPANS_COUPLING = True
DTBT_RESET_PERIOD = 10800.0

This timestepped through the full 12 months, then crashed with

FATAL from PE     0: ocean_model_restart was called with unused buoyancy fluxes.  For conservation, the ocean restart files can only be created after the buoyancy forcing is applied.

This occurs in the NUOPC cap.

I'm using

     ocn_cpl_dt =  3600
     stop_n = 1
     stop_option = nyears

minghangli-uni commented 1 month ago

Thanks @aekiss. I might need to check the source for details. Before that, I've done some quick tests with various combinations of dt, dt_therm, and dt_cpl under DIABATIC_FIRST = False and THERMO_SPANS_COUPLING = True. All the tests (1deg ryf) ran for 3 model days.

dt (s)   dt_therm (s)    dt_cpl (s)     Result
1800     10800 (6*dt)    1800 (1*dt)    Worked
1800     10800 (6*dt)    3600 (2*dt)    Crashed
1800     10800 (6*dt)    5400 (3*dt)    Worked

dt (s)   dt_therm (s)    dt_cpl (s)     Result
1800     3600 (2*dt)     3600 (2*dt)    Worked
1800     7200 (4*dt)     3600 (2*dt)    Worked
1800     14400 (8*dt)    3600 (2*dt)    Worked

dt (s)   dt_therm (s)    dt_cpl (s)     Result    Ref
1350     8100 (6*dt)     1350 (1*dt)    Worked    Current 0.25deg test
1350     8100 (6*dt)     2700 (2*dt)    Crashed
900      7200 (8*dt)     3600 (4*dt)    Worked    GFDL OM5&OM4 0.25deg

It appears that the model only crashes when the ratio dt:dt_therm:dt_cpl is 1:6:2.
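As a quick cross-check of that observation, here is a minimal Python sketch that reduces each tested combination to its lowest-terms dt:dt_therm:dt_cpl ratio. The values are copied from the tables above; the names `TESTS` and `reduced_ratio` are made up for this illustration and are not part of MOM6 or the NUOPC cap.

```python
from functools import reduce
from math import gcd

# (dt, dt_therm, dt_cpl, result), all in seconds, copied from the tables above.
TESTS = [
    (1800, 10800, 1800, "Worked"),
    (1800, 10800, 3600, "Crashed"),
    (1800, 10800, 5400, "Worked"),
    (1800, 3600, 3600, "Worked"),
    (1800, 7200, 3600, "Worked"),
    (1800, 14400, 3600, "Worked"),
    (1350, 8100, 1350, "Worked"),
    (1350, 8100, 2700, "Crashed"),
    (900, 7200, 3600, "Worked"),
]


def reduced_ratio(dt, dt_therm, dt_cpl):
    """Return dt:dt_therm:dt_cpl divided through by the common divisor."""
    g = reduce(gcd, (dt, dt_therm, dt_cpl))
    return (dt // g, dt_therm // g, dt_cpl // g)


# Collect the distinct reduced ratios for each outcome.
crashed = {reduced_ratio(a, b, c) for a, b, c, r in TESTS if r == "Crashed"}
worked = {reduced_ratio(a, b, c) for a, b, c, r in TESTS if r == "Worked"}

print("crashed:", sorted(crashed))  # crashed: [(1, 6, 2)]
print("worked: ", sorted(worked))
```

This only summarises the nine runs listed; it does not explain why 1:6:2 fails.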

NB:

aekiss commented 1 month ago

Great, thanks @minghangli-uni for all the tests! So I got an unlucky combination. My run is here FYI: /home/156/aek156/payu/MOM6-CICE6-1deg_jra55do_ryf.iss138

aekiss commented 1 month ago

Perhaps dt_therm/dt_cpl needs to be even? If you have a moment, could you try dt:dt_therm:dt_cpl = 1:10:2 to see if that also crashes with ocean_model_restart was called with unused buoyancy fluxes?

aekiss commented 1 month ago

or dt:dt_therm:dt_cpl = 1:5:1, for that matter

minghangli-uni commented 1 month ago

> Perhaps dt_therm/dt_cpl needs to be even? 1:10:2 or dt:dt_therm:dt_cpl = 1:5:1, for that matter

Thank you for the suggestion, but both combinations ran without crashing. I am looking into this issue.

minghangli-uni commented 1 month ago

When the ratio of the tracer timestep to the coupling timestep is 3:1, the model crashes with the error mentioned above.

When the ratio is not 3:1, the execution flow can be traced in the log output. When the ratio is 3:1, the sequence of operations is out of order. Some synchronization mechanism or other factor must be causing this, but it seems quite unusual to me.

minghangli-uni commented 1 month ago

I've submitted an issue to NCAR/MOM6 regarding this problem. https://github.com/NCAR/MOM6/issues/290

aekiss commented 1 month ago

Thanks @minghangli-uni !

aekiss commented 1 month ago

Hi @minghangli-uni, did all your tests above have a cold start, or did some use a restart?

minghangli-uni commented 1 month ago

They all started from a cold start. When I followed Gustavo’s suggestion and started from a restart file, the error disappeared.

minghangli-uni commented 3 weeks ago

In addition to the findings noted here, another test with dt_therm = 16200 s and dt_cpl = 1800 s (a ratio of 9:1) results in the same error. I am revisiting this problem to explore the reasons for the failure. Interestingly, a dt_therm:dt_cpl ratio of 6:1 works fine, but 3:1 and 9:1 do not.
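To keep track of which dt_therm:dt_cpl ratios have been tried so far, here is a rough Python sketch that groups every case reported in this thread by outcome. It is only a tabulation of the reported results, not a model of what MOM6 does internally. The name `CASES` is invented for this example, and the two cases that were given only as ratios (1:10:2 and 1:5:1) are entered in ratio units rather than seconds.

```python
from fractions import Fraction

# (dt_therm, dt_cpl, result) for every test reported in this thread.
# Values are in seconds unless marked as ratio units.
CASES = [
    (10800, 1800, "Worked"),
    (10800, 3600, "Crashed"),
    (10800, 5400, "Worked"),
    (3600, 3600, "Worked"),
    (7200, 3600, "Worked"),
    (14400, 3600, "Worked"),
    (8100, 1350, "Worked"),
    (8100, 2700, "Crashed"),
    (7200, 3600, "Worked"),
    (10, 2, "Worked"),        # ratio units, from dt:dt_therm:dt_cpl = 1:10:2
    (5, 1, "Worked"),         # ratio units, from dt:dt_therm:dt_cpl = 1:5:1
    (16200, 1800, "Crashed"),
]

# Group the distinct dt_therm/dt_cpl ratios by outcome.
by_result = {}
for dt_therm, dt_cpl, result in CASES:
    by_result.setdefault(result, set()).add(Fraction(dt_therm, dt_cpl))

for result, ratios in sorted(by_result.items()):
    print(result, sorted(int(r) for r in ratios))
# Crashed [3, 9]
# Worked [1, 2, 4, 5, 6]
```

All of these were cold starts, so the grouping says nothing about runs that begin from a restart file.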