crash: unused buoyancy fluxes

aekiss commented 1 month ago

I tried a 1-year run of 1deg_jra55do_ryf with

DIABATIC_FIRST = False
DT = 1800.0
DT_THERM = 10800.0      ! 6*DT
THERMO_SPANS_COUPLING = True
DTBT_RESET_PERIOD = 10800.0

This timestepped through the full 12 months, then crashed with

FATAL from PE     0: ocean_model_restart was called with unused buoyancy fluxes.  For conservation, the ocean restart files can only be created after the buoyancy forcing is applied.

This occurs in the NUOPC cap.

I'm using

     ocn_cpl_dt =  3600
     stop_n = 1
     stop_option = nyears

minghangli-uni commented 1 month ago

Thanks @aekiss. I might need to check the source for details. Before that, I've done some quick tests with various combinations of dt, dt_therm, and dt_cpl under DIABATIC_FIRST = False and THERMO_SPANS_COUPLING = True. All the tests (1deg ryf) ran for 3 model days.

| dt (s) | dt_therm (s) | dt_cpl (s) | Result |
|---|---|---|---|
| 1800 | 10800 (6*dt) | 1800 (1*dt) | Worked |
| 1800 | 10800 (6*dt) | 3600 (2*dt) | Crashed |
| 1800 | 10800 (6*dt) | 5400 (3*dt) | Worked |

| dt (s) | dt_therm (s) | dt_cpl (s) | Result |
|---|---|---|---|
| 1800 | 3600 (2*dt) | 3600 (2*dt) | Worked |
| 1800 | 7200 (4*dt) | 3600 (2*dt) | Worked |
| 1800 | 14400 (8*dt) | 3600 (2*dt) | Worked |

| dt (s) | dt_therm (s) | dt_cpl (s) | Result | Ref |
|---|---|---|---|---|
| 1350 | 8100 (6*dt) | 1350 (1*dt) | Worked | Current 0.25deg test |
| 1350 | 8100 (6*dt) | 2700 (2*dt) | Crashed | |
| 900 | 7200 (8*dt) | 3600 (4*dt) | Worked | GFDL OM5&OM4 0.25deg |

It appears that the model only crashes when the ratio dt:dt_therm:dt_cpl is 1:6:2.

NB:

- For MOM6, dt_therm is always an integer multiple of dt_cpl, and dt_cpl is always an integer multiple of dt, so I deliberately chose the timesteps to satisfy this criterion.
- ocn_cpl_dt is a dummy value unless stop_option = nsteps (see https://github.com/COSIMA/access-om3/issues/157):
```
 ocn_cpl_dt =  3600 
 stop_n = 1
 stop_option = nyears
```
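
The integer-multiple criterion in the NB above can be checked before submitting a run. The following is a minimal sketch in Python, not part of MOM6 or the NUOPC cap: the function names `is_multiple` and `check_timesteps` are hypothetical, and the only rules encoded are the two stated above (dt_cpl a multiple of dt, dt_therm a multiple of dt_cpl).

```python
def is_multiple(a: float, b: float, tol: float = 1e-9) -> bool:
    """True if a is a positive integer multiple of b (within a tolerance)."""
    ratio = a / b
    return round(ratio) >= 1 and abs(ratio - round(ratio)) < tol


def check_timesteps(dt: float, dt_therm: float, dt_cpl: float) -> list:
    """Return descriptions of violated constraints (empty list if none).

    Hypothetical helper: it encodes only the two rules quoted in this
    thread and does not reproduce any check inside MOM6 itself.
    """
    problems = []
    if not is_multiple(dt_cpl, dt):
        problems.append(f"dt_cpl={dt_cpl} is not an integer multiple of dt={dt}")
    if not is_multiple(dt_therm, dt_cpl):
        problems.append(
            f"dt_therm={dt_therm} is not an integer multiple of dt_cpl={dt_cpl}"
        )
    return problems


# The crashing 1:6:2 combination still satisfies both rules,
# so this check alone does not predict the crash.
print(check_timesteps(1800.0, 10800.0, 3600.0))
```

Note that the crashing combination passes this check, which is consistent with the crash being something other than a simple violation of the multiple-of rule.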

aekiss commented 1 month ago

Great, thanks @minghangli-uni for all the tests! So I got an unlucky combination. My run is here FYI: /home/156/aek156/payu/MOM6-CICE6-1deg_jra55do_ryf.iss138

aekiss commented 1 month ago

Perhaps dt_therm/dt_cpl needs to be even? If you have a moment, could you try dt:dt_therm:dt_cpl = 1:10:2 to see if that also crashes with ocean_model_restart was called with unused buoyancy fluxes?

aekiss commented 1 month ago

or dt:dt_therm:dt_cpl = 1:5:1, for that matter

minghangli-uni commented 1 month ago

> Perhaps dt_therm/dt_cpl needs to be even? 1:10:2 or dt:dt_therm:dt_cpl = 1:5:1, for that matter

Thank you for the suggestion, but both combinations ran without crashing. I am looking into this issue.

minghangli-uni commented 1 month ago

When the ratio of the tracer timestep to the coupling timestep (dt_therm:dt_cpl) is 3:1, the model crashes with the error mentioned above.

When the ratio is not 3:1, the log output traces the expected execution flow. When the ratio is 3:1, the sequence of operations is out of order. Some synchronization mechanism or other factor must be causing this, but I find it quite unusual.

minghangli-uni commented 1 month ago

I've submitted an issue to NCAR/MOM6 regarding this problem. https://github.com/NCAR/MOM6/issues/290

aekiss commented 1 month ago

Thanks @minghangli-uni !

aekiss commented 1 month ago

Hi @minghangli-uni, did all your tests above have a cold start, or did some use a restart?

minghangli-uni commented 1 month ago

They all started from a cold start. Following Gustavo’s suggestion to start from a restart file, the error disappears.

minghangli-uni commented 3 weeks ago

In addition to the findings noted here, another test with dt_therm = 16200 s and dt_cpl = 1800 s (ratio 9:1) results in the same error. I am revisiting this problem to explore the reasons for the failure. Interestingly, a dt_therm:dt_cpl ratio of 6:1 works fine, but 3:1 and 9:1 do not.
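
To see the pattern at a glance, the tests reported in this thread can be grouped by their dt_therm/dt_cpl ratio. This is a small Python sketch that only restates results already given above. The two runs reported only as ratios (1:10:2 and 1:5:1) are entered in ratio units because their actual timesteps were not stated, and the helper name `ratios_by_result` is made up for illustration.

```python
from collections import defaultdict

# (dt_therm, dt_cpl, result) for every test reported in this thread.
# Values are in seconds, except the two entries marked "ratio units".
TESTS = [
    (10800, 1800, "Worked"),
    (10800, 3600, "Crashed"),
    (10800, 5400, "Worked"),
    (3600, 3600, "Worked"),
    (7200, 3600, "Worked"),
    (14400, 3600, "Worked"),
    (8100, 1350, "Worked"),
    (8100, 2700, "Crashed"),
    (7200, 3600, "Worked"),
    (10, 2, "Worked"),   # ratio units: dt:dt_therm:dt_cpl = 1:10:2
    (5, 1, "Worked"),    # ratio units: dt:dt_therm:dt_cpl = 1:5:1
    (16200, 1800, "Crashed"),
]


def ratios_by_result(tests):
    """Group the integer ratio dt_therm/dt_cpl by test outcome."""
    grouped = defaultdict(set)
    for dt_therm, dt_cpl, result in tests:
        if dt_therm % dt_cpl != 0:
            raise ValueError("dt_therm must be an integer multiple of dt_cpl")
        grouped[result].add(dt_therm // dt_cpl)
    return {result: sorted(ratios) for result, ratios in grouped.items()}


summary = ratios_by_result(TESTS)
print(summary)  # {'Worked': [1, 2, 4, 5, 6], 'Crashed': [3, 9]}
```

In these cold-start tests the crashing ratios (3 and 9) and the working ratios (1, 2, 4, 5, 6) do not overlap, but the sample is too small to establish a rule.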

COSIMA / access-om3

crash: unused buoyancy fluxes #195