NCAR / MOM6

NCAR/CESM fork of the Modular Ocean Model v.6 (MOM6)

Trouble writing some 2D fields with hourly output #201

Open: mnlevy1981 opened this issue 2 years ago

mnlevy1981 commented 2 years ago

I first noticed this problem in my branch for the MARBL driver, but am now running into it with dev/ncar. If I set up a stream in diag_table to write hourly output, such as by changing

- "CMOM.withCFCS.with_nonlocal.mom6.hm%4yr-%2mo",                   1,  "months",  1, "days",   "time", 1, "months"
+ "CMOM.withCFCS.with_nonlocal.mom6.hm%4yr-%3dy",                   1,  "hours",   1, "days",   "time", 1, "days"

(and also changing hm%4yr-%2mo to hm%4yr-%3dy throughout the file), then certain fields can't be written to that stream without causing an error. Three specific examples are Rd_dx, SSH, and cfc11_flux:

FATAL from PE     0: diag_manager_mod::send_data_3d: module/output_field ocean_model/Rd_dx, write EMPTY buffer
FATAL from PE     0: diag_manager_mod::send_data_3d: module/output_field ocean_model/SSH, write EMPTY buffer
FATAL from PE     0: diag_manager_mod::send_data_3d: module/output_field ocean_model/cfc11_flux, write EMPTY buffer

I believe these fields (and the fields I have trouble with on my MARBL branch) are all 2D; I don't know whether that's a hint at what is going wrong or a red herring.
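
The two diff lines above are FMS diag_table "file" lines. As I understand the legacy (non-YAML) diag_table format, their positional fields are: file name, output frequency, output-frequency units, file format, time-axis units, time-axis name, and optionally new-file frequency and units. The following minimal Python sketch splits such a line into named fields. `FileLine`, `parse_file_line`, and `VALID_UNITS` are hypothetical helpers, not part of FMS. The parser handles only the 6- and 8-field forms used here, and the accepted unit names are an assumption.

```python
import csv
from dataclasses import dataclass
from typing import Optional

# Frequency units assumed to be accepted by diag_table (not exhaustive).
VALID_UNITS = {"seconds", "minutes", "hours", "days", "months", "years"}


@dataclass
class FileLine:
    """Named fields of one diag_table file line (hypothetical helper)."""
    file_name: str
    output_freq: int
    output_freq_units: str
    file_format: int
    time_axis_units: str
    time_axis_name: str
    new_file_freq: Optional[int] = None
    new_file_freq_units: Optional[str] = None


def parse_file_line(line: str) -> FileLine:
    # csv.reader removes the double quotes and splits on commas;
    # skipinitialspace drops the padding used to align the columns.
    fields = [f.strip() for f in next(csv.reader([line], skipinitialspace=True))]
    if len(fields) not in (6, 8):
        raise ValueError(f"expected 6 or 8 fields, got {len(fields)}")
    parsed = FileLine(
        file_name=fields[0],
        output_freq=int(fields[1]),
        output_freq_units=fields[2],
        file_format=int(fields[3]),
        time_axis_units=fields[4],
        time_axis_name=fields[5],
    )
    if len(fields) == 8:
        parsed.new_file_freq = int(fields[6])
        parsed.new_file_freq_units = fields[7]
    # Reject frequency units outside the assumed set.
    for units in (parsed.output_freq_units, parsed.new_file_freq_units):
        if units is not None and units not in VALID_UNITS:
            raise ValueError(f"unrecognized units: {units}")
    return parsed
```

Parsing both versions shows what the hourly change alters. The output frequency goes from 1 month to 1 hour, and a new file now starts every day instead of every month. The file name template changes to match.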

gustavo-marques commented 2 years ago

Can you write them as snapshots (i.e., replace "mean" with "none" in the diag_table)?

mnlevy1981 commented 2 years ago

Yes and no. If I change them to "none" (and also change the 3D fields from "mean" to "none" to avoid the error "file CMOM.withCFCS.with_nonlocal.mom6.hm2%4yr-%3dy can NOT have BOTH time average AND instantaneous fields"), the run finishes, but the first time slice of the 2D fields is all 0s (there's not even a mask). It seems like there's a lag: we are writing the data to disk before posting the current time step.
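
The suspected lag can be pictured with a toy Python model. `SnapshotBuffer` and `run` are hypothetical names, not MOM6 or FMS code, and this illustrates the hypothesis rather than confirmed MOM6 behavior. The instantaneous buffer starts zero-filled. If output is flushed before a field is posted, the first slice on disk holds those zeros and every later slice is one step stale.

```python
from typing import List


class SnapshotBuffer:
    """Toy instantaneous ("none") diagnostic buffer (hypothetical, not FMS)."""

    def __init__(self, size: int) -> None:
        # Starts zero-filled, so a write before any post yields zeros.
        self.values: List[float] = [0.0] * size

    def post(self, values: List[float]) -> None:
        # An instantaneous field keeps only the most recently posted values.
        self.values = list(values)

    def write(self) -> List[float]:
        # Return a copy of whatever the buffer currently holds.
        return list(self.values)


def run(n_steps: int, write_before_post: bool) -> List[List[float]]:
    buf = SnapshotBuffer(size=2)
    slices = []
    for step in range(1, n_steps + 1):
        field = [float(step), float(step) * 10.0]  # stand-in values at two grid points
        if write_before_post:
            slices.append(buf.write())  # output flushed first...
            buf.post(field)             # ...then this step's field is posted
        else:
            buf.post(field)
            slices.append(buf.write())
    return slices
```

With `write_before_post=True`, `run(3, True)` returns a first slice of zeros followed by the step-1 and step-2 values. With the order reversed, every slice matches its own step.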

klindsay28 commented 2 years ago

The post_data call for cfc11_flux is in subroutine forcing_diagnostics in src/core/MOM_forcing_type.F90, which is called from subroutine update_ocean_model in config_src/drivers/nuopc_cap/mom_ocean_model_nuopc.F90. That call comes almost immediately after the call to step_MOM, so post_data runs only after MOM has advanced a coupling timestep.

I haven't figured out yet where in MOM diagnostic fields are 'written to disk', but based on Mike's results, it seems to happen inside step_MOM. I suspect this is why Mike is seeing problems when writing hourly means of this field: the model is trying to compute a time mean before any samples have been posted.
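
The sequencing proposed here can be sketched as a toy time-mean accumulator in Python. `MeanBuffer`, `EmptyBufferError`, and `coupling_step` are hypothetical names, not the FMS diag_manager. The assumed ordering (ocean fields posted and output flushed inside the ocean step, forcing diagnostics posted afterwards) is the hypothesis from this comment, not verified MOM6 behavior.

```python
from typing import List


class EmptyBufferError(RuntimeError):
    """Raised when a time mean is written before any sample was posted."""


class MeanBuffer:
    """Toy time-mean accumulator (hypothetical; not the FMS diag_manager)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.total = 0.0
        self.count = 0

    def post(self, value: float) -> None:
        # Add one sample to the running sum for the current interval.
        self.total += value
        self.count += 1

    def write(self) -> float:
        # A mean of zero samples is undefined, so refuse to write it.
        if self.count == 0:
            raise EmptyBufferError(f"{self.name}, write EMPTY buffer")
        mean = self.total / self.count
        self.total, self.count = 0.0, 0  # reset for the next interval
        return mean


def coupling_step(ocean_diag: MeanBuffer, forcing_diag: MeanBuffer, step: int) -> List[float]:
    """One hourly coupling step under the suspected (hypothetical) ordering."""
    ocean_diag.post(float(step))  # posted inside the ocean step
    written = [ocean_diag.write(), forcing_diag.write()]  # output flushed here
    forcing_diag.post(0.5 * step)  # forcing diagnostics posted afterwards
    return written
```

On the first call, `forcing_diag.write()` raises because nothing has been posted to it yet. Moving `forcing_diag.post` above the `write` calls makes the step succeed. That is exactly the ordering question raised here.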

I haven't examined the timing of the post_data calls for the other fields that Mike is reporting difficulty with.

gustavo-marques commented 2 years ago

We apply an ocean lag in startup runs, and that's why the first time slice is all 0s.

mnlevy1981 commented 2 years ago

A lag in the startup run explains why these fields are unavailable at the first time step, but then I'm confused about why other fields can be written after the first hour. Shouldn't it be all or nothing?

@klindsay28's explanation makes sense: it seems like the time level for the diagnostics buffer is incremented before all the diagnostics have been posted, so there's a one-timestep lag between the forcing diagnostics and everything posted from calls within step_MOM(). When writing every timestep, this means the first file is written to disk before any forcing diagnostics are posted.

I guess my questions are

  1. Is this a bug?
  2. If so, is it a problem with MOM in general or is it specifically an issue with the CESM driver(s)?
  3. If it's not a bug, could we do something like write all NaNs when a buffer is empty instead of aborting? I could see requiring users to enable some sort of flag to allow this behavior, or adding a check that says "this buffer is empty, but it's also the first time step of the run, so we'll let it slide this time and abort if it's still empty next time".
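
Option 3 could look roughly like the following toy Python sketch. `TolerantMeanBuffer` and the `allow_empty_first_write` flag are hypothetical names, not existing MOM6 or FMS options. The sketch fills the first empty write with NaN and still aborts on any later empty write.

```python
import math


class TolerantMeanBuffer:
    """Toy mean buffer with an optional first-write grace period (hypothetical)."""

    def __init__(self, name: str, allow_empty_first_write: bool = False) -> None:
        self.name = name
        self.allow_empty_first_write = allow_empty_first_write
        self.total = 0.0
        self.count = 0
        self.writes = 0  # number of write attempts so far

    def post(self, value: float) -> None:
        # Add one sample to the running sum for the current interval.
        self.total += value
        self.count += 1

    def write(self) -> float:
        self.writes += 1
        if self.count == 0:
            # Grace period: only the very first write may be empty.
            if self.allow_empty_first_write and self.writes == 1:
                return math.nan
            raise RuntimeError(f"{self.name}, write EMPTY buffer")
        mean = self.total / self.count
        self.total, self.count = 0.0, 0  # reset for the next interval
        return mean
```

NaN, unlike 0, keeps the placeholder slice distinguishable from real data. That addresses the ambiguity of the all-zero snapshot slices mentioned earlier in the thread.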