COSIMA / mom6-panan

Pan-Antarctic regional configuration of MOM6
MIT License
6 stars 6 forks source link

Tracer timestep choice #28

Closed adele-morrison closed 1 year ago

adele-morrison commented 1 year ago

GFDL runs their OM4 cases with DT_THERM = 7200, and suggest that there is no need to decrease this as resolution is increased. We were previously running all panan cases with DT_THERM = 1800. I found a 20% speedup by increasing this to 3600 in panan-01-zstar.

Today we discussed (badly) what a reasonable value for this might be, e.g. I suggest we should choose between 3600 and 7200. How would we tell if we have increased it too far? Perhaps a comparison test with a short and long tracer timestep would be useful and an investigation of what impact it has on regions like the overflows.

@angus-g volunteered to do a bit more reading on tracer advection and timesteps from the mom6 docs and update us.

angus-g commented 1 year ago

The statement I was thinking about regarding this is[^1]:

Advective mass fluxes in MOM6 are often accumulated over multiple dynamic steps. The goal is that as we go to higher resolution, this tracer advection will remain stable at relatively long time-steps, allowing for the inclusion of many biogeochemical tracers without adding an undue burden in computational cost.

Also from the Tracer Timestep page, the mixing and diffusion are handled with the thermodynamic processes. I guess that suggests that things could get unstable with a sufficiently long thermodynamic time step? I don't know how long it takes for enough energy to cascade into small scales for that to be an issue though.

[^1]: From https://mom6.readthedocs.io/en/main/api/generated/pages/Tracer_Transport_Equations.html

adele-morrison commented 1 year ago

Posting this figure here also showing the impact of changing the tracer timestep in panan-01. This shows the transport in each layer at 37S for the first 6 months of the run. The black is the original panan-01. The others are my recreations of that with different tracer timesteps and forcing. Note the wacky behaviour (southward AABW transport at the boundary!) when DT_THERM is increased.

Screen Shot 2023-03-22 at 11 02 29 am

Questions:

  1. Why did increasing the tracer timestep have this impact?
  2. How do we tell in the future if we have the right tracer timestep? As far I could tell there weren't a lot more instabilities or obvious errors with the larger timestep.

If anyone has the time and will to investigate this further that would be awesome. The two comparison runs (blue dashed and solid blue above) are:

adele-morrison commented 1 year ago

Any ideas @angus-g? I think it's to do with the interaction with the open boundary. It was much harder to tell that anything (abyssal transports etc) were going wacky away from the boundary until many years later. Some interaction with the nudging timescales or something?

access-hive-bot commented 1 year ago

This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:

https://forum.access-hive.org.au/t/mom6-panan-timestep/459/7

adele-morrison commented 1 year ago

Just confirming that the large tracer time step continues to be an issue (gets worse in fact) if you run for longer. This shows the same black and blue simulations from the above plot but now averaged over the 2nd year:

Screen Shot 2023-03-23 at 6 15 56 am
adele-morrison commented 1 year ago

I can also confirm that I see a similar behaviour in the 1/20th run (this is the first 3 month average). Here the solid blue has a longer tracer timestep and looks wacky, and the dashed blue has a shorter tracer timestep and looks good. [Note that the original 1/20th looks a little odd, which could either be due to the bugs it had or perhaps a different density spacing in this diagnostic.]

Screen Shot 2023-03-23 at 6 38 18 am

Perhaps it's worth exploring a bit more and reducing DT_THERM in the panan-01 even more to 900?

AndyHoggANU commented 1 year ago

Interesting indeed. So, it seems like the model is indeed stable with large DT_THERM, but the solution goes to crap. Question is, how long can we push it without degrading the solution? And is this issue solely because of the open boundaries in PanAnt, or will we see the same issue in the global? Worth testing that too??

aekiss commented 1 year ago

Hm, interesting. Definitely worth testing in the global to see if this is just a BC issue. I would expect increasing any of the timesteps will degrade the solution, even if the model is numerically stable (numerical stability doesn't mean the solution has converged to the continuous solution, ie become independent of further refinement in timestep). So maybe the question is, what features of a "reference" solution with a short timestep are "must haves" which we need to retain with a longer timestep? ie choose the longest timestep that retains the "must have" features. It could become disconcertingly subjective though...

angus-g commented 1 year ago

Some interaction with the nudging timescales or something?

I haven't looked at the velocity side of things, but this is my understanding of the tracer nudging part of the OBC code. Each tracer has the segment forcing value segment%tr_Reg%Tr(m)%t (directly from the file), and a corresponding "reservoir" segment%tr_Reg%Tr(m)%tres. As far as I can tell, initially, the reservoirs are filled with the initial tracer values from the domain (i.e. the initial condition, not the first timestep of OBC forcing).

There's also an array which is used for modifying the reservoir (and restarts), OBC%tres_x (and tres_y). This array is initially filled with the segment data, not the domain initial condition. When radiation conditions are enabled, the tracer reservoir is reset to the forcing reservoir on the dynamics timestep in radiation_open_bdry_conds(). I actually don't understand the purpose of this...

On the tracer timestep, during tracer advection, the boundary values are overwritten by the tracer reservoirs, which are then used for calculating the advective fluxes, etc. Once advection is done, the tracer reservoirs are updated. This is where the reservoir lengthscales come into play: they determine how the interior and forcing values modify the reservoir, depending on whether there is inflow or outflow through the boundary. Once the tracer reservoir is updated, the forcing reservoir is set to this new value.

Both advection and the reservoir update use uhtr (respectively vhtr), which is the accumulated zonal (meridional) flux over the tracer period. I suppose we could look at tres_x/tres_y in the restarts to see if these have gone weird with the increased tracer timestep, and somehow the reservoir modification is odd? That's the only real difference from the non-OBC case, which would otherwise just use the accumulated fluxes for advection without replacing any values from a reservoir.

adele-morrison commented 1 year ago

I did the test with an even shorter tracer timestep and good news is it seems to be converged - DT_THERM = 1800 and DT_THERM = 900 look the same for the boundary transport in the panan-01:

Screen Shot 2023-03-24 at 4 56 51 pm

Maybe the solution for new regional configs is to do a short run with different values of DT_THERM and check when the boundary transport converges? In these runs the difference shows up very quickly (3 months is sufficient to distinguish them).

adele-morrison commented 1 year ago

Looks like someone else has had a similar issue with too large DT_THERM interacting with the boundary. From https://github.com/jsimkins2/nwa25/tree/main/run:

"Per Andrew Ross:

To summarize, NWA12 behaves normally with DT_THERM = 1200 or 1800. If I try to increase DT_THERM to 3600, however, the Gulf Stream rapidly dissipates with a bizarre wave traveling to the south, and then reestablishes as a very weak current that doesn't separate at all. At the same time there appears to be extensive mixing in the Gulf Stream region, evident as cooling above 1000 m and warming below 1000 m all the way down to 5000 m. The initial dissipation starts immediately after switching to DT_THERM = 3600, and the total collapse and reestablishment happens within the first few months. Afterwards the weak boundary current remains roughly stable for years. I've attached some figures showing sea level as this is happening.

Previously I've run different experiments to try to identify the issue. I completely turned off tides and the open boundaries but still encountered the problem. The only control that I've found is the biharmonic viscosity (either Smagorinsky or using the velocity scale); with a low viscosity, the Gulf Stream dissipates much more rapidly, while with a high viscosity, the dissipation still happens but it occurs a lot slower."

adele-morrison commented 1 year ago

I did another check, and we could also run the 1/20th panan with a slightly larger DT_THERM = 1200. However, this would require us reducing DT to 400 from 450, so that DT is a factor of DT_THERM, and from my test this doesn't give us any speed up.

Screen Shot 2023-04-11 at 9 52 17 am