The statement I was thinking about regarding this is[^1]:
Advective mass fluxes in MOM6 are often accumulated over multiple dynamic steps. The goal is that as we go to higher resolution, this tracer advection will remain stable at relatively long time-steps, allowing for the inclusion of many biogeochemical tracers without adding an undue burden in computational cost.
Also from the Tracer Timestep page, the mixing and diffusion are handled with the thermodynamic processes. I guess that suggests that things could get unstable with a sufficiently long thermodynamic time step? I don't know how long it takes for enough energy to cascade into small scales for that to be an issue though.
[^1]: From https://mom6.readthedocs.io/en/main/api/generated/pages/Tracer_Transport_Equations.html
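As a concrete (and heavily simplified) picture of that quoted statement, here is a minimal Python sketch of flux accumulation over the dynamics steps. The DT value and the placeholder functions are purely illustrative assumptions, not MOM6 code:

```python
# Minimal sketch (not MOM6 code) of the idea quoted above: mass fluxes from
# several short dynamics steps are accumulated, and tracers are advected once
# with that accumulated transport over the longer DT_THERM.
DT = 450.0                  # dynamics timestep [s]; hypothetical value for illustration
DT_THERM = 3600.0           # tracer/thermodynamics timestep [s]
n_dyn = int(DT_THERM / DT)  # dynamics steps per tracer step (must divide evenly) -> 8

def dynamics_step():
    """Placeholder for one dynamics step; returns a zonal mass flux [kg/s]."""
    return 1.0e6

def advect_tracers(uhtr):
    """Placeholder for a single tracer advection using the accumulated transport."""
    print(f"advecting tracers once with accumulated transport {uhtr:.3e} kg")

uhtr = 0.0
for _ in range(n_dyn):
    uhtr += dynamics_step() * DT   # accumulate transport over the dynamics steps

advect_tracers(uhtr)
```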
Posting this figure here also showing the impact of changing the tracer timestep in panan-01. This shows the transport in each layer at 37S for the first 6 months of the run. The black line is the original panan-01; the others are my recreations of that run with different tracer timesteps and forcing. Note the wacky behaviour (southward AABW transport at the boundary!) when DT_THERM is increased.
Questions:
If anyone has the time and will to investigate this further, that would be awesome. The two comparison runs (blue dashed and solid blue above) are:

- `/scratch/x77/akm157/mom6/archive/panan-01-zstar-v13-exact/` (blue dashed, DT_THERM = 1800)
- `/scratch/x77/akm157/mom6/archive/panan-01-zstar-v13-exact_dt3600/` (blue solid, DT_THERM = 3600)

Any ideas @angus-g? I think it's to do with the interaction with the open boundary. It was much harder to tell that anything (abyssal transports etc.) was going wacky away from the boundary until many years later. Some interaction with the nudging timescales or something?
This issue has been mentioned on ACCESS Hive Community Forum. There might be relevant details there:
https://forum.access-hive.org.au/t/mom6-panan-timestep/459/7
Just confirming that the large tracer time step continues to be an issue (gets worse in fact) if you run for longer. This shows the same black and blue simulations from the above plot but now averaged over the 2nd year:
I can also confirm that I see a similar behaviour in the 1/20th run (this is the first 3 month average). Here the solid blue has a longer tracer timestep and looks wacky, and the dashed blue has a shorter tracer timestep and looks good. [Note that the original 1/20th looks a little odd, which could either be due to the bugs it had or perhaps a different density spacing in this diagnostic.]
Perhaps it's worth exploring this a bit further and reducing DT_THERM in panan-01 even more, to 900?
Interesting indeed. So it seems like the model is indeed stable with large DT_THERM, but the solution goes to crap. The question is, how long can we push it without degrading the solution? And is this issue solely because of the open boundaries in PanAnt, or will we see the same issue in the global? Worth testing that too?
Hm, interesting. Definitely worth testing in the global to see if this is just a BC issue. I would expect increasing any of the timesteps to degrade the solution, even if the model is numerically stable (numerical stability doesn't mean the solution has converged to the continuous solution, i.e. become independent of further refinement of the timestep). So maybe the question is: what features of a "reference" solution with a short timestep are "must haves" that we need to retain with a longer timestep? I.e. choose the longest timestep that retains the "must have" features. It could become disconcertingly subjective though...
> Some interaction with the nudging timescales or something?
I haven't looked at the velocity side of things, but this is my understanding of the tracer nudging part of the OBC code. Each tracer has the segment forcing value `segment%tr_Reg%Tr(m)%t` (directly from the file), and a corresponding "reservoir" `segment%tr_Reg%Tr(m)%tres`. As far as I can tell, initially the reservoirs are filled with the initial tracer values from the domain (i.e. the initial condition, not the first timestep of OBC forcing).

There's also an array which is used for modifying the reservoir (and restarts), `OBC%tres_x` (and `tres_y`). This array is initially filled with the segment data, not the domain initial condition. When radiation conditions are enabled, the tracer reservoir is reset to the forcing reservoir on the dynamics timestep in `radiation_open_bdry_conds()`. I actually don't understand the purpose of this...
On the tracer timestep, during tracer advection, the boundary values are overwritten by the tracer reservoirs, which are then used for calculating the advective fluxes, etc. Once advection is done, the tracer reservoirs are updated. This is where the reservoir lengthscales come into play: they determine how the interior and forcing values modify the reservoir, depending on whether there is inflow or outflow through the boundary. Once the tracer reservoir is updated, the forcing reservoir is set to this new value.
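For what it's worth, here is a rough Python sketch of how I read that reservoir update, for a single boundary face with positive transport defined as directed out of the domain. The variable names, the linear weights and the capping at 1 are my assumptions rather than the actual MOM6 code:

```python
# Conceptual sketch only (not the MOM6 implementation) of the OBC tracer
# reservoir update described above. `trans_accum` is the volume transport
# accumulated over the tracer step [m3], positive out of the domain; the
# names, linear weights and cap at 1 are assumptions.
def update_reservoir(tres, t_interior, t_forcing, trans_accum, h, dx, L_out, L_in):
    """Blend the reservoir toward the interior tracer on outflow and toward the
    segment (forcing) tracer on inflow, weighted by the distance the accumulated
    transport has moved water relative to the reservoir lengthscales."""
    dist = trans_accum / (h * dx)                 # advection distance this tracer step [m]
    a_out = min(1.0, max(0.0, dist) / L_out)      # outflow weight
    a_in = min(1.0, max(0.0, -dist) / L_in)       # inflow weight
    return (1.0 - a_out - a_in) * tres + a_out * t_interior + a_in * t_forcing
```

The exact weighting in MOM6 may well differ; the point is just the structure of the inflow/outflow blending controlled by the lengthscales.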
Both advection and the reservoir update use `uhtr` (respectively `vhtr`), which is the accumulated zonal (meridional) flux over the tracer period. I suppose we could look at `tres_x`/`tres_y` in the restarts to see if these have gone weird with the increased tracer timestep, and somehow the reservoir modification is odd? That's the only real difference from the non-OBC case, which would otherwise just use the accumulated fluxes for advection without replacing any values from a reservoir.
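If someone wants to do that check, something like the following could work. The restart sub-path and the variable names in the restart file are assumptions here; check what is actually written to the restarts:

```python
# Hedged sketch for comparing the OBC tracer reservoirs between the two runs.
# The restart sub-path and the variable names ("tres_x"/"tres_y") are
# assumptions; check the actual restart files for the real names.
import xarray as xr

base = "/scratch/x77/akm157/mom6/archive"
res_1800 = xr.open_dataset(f"{base}/panan-01-zstar-v13-exact/restart000/MOM.res.nc")
res_3600 = xr.open_dataset(f"{base}/panan-01-zstar-v13-exact_dt3600/restart000/MOM.res.nc")

diff = res_3600["tres_x"] - res_1800["tres_x"]
print(float(abs(diff).max()))   # have the reservoirs drifted apart near the boundary?
```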
I did the test with an even shorter tracer timestep and the good news is that it seems to be converged: DT_THERM = 1800 and DT_THERM = 900 look the same for the boundary transport in panan-01:
Maybe the solution for new regional configs is to do a short run with different values of DT_THERM and check when the boundary transport converges? In these runs the difference shows up very quickly (3 months is sufficient to distinguish them).
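As a sketch of what that convergence check could look like in practice (the diagnostic name, file layout, latitude selection and run paths are all assumptions; the actual panan output layout may differ):

```python
# Hedged sketch of the DT_THERM convergence check suggested above: compare the
# per-layer meridional transport across the northern boundary (~37S) between a
# short and a long tracer timestep after a few months of output.
import xarray as xr

def boundary_transport(path):
    ds = xr.open_mfdataset(f"{path}/output0*/ocean/ocean_month.nc")
    vmo = ds["vmo"].sel(yq=-37.0, method="nearest")   # meridional mass transport [kg/s]
    return (vmo.sum("xh") / 1.0e9).mean("time")       # zonal sum per layer, ~Sv (rho ~ 1000)

t_short = boundary_transport("/scratch/.../panan_dttherm_short")   # hypothetical run paths
t_long = boundary_transport("/scratch/.../panan_dttherm_long")
print(float(abs(t_short - t_long).max()))   # converged if this difference is small
```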
Looks like someone else has had a similar issue with too large DT_THERM interacting with the boundary. From https://github.com/jsimkins2/nwa25/tree/main/run:
"Per Andrew Ross:
To summarize, NWA12 behaves normally with DT_THERM = 1200 or 1800. If I try to increase DT_THERM to 3600, however, the Gulf Stream rapidly dissipates with a bizarre wave traveling to the south, and then reestablishes as a very weak current that doesn't separate at all. At the same time there appears to be extensive mixing in the Gulf Stream region, evident as cooling above 1000 m and warming below 1000 m all the way down to 5000 m. The initial dissipation starts immediately after switching to DT_THERM = 3600, and the total collapse and reestablishment happens within the first few months. Afterwards the weak boundary current remains roughly stable for years. I've attached some figures showing sea level as this is happening.
Previously I've run different experiments to try to identify the issue. I completely turned off tides and the open boundaries but still encountered the problem. The only control that I've found is the biharmonic viscosity (either Smagorinsky or using the velocity scale); with a low viscosity, the Gulf Stream dissipates much more rapidly, while with a high viscosity, the dissipation still happens but it occurs a lot slower."
I did another check, and we could also run the 1/20th panan with a slightly larger DT_THERM = 1200. However, this would require reducing DT from 450 to 400 so that DT is a factor of DT_THERM, and from my test this doesn't give us any speed-up.
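For reference, a quick check of that compatibility constraint (DT_THERM an integer multiple of DT) for the values discussed in this thread:

```python
# DT_THERM needs to be an integer multiple of DT; check the combinations
# discussed in this thread.
for dt in (450, 400):
    for dt_therm in (900, 1200, 1800, 3600):
        ok = dt_therm % dt == 0
        print(f"DT={dt}, DT_THERM={dt_therm}: {'ok' if ok else 'not a multiple'}")
# DT=450 works with 900, 1800 and 3600 but not 1200;
# DT=400 works with 1200 and 3600 but not 900 or 1800.
```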
GFDL runs their OM4 cases with DT_THERM = 7200 and suggests that there is no need to decrease this as resolution is increased. We were previously running all panan cases with DT_THERM = 1800. I found a 20% speedup by increasing this to 3600 in panan-01-zstar.

Today we discussed (badly) what a reasonable value for this might be; e.g. I suggest we should choose between 3600 and 7200. How would we tell if we have increased it too far? Perhaps a comparison test with a short and a long tracer timestep would be useful, along with an investigation of what impact it has on regions like the overflows.
@angus-g volunteered to do a bit more reading on tracer advection and timesteps from the mom6 docs and update us.