SST in WC14 stand-alone spin-up warms to 60C

mark-petersen commented 2 years ago

An unphysical warming to 60C on the Indonesian coast occurs in the WC14 stand-alone spin-up using compass/master 2270c038, even after fixing the Coriolis bug noted in #304. This can be seen with:

compass/ocean/tests/global_ocean/init/namelist.init
-config_expand_sphere = .false.
+config_expand_sphere = .true.

compass list|grep WC14
  95: ocean/global_ocean/WC14/mesh
  96: ocean/global_ocean/WC14/PHC/init
  97: ocean/global_ocean/WC14/PHC/performance_test
  98: ocean/global_ocean/WC14/PHC/dynamic_adjustment

compass setup -f Config_dir/e3sm_master -w $DIR -n 95 96 97 98

After running these steps on grizzly with load_dev_compass_1.0.0_grizzly_gnu_mvapich.sh, the error can be seen here:

cd /lustre/scratch4/turquoise/mpeterse/runs/220126_compass_master_E3SM_head_gr_gfortran_openmp_WC1402
cd ocean/global_ocean/WC14/PHC/dynamic_adjustment/simulation/analysis_members/

ncdump -v temperatureMax globalStats.*.nc |tail -n 2
    60.5092380065623, 60.5138480143581, 60.5184557681456, 60.5230611583427 ;

It reaches 60C by day 25 of the simulation.

mark-petersen commented 2 years ago

I conducted a large number of tests, but I understand it now so can make a cleaner presentation. I started with:

WC14 on head of compass/legacy: OK, max remains at 30.1C
WC14 on head of compass/master with expand_sphere on: described above, max is 60.5

The namelist differences from (1) to (2) on the simulation step are:

<     config_mom_del2 = 400
>     config_mom_del2 = 1.0e3
<     config_use_activeTracers_surface_restoring = .true.
>     config_use_activeTracers_surface_restoring = .false.

The config_mom_del2 was unintentional, and is fixed in #307. This exacerbates the problem, but is not the root cause of the 60C temperature.

Turning off surface restoring was intentional, in #99. The surface restoring was the change that allowed the 60C problem to occur. But it was essentially 'covering up' some other mechanism that causes non-monotonic warming. To show this, I did the additional tests:

WC14 on head of compass/legacy with config_use_activeTracers_surface_restoring = .false. (config_mom_del2 remains at 400): T increases, but only from 30.1 to 33.2. My run at
```
/lustre/scratch4/turquoise/mpeterse/runs/220126_legacy_head_E3SM_head_gr_gfortran_openmp_WC1402/ocean/global_ocean/WC14/spin_up/test_final_settings_rstr_off
```

WC14 on head of compass/legacy with config_use_activeTracers_surface_restoring = .false. and config_mom_del2 = 1.0e3: T increases, from 30.1 to 40C. My run at

/lustre/scratch4/turquoise/mpeterse/runs/220126_legacy_head_E3SM_head_gr_gfortran_openmp_WC1402/ocean/global_ocean/WC14/spin_up/test_final_settings_rstr_off_del2mom1e3

The conclusion from these simulations is that we can reproduce this behavior from compass/legacy with these two flags. We still don't understand the root cause. Note that simulation (4) hit 40C rather than 60C. This is probably because of the radius of the earth change #95 and some other non-bfb changes in compass and MPAS. As a growing instability, the exact value doesn't matter, just the behavior.

mark-petersen commented 2 years ago

These are the potential causes of the warming, each tested with a simulation:

Minimum layers: NO. Domain has many columns with maxLevelCell=2, and usually we run with minimum of 3. This was NOT the cause. I reran compass/master with config_global_ocean_minimum_depth = 30 in init mode, as top three layers are each 10m. Still, we should add a minimum number of layers flag, as a two layer setup was unintentional here. Vertical mixing with boundary conditions and tri-diagonal solve is really meant to run with three layers (upper boundary, lower boundary, and one inner cell), though it appears to work fine with two layers.
```
/lustre/scratch4/turquoise/mpeterse/runs/220127_n8_compass_master_E3SM_head_3levelmin_gr_gfortran_openmp_WC1402/ocean/global_ocean/WC14/PHC/dynamic_adjustment/simulation
```

mark-petersen commented 2 years ago

Thickness fluxes: NO. In compass/master, we still set config_use_bulk_thickness_flux = .true.. This is a proxy for precipitation, but does not make sense after we set config_use_activeTracers_surface_restoring = .false. in #99. The data surface restoring and thickness flux really go hand in hand as proxies for surface fluxes. So we should turn thickness flux off. Still, in a test with it off, we still have monotonic warming.
```
/lustre/scratch4/turquoise/mpeterse/runs/220127_n9_compass_master_E3SM_head_thickFluxOff_gr_gfortran_openmp_WC1402/ocean/global_ocean/WC14/PHC/dynamic_adjustment
```

mark-petersen commented 2 years ago

Redi Mixing: Yes. If I set config_use_Redi = .false., the maximum temperature in the simulation decreases monotonically from 30.271 to 30.260 after 31 days.

/lustre/scratch4/turquoise/mpeterse/runs/220127_n9_compass_master_E3SM_head_thickFluxOff_gr_gfortran_openmp_WC1402/ocean/global_ocean/WC14/PHC/dynamic_adjustment/simulation_n13_redi_off

What is going on here? In retrospect it seems obvious, but it caught me off guard. All of our horizontal and vertical advection and diffusion schemes are guaranteed to be monotonic, so should not cause an increasing maximum. Surface fluxes could cause new maxima, but when surface restoring is on, it restores to the initial condition in MPAS-O stand-alone, so that also prevents new maxima.

Redi mixing is a different beast. I documented myself that the skew formulation of Redi is globally tracer conserving, but is not locally bounds preserving in the original pull request and it can be seen on the bounds of this idealized test result. This behavior is even the topic of a paper in Beckers et al. 1998 and is improved but remains in the triad formulation, see my comments at the end of this post.

What usually saves us is that the spurious mins and maxes are small and washed out by diffusion, advection, and surface fluxes. In practice it is only seen in non-physical situations, like intentionally steep isopycnals or tracer step functions seen here.

mark-petersen commented 2 years ago

Going back to what caused this issue to begin with, we have Redi mixing on, but surface restoring off. In shallow 2 or 3 layer regions agains the coastline, Redi mixing produces a spurious increase in temperature. This does not happen in stand-alone with config_use_activeTracers_surface_restoring = .true. or in E3SM with surface fluxes.

For the WC14 spin-up, the solution is that Redi mixing must be turned off when surface restoring is turned off, in order to avoid these spurious tracer maxima. For all MPAS stand-alone global ocean spin-ups, we should either:

Turn OFF BOTH surface restoring and Redi mixing
Turn ON BOTH surface restoring and Redi mixing

xylar commented 2 years ago

Thanks for this detective work, @mark-petersen. I would like to hear from @vanroekel what he thinks, since he requested turning off surface restoring in #99 if my recollection is correct. My sense is that we want to turn off both Redi and surface restoring for dynamic adjustment, so that the focus is spinning up the velocity field, dissipating surface waves, increasing the time step, and removing Rayleigh damping so we're ready to run in E3SM without having to do dynamic adjustment there.

@vanroekel, what do you think?

vanroekel commented 2 years ago

@xylar thanks for the ping on this. and @mark-petersen great detective work on this one. In retrospect this does make some sense, but a bit surprising as you say. As for how to go forward I agree with Xylar on this one that for the dynamic adjustment pre e3sm I'd suggest no surface restoring and no mesoscale parameterizations be active. Although as I've thought about this more it seems less important which direction we go (surface restoring or not) as the initial conditions wash out quickly. But whatever we do we should standardize, so I'd suggest no restoring no mesoscales.

mark-petersen commented 2 years ago

@vanroekel since you feel less strongly about this than before and said "initial conditions wash out quickly" in E3SM, I propose that we spin up with surface restoring, thickness fluxes, and mesoscale mixing on (for appropriate resolutions), as we always had in the past. I think that produces a better initial condition, because surface restoring mimics surface fluxes and keeps it closer to the observed climatology, while starting with the parameterizations on should make for a smoother dynamic transition to E3SM.

In the end the exact initial state may not matter. But there are the additional advantages that we are also testing the code during spin-up to be used in E3SM, all Compass cases (both testing and spin-up) can use the same settings, and for projects that continue stand-alone simulations (typically development and performance projects), the flags can mimic E3SM parameterization settings for each resolution.

vanroekel commented 2 years ago

@mark-petersen to me as long as the e3sm spin up stays less than a month or so, I don’t think the decision matters and am fine either way. I don’t want e3sm spin ups to get on toward months and more than a year like we did in v1. For 1 month spin ups I don’t think any of the decisions matter. That is just long enough to get the barotropic adjustment out of the way. Mesoscales or not really shouldn’t change the solution. So all in all I’m fine if we do short spin up in MPAS with mesoscale Parameterization on and restoring

xylar commented 2 years ago

@mark-petersen, I started implementing this and some important questions arise:

do we want to turn GM on for QU240? It is currently off as far as I can tell.
do we want to have surface restoring on for performance tests, etc? This is slightly easier to implement but will lead to non-bit-for-bit changes in our current baselines.

My intuition is that, yes, we want GM on for QU240 and, no, we don't want surface restoring on except during dynamic adjustment.

mark-petersen commented 2 years ago

It turns out that there is more to the story than just turning on surface restoring to compensate for new temperature extrema produced by Redi mixing. It is true that Redi is the root cause of temperature increases during spin-up. But when I compared stand-alone spin-up of the WC14 where everything was identical (using today's compass/master and e3sm/master plus surface restoring) but with two runs, one with the init step from compass/legacy and one with the init step from compass/master, then after 31 day spin-up and simulation, I get maxT=30.6C with compass/legacy, and maxT=38C with compass/master.

It’s the rough bathymetry, in combination with Redi mixing. This image is: left: compass/legacy, ETOPO. right: compass/master, using GEBCO. The legacy init step uses much smoother bathymetry. Screen Shot 2022-02-02 at 2 35 30 PM

When I use today’s head of compass/master and e3sm/master (so Coriolis init is fixed), and add surface restoring, and I only change to point from GEBCO back to ETOPO, maxT = 30.3. Also, with that same setup but remaining with GEBCO but with iterations of bathymetry smoothing in the init step,

bathymetry smoothing = 0, maxT = 39
bathymetry smoothing = 1, maxT = 41
bathymetry smoothing = 2, maxT = 37
bathymetry smoothing = 3, maxT = 31.2

Where maxT is the maximum temperature after the 31 day spin-up. So it looks like Redi has a hard time with rough bathymetry.

mark-petersen commented 2 years ago

What is the solution? We could make the bathymetry smoother, and that would avoid this particular instance of temperature over-shoot. That could be done a number of ways - smoothing GEBCO beforehand, using the flag config_global_ocean_topography_smooth_iterations = 3 in the init step, or going back to ETOPO.

We can do those things, but this problem raises a larger question, which is, does Redi cause spurious tracer extrema during E3SM simulations that are covered up by surface fluxes? For coupled simulations, could this behavior cause spurious surface fluxes?

Note that in the images in #304, the high temperatures only occur in the shallow cells beside coastlines. One possible solution is to only turn on Redi in columns with more than a minimum depth or number of vertical cells. Those are some ideas, but this certainly requires further discussion.

vanroekel commented 2 years ago

@mark-petersen thanks for the continued digging. Reading your last entries reminded me of this part of Griffies et al 2005 where they are talking about Redi in GFDLs OM3/3.1 (I think that is MOM4), but they say the following

Problems can also occur with truncated neutral physics grid stencils next to the solid earth and surface boundaries. 
Here, the numerical realization of neutral physics parameterizations can lead to the spurious creation of extrema. 
To address this problem, we reduced neutral physics to horizontal diffusion at grid points adjacent to all boundaries. 
This approach was also recommended by Gerdes et al. (1991).

This could be another possibility to consider in further discussion. Interestingly, they also suggest they have a global min/max in their redi limiter, where we only have a min limiter. I remember when we discussed that before a reasonable choice for the max limiter wasn't obvious.

I've also been wondering exactly what you mention here

For coupled simulations, could this behavior cause spurious surface fluxes?

It seems possible that this could cause issues, but given we didn't have redi in v1 I don't think it is the cause of the AMOC issue. However, @ytakano3 has seen unexpected behavior with Redi, where increasing redi has little results in CO2 flux and AMOC until it passes a certain value and then the AMOC actually crashes, which is opposite of what literature (Anand's work) suggests.

xylar commented 2 years ago

Thanks so much, @mark-petersen, for tracking down this issue and for your thoughts on what the solution might be.

I would strongly advocate for sticking with GEBCO combined with BedMachine. The switch was made primarily for the Cryosphere and it seems important to me that we use a common bathymetry for all meshes.

The SORRM mesh used smoothing in compass/legacy similar to but I think somewhat more aggressive than what you tested. We had meant to use the same in the latest mesh, the WCAtl12to45 but It looks like I inadvertently dropped it in one of my later commits.

I would prefer that we smooth bathymetry before sampling it to the MPAS grid rather than after. So much information is lost in the process of sampling at cell centers that I think we end up needing significantly more smoothing to compensate for the resulting noise than we would if we did more appropriate filtering or averaging before sampling.

My understanding is that @dengwirda has an efficient algorithm for averaging a high resolution bathymetry over each MPAS cell, which seems like the idea place to start. A clumsier alternative I worked on was to smooth the bathymetry proportionally to the MPAS cell size before sampling: https://github.com/MPAS-Dev/MPAS-Model/pull/489

I don't have any strong opinions about modifying Redi (or not).

xylar commented 2 years ago

@milenaveneziani, I think you may have good insights on both bathymetry smoothing and Redi. Feel free to join the conversation.

xylar commented 2 years ago

@ytakano3, this issue almost certainly affects your Kuroshio mesh so I want to make sure you're following along as well.

milenaveneziani commented 2 years ago

Thanks @xylar. I am indeed following the discussion closely. About Redi, @vanroekel: do you understand what "we reduced neutral physics to horizontal diffusion at grid points adjacent to all boundaries." mean exactly? that Redi was turned off near all boundaries and only horizontal diffusion was used instead? About smoothing: yes, I agree with @xylar that we should keep using GEBCO but apply smoothing more carefully/aggressively. @mark-petersen: just to clarify, is the picture you posted a region of higher-resolution in WC14? where is it exactly?

I feel like, as we go to higher and higher resolution refinement in MPAS, we may have to be even more careful than before in the mesh creation/init steps, and look at things like bathymetry, land mask placement, and channels in key HR regions, adding on top of what @proteanplanet has been doing with visualizing some of these things. (no criticism here, just a suggestion for future mesh/init-preparetion steps).

vanroekel commented 2 years ago

@milenaveneziani yes, what Griffies means is that the redi tensor (four terms) is reduced to just the horizontal diffusion component (term 1) near bathymetry.

vanroekel commented 2 years ago

My two cents on the bathymetry + smoothing is that I concur with @xylar it would be best to keep the GEBCO + Bedmachine for all meshes and smoothing should be pre mapping to MPAS meshes.

I'm very interested to see what @dengwirda's algorithm could do as well.

mark-petersen commented 2 years ago

@milenaveneziani, the image above is bottomDepth in the South China Sea around Malaysia, from the WC14:

dengwirda commented 2 years ago

Hi all, I'm very happy to throw my bathymetry remapping algorithm into the mix --- I'd say an advantage is that it does the DEM-to-MPAS remapping in a more consistent finite-volume way that behaves across VR meshes. I definitely agree with @xylar, @vanroekel that keeping the very high-res. GEBCO + BedMachine dataset is a good option, and becomes important at very high-res. near the coast (a'la ICoM!). I don't expect changing the bathymetry remap approach will solve any issues with redi non-monotonicity though --- the real bathymetry is indeed rough, and the fact that redi is non-monotone in these cases seems like more of a numerics question. I am trying to recall some recent MOM6 work that's been done in this space. Let me get back to you...

mark-petersen commented 2 years ago

These are great comments. I think the first priority is to alter the Redi implementation to avoid large non-monotonic behavior, even with rough bathymetry. Thanks @vanroekel for the Griffies 2005 reference, particularly:

we reduced neutral physics to horizontal diffusion at grid points adjacent to all boundaries.

When I first implemented and tested Redi we saw these non-monotonic oscillations, but I thought it would just be washed out by diffusion and advection. Now that we have a case where it comes about so clearly (WC14 with rough GEBCO and surface restoring off goes to 60C, but only along the coast!) it will be easier to test Redi alterations that solve this problem in our exact configuration.

@vanroekel, I agree that this is not the cause of the weak MOC because Redi was off in V1, but it could easily be the cause of odd behavior in surface fluxes, especially with high Redi values, observed by @ytakano3.

xylar commented 2 years ago

@vanroekel, just to add to the Redi discussion a bit, my concern is that the reason we don't see extreme temperatures in E3SM is because surface fluxes are taking care of these extrema, but hat this would mean that the fluxes themselves are significantly higher than they should be. The difficulty would be that it would be much easier to notice tracer extrema than unrealistically large fluxes, so we may simply not have noticed.

vanroekel commented 2 years ago

@xylar I completely agree with you. These results are concerning as far as e3sm goes. It is definitely something we should look at more carefully. One idea I had on how to diagnose this potentially is to do a pair of G-cases one with and one without redi and look at the evolution of the surface fluxes, especially near the coasts and the strong bathymetry. If others have ideas on how to look into it, I'm very interested to hear.

xylar commented 2 years ago

@vanroekel, the good news is that we don't have any E3SM simulations (except the new ones @ytakano3 is working on) with bathymetry as rough as what @mark-petersen showed here. So my guess is that the effect in combination with smooth bathymetry won't be too large. But I do agree that the pair of simulations you're suggesting would be really helpful.

xylar commented 2 years ago

In an offline discussion with @dengwirda, his code (currently here: https://github.com/dengwirda/rtopo-dem/tree/elev-profile) is not in a place yet where it could be used in compass or on the particular bathymetry we're using without some work. So I think that approach remain the goal for compass but not the short-term solution.

I propose, in addition to fixes to Redi, that we use the smoothing on the MPAS mesh that is currently available. I'll make a PR to turn that on for global meshes, and we can debate the particular parameter choices in that PR.

mark-petersen commented 2 years ago

I'll look more into the Redi limiting near boundaries. I'd like to draw up a few plans, make a new issue (probably on Ocean-Discussion) and discuss it with all of you before full implementation.

Because the WC14to60kmL60E3SMv2r03 (https://github.com/MPAS-Dev/MPAS-Model/pull/628) currently used by E3SM V2 was created with smooth ETOPO bathymetry, I propose that we pursue both Redi limiting and bathymetry smoothing, but do not alter the WC14 compass setup or E3SM initial condition files until we come to some consensus on the issues of Redi and bathymetry smoothing.

xylar commented 2 years ago

@mark-petersen, @ytakano3 badly needs to make progress on his Kuroshio mesh. So we need a resolution to this problem fairly quickly that allows him to move forward. This is why I think we need to fix both surface restoring and smoothing in compass right now, rather than waiting on Redi.

xylar commented 2 years ago

@mark-petersen and @dengwirda, hmm, it seems like you're right. We can't proceed until we have fixed the Redi problem. I am still testing #309 and #310 but I'm already seeing bad, bad signs with the EC30to60 mesh in that PR. The maximum temperature goes as high as 61.3 C before gradually damping back down.

I'm already doing 10 iterations of smoothing, which is about as aggressive as I would be comfortable with: So I would now agree that smoothing and surface restoring aren't even going to tide us over until Redi gets fixed.

xylar commented 2 years ago

Just a quick follow-up. In my testing on #309 and #310, the SORRM mesh reached a maximum temperature of nearly 150 C!!!

mark-petersen commented 2 years ago

OK, thanks @xylar for this information. Those increasing temperatures are troubling. We will make this a priority.

xylar commented 2 years ago

I think resolution of this issue is the only thing that speaks against releasing compass 1.0.0 in the next few days to weeks. If we can sort this out and show better dynamic-adjustment results, I'll be comfortable making the release.

MPAS-Dev / compass

SST in WC14 stand-alone spin-up warms to 60C #308