E3SM-Project / E3SM

Energy Exascale Earth System Model source code. NOTE: use "maint" branches for your work. Head of master is not validated.
https://docs.e3sm.org/E3SM
Other
334 stars 334 forks source link

Fix Redi mixing in ocean #3391

Closed mark-petersen closed 4 years ago

mark-petersen commented 4 years ago

Add capability to the Redi isopycnal mixing scheme, which is required for non-eddying resolutions. It includes:

  1. Change to triad formulation to compute the isopycnal slope.
  2. Linear expansion of density in T,S at each triad corner.
  3. Several bug fixes, including a missing factor in the vertical mixing term.
  4. Tapering Redi diffusion coefficient at top and bottom of each column.
  5. Add ability for Redi diffusion coefficient to be a 3D field, rather than just constant.
  6. Two new Redi verification test cases in the COMPASS suite.

According to Griffies et al 1998 this triad expansion of the slope calculation (my steps 1 and 2, his eqns 30-34) is total variance diminishing, and essential when using a nonlinear equation of state. It was used in MOM and POP. Though noise may still exist, it is much reduced.

This PR leaves Redi mixing off by default, but is non-BFB due to mixing changes. After tuning exercises we will have a later PR that turns Redi mixing on for low resolution, with the proper settings.

[NML] [non-BFB]

mark-petersen commented 4 years ago

The bulk of the code alterations are done. Testing is underway, so there will still be some edits and clean-up.

jonbob commented 4 years ago

@mark-petersen - please let me know when you think the Registry changes are done and I'll run the scripts to generate the bld and nml changes

mark-petersen commented 4 years ago

OK, editing namelist flags is done. The Registry file in the mpas-source submodule is correct, or you can download it here: https://github.com/MPAS-Dev/MPAS-Model/blob/952cd4d619a6bfd251603d26d4b1962aa9ef707c/src/core_ocean/Registry.xml If you want to double check, the very last flag I altered was changed to: config_mesoscale_eddy_isopycnal_slope_limit

jonbob commented 4 years ago

That last flag change was in what I got checking out the branch, so it all looks up-to-date. I'm running the scripts now to get the bld and nml changes

jonbob commented 4 years ago

@mark-petersen - changing so many variable names has made the automated script part a bit more complicated, so I'll have to run some of the issues by you, if that's OK:

Thanks

rljacob commented 4 years ago

Is there a switch to turn Redi mixing off and revert to previous answers?

mark-petersen commented 4 years ago

@rljacob yes, if we set config_use_Redi = .false. then it will be bfb with previous. Would you prefer that, and then turn it on in a following PR? I agree that provides more thorough testing.

rljacob commented 4 years ago

That is up to @golaz

mark-petersen commented 4 years ago

@jonbob I agree there are a lot of changes. Thanks for the help.

  • I assume we want to change all the grid-specific settings of config_use_standardGM to now use config_use_GM?

Correct. config_use_standardGM -> config_use_GM

  • we used to have the default for config_standardGM_tracer_kappa set to 1800 except for G-cases using oEC60to30v3wLI was 600.0 -- do we want those defaults the same but now for config_GM_kappa?

Correct. config_standardGM_tracer_kappa -> config_GM_kappa Let's leave all the settings in tact for each resolution. We may alter these later, after tuning exercises.

  • config_GM_Bolus_kappa_function was 'constant' by default -- do we want config_eddying_resolution_taper to be 'ramp' by default? 'constant'?

Let's leave it 'constant' right now so we have the option to test bfb on this PR, as stated just above. Once Redi is turned on, we could put it on 'ramp' for all runs and the tuning exercise.

  • is "N2_dependant" the correct spelling of the GM_closure variable?

That's a misspelling on my part, thanks. Changing it now...

  • in E3SM, config_disable_redi_k33 had previously been true by default -- do we now want it false?

Yes, false. Good thinking. The flag config_use_Redi should turn on all four terms in E3SM.

vanroekel commented 4 years ago

My two cents is that we put this PR in with config_Redi=.false. and then turn it on later. This would make integration testing more straightforward and unified.

jonbob commented 4 years ago

@mark-petersen - I committed my bld updates, but they should be tested with all the changes

jonbob commented 4 years ago

I'm running a test G-case with my head of this branch

mark-petersen commented 4 years ago

I am still propagating the namelist changes through the code. Should be done tomorrow morning.

jonbob commented 4 years ago

@mark-petersen - I checked out the current version of this branch and ran a SMS.T62_oEC60to30v3wLI.GMPAS-IAF.anvil_intel test. The cmake build failed but the old make one worked fine. The results of a five-day smoke test were BFB with one from before this update. I'll see if I can figure out the cmake problem...

mark-petersen commented 4 years ago

This should now work with cmake and remain bfb with previous.

milenaveneziani commented 4 years ago

@mark-petersen: is this ready to try, or should we wait for other fixes from the MPAS-O PR?

mark-petersen commented 4 years ago

I think everything is fixed. I went through the algorithm and code again and found the sign error. I need to rerun my validations, but we could start with a 5-day test with config_use_Redi = .true. in E3SM with EC60to30.

vanroekel commented 4 years ago

@mark-petersen and @milenaveneziani I think we should still try a stand alone 60to30 with only redi mixing enabled before doing too much in E3SM. But a 5-day simulation with redi on would be a good smoke test. From this thread it seems @jonbob may be trying this test already? Or maybe that is just a BFB with redi off?

jonbob commented 4 years ago

I'm running that test right now with an updated code base

jonbob commented 4 years ago

Passed an SMS.T62_oEC60to30v3.GMPAS-IAF.anvil_intel smoke test

mark-petersen commented 4 years ago

After the last set of revisions, the diffusion is moving the right way, but I'm seeing some noise: image and I'm looking into that now and redoing term-by-term verification. No use running a G case until the idealized results look good.

mark-petersen commented 4 years ago

If we want to add the three debug tracers to a G case, set: config_use_debugTracers = .true. and you can add initial tracer fields from this EC60to30 file on IC:

/lustre/scratch4/turquoise/.mdt3/mpeterse/runs/200109_redi_add_triads_EC60to30/ocean/global_ocean/EC60to30/spin_up/simulation/restarts/restart.0001-01-01_00.30.00.nc
ncdump -h restart.0001-01-01_00.30.00.nc|grep double|grep tracer
    double tracer1(Time, nCells, nVertLevels) ;
    double tracer2(Time, nCells, nVertLevels) ;
    double tracer3(Time, nCells, nVertLevels) ;

When debugTracers is on, it now has these three tracers.

mark-petersen commented 4 years ago

It turns out that the noise was coming from the T&S field, as I had a strange mix of terms on and off. If T&S are completely fixed in time, and the only tracer term is Redi, then it looks good qualitatively: image

vanroekel commented 4 years ago

@mark-petersen that's good news. Just so I understand, if you turn off all T&S terms but redi this still works (no noise in passive tracer) as Redi shouldn't do anything? Is that right?

mark-petersen commented 4 years ago

I'm almost done with all the testing of this Redi PR. Things are looking good, and I'll update and post all the details soon. There are some minor revisions since last Thursday.

One question for @vanroekel : To get bfb restarts, we need to either add indexBoundaryLayerDepth to the restart, or recompute it on start-up. That is because Redi Kappa is zero down to the BLD and then tapers to the full value below that. indexBoundaryLayerDepth is computed in cvmix, so the easy thing is to add it to the restart file. It is 2D, so it doesn't require much space. We currently have the real-valued BLD in the restart file.

vanroekel commented 4 years ago

@mark-petersen thank you for the update. For your question, I think I would prefer that indexBLD gets stored. Figuring out how to correctly call into CVMix in restart sounds more complicated and prone to error and as you say it is a 2D int field, so should not take up much space. It also makes sense to have this in the file with the full BLD.

milenaveneziani commented 4 years ago

@mark-petersen, FYI: last Thursday I started two G-cases on anvil: they are identical except that one has Redi on and one has it off. I have had some unexpected failures on a random year, but while the Redi-off case is now running fine and it's at year 31, the Redi-on case failed at year 27 with Nan detection errors. Nans are apparently found in the layer thickness, normal velocity and active tracers, at location: ~1degS, 6degE, which corresponds to an ocean point just off of Central Africa. This is what I could gather so far from the log files.

milenaveneziani commented 4 years ago

Apologies, forgot that those are radians not degrees.. Actual location is in the Southern Ocean, west of the Drake Passage. I am looking at some fields closely: will post a meaningful plot when I have it.

milenaveneziani commented 4 years ago

ok, so, this is what I could find so far: 1) I first looked at the 5-day instantaneous fields (from the highFrequency files) and did not notice anything unusual there; 2) I also noticed that the model had died during the previous month (July), for the same reason, but since the Redi-off case had also stopped at one point, I thought it was just some random node-related failure. So, I restarted the Redi-on case without investigating, and the nan-problem showed up again in August. 3) I plotted the output of the debug mpas_ocean file and there you can clearly see where the weird T and S values showed up at model time=0027-08-24_20:30:00 (Weddell Sea):

Screen Shot 2020-01-21 at 3 23 58 PM Screen Shot 2020-01-21 at 3 44 51 PM

Note that all values that are unrealistically negative have been associated to -9999 (dark blue dots). By doing that I also 'masked' the nan values associated with z>local_depth, sorry about that, but I added the ocean depth plot for reference.

mark-petersen commented 4 years ago

@milenaveneziani, @jonbob, @vanroekel, @maltrud My verification exercises are complete, as described on the MPAS PR: https://github.com/MPAS-Dev/MPAS-Model/pull/280#issuecomment-576520911. I rebased this E3SM PR with updated MPAS-Ocean. The commit last Thursday was preliminary, and those Redi results are not valid. I am confident that we can proceed with this PR for global testing now.

Here is a summary of where this PR stands:

  1. Verification against polynomial solutions shows convergence.
  2. Qualitative idealized tests show expected angle and spreading. Note: a. With Redi on T&S with constant slopes, T&S remain nearly constant, as expected. This is true with both linear and JM EOS. b. Global min/max of test tracers is not strictly bounded, but remains reasonable for small slopes. For example, with isopycnal slope of 0.001, initial min/max of [1,2] goes to [0.9,2.1] after one year.
  3. Global tests in MPAS-Ocean stand alone with Redi on show: a. Global sum of all tracers are conserved to machine precision when surface forcing is off. b. In one year test, tracers are not strictly bounded, as expected with Redi. In practice, min S, max S, max T remain bounded. Min T drifts below -1.8, but slowly (global min to -6 over a year). This is a consequence of Redi and is expected. It depends on Redi tapering at boundary, but ice freezing and advective mixing will limit it in E3SM cases.

We can now proceed with E3SM low-resolution tests. We need to consider the following:

  1. Keep an eye on global min/max of T, S, and debug tracers.
  2. For all these tests, we should use:
    config_use_debugTracers = .true.
    config_reset_debugTracers_near_surface = .true.
    config_reset_debugTracers_top_nLayers = 20

    and watch this tracer distribution in time. debugTracers are normally off in E3SM.

  3. Tapering of Redi Kappa may be set at surface, bottom, and below boundary layer. My guess is 3-5 layers, but this is something to be test. We might need more if spurious min/maxes occur there.
    config_Redi_surface_taper_layers = 3
    config_Redi_bottom_taper_layers = 3
    config_Redi_zero_in_boundary_layer = .true.
    config_Redi_BLD_taper_layers = 3
milenaveneziani commented 4 years ago

@mark-petersen: could you please explain these two options a bit more?:

config_Redi_zero_in_boundary_layer
config_Redi_BLD_taper_layers

how do they work in condjuction (or not) with the taper_layers options?

milenaveneziani commented 4 years ago

Talked with @mark-petersen in person about this and now things are clear to me. I was thinking perhaps we could have config_Redi_surface_taper_layers = 10, since the BLD options take care of how the tapering is done when BLD<upper 10 layers?

I will re-run the Redi-on case, while leaving the Redi-off case as is (Redi-off G-case should complete 50 years by tomorrow).

vanroekel commented 4 years ago

doesn't the bottom 10 layers seem large? This would be ~2000m.

milenaveneziani commented 4 years ago

no, this would be the top 10 layers. The bottom tapering is regulating by config_Redi_bottom_taper_layers and yes, I wouldn't make that more than 3.

mark-petersen commented 4 years ago

I would recommend running two simulations. One with low taper layers and one with high, and see how the min/max behaves.

Low taper layer numbers

    config_Redi_surface_taper_layers = 3
    config_Redi_bottom_taper_layers = 3
    config_Redi_zero_in_boundary_layer = .true.
    config_Redi_BLD_taper_layers = 3

High taper layer numbers

    config_Redi_surface_taper_layers = 10
    config_Redi_bottom_taper_layers = 5
    config_Redi_zero_in_boundary_layer = .true.
    config_Redi_BLD_taper_layers = 5

It would be better to use the lower taper values wherever possible, but I suspect we will see noise near boundaries, and need to go higher. Having two runs right away will give us some idea of the spread.

milenaveneziani commented 4 years ago

Seems to me that we can set config_Redi_BLD_taper_layers independently from the surface and bottom tapering, since this is tapering below the BLD. I think 3 layers tapering below the BLD is more than enough?

vanroekel commented 4 years ago

It looks like they can be set independently. I think (hope?) 3 is more than enough the for the BLD_taper layers, but I agree with @mark-petersen that we should see the bounds/impact of the large and weaker tapering.

milenaveneziani commented 4 years ago

I was talking with @mark-petersen and we agreed that we can always back off on the BLD tapering, once we see the results of the two tests above. So, I will start the tests, also adding 5-daily 3-d tracers in the high-frequency output, so that we can hopefully see what kind of noise we get.

jonbob commented 4 years ago

@mark-petersen - I did a series of smoke tests with a new checkout of this branch. I can confirm that the results are still bfb with master, when running with the default settings (redi off). I ran another test turning redi on (config_use_redi = .true.) but the results did not change because redi_kappa is 0 by default. I had to add config_redi_kappa = 600.0 to user_nl_mpaso to get redi to have an impact.

milenaveneziani commented 4 years ago

I have two G-cases waiting in the queue on anvil: one is with Low-tapering and the other one is with High-tapering options. Hopefully they'll start running soon. I'll let them go for 6 months and analyze results. I am saving 5-day snapshots of all tracers, MLD, and BLD, for diagnostics purposes. The Redi-off case is currently running: I'll stop it at year 50.

mark-petersen commented 4 years ago

@jonbob: The MPAS PR is merged, and MPAS-Dev/e3sm/develop is now updated. I just updated the tag and first comment here to be BFB. So this PR is ready to test on next and be merged.

mark-petersen commented 4 years ago

@vanroekel since this is bfb and has the triad formulation, could we merge it in? When you have a tapering scheme you are satisfied with, I think it would make sense to do an additional E3SM PR at that time. Then we can proceed with other ocean PRs in the mean time. If you agree, please approve this one.

vanroekel commented 4 years ago

@mark-petersen I've been running/testing this PR for awhile and think I have found a few bugs in the redi implementation outside the scope of tapering. I think I may be nearly ready to submit a PR and am running MPAS stand alone tests now. In the first month, I'm seeing no decreases in the minimum temperature and instead a slow increase. I would like to wait for further testing on those changes before merging Redi code to E3SM.

mark-petersen commented 4 years ago

Status: @vanroekel has added new tapering options here: https://github.com/MPAS-Dev/MPAS-Model/pull/445 We are still seeing temperature values drift below -1.8C, but are testing solutions to that now. We hope to be done next week, and will then be able to merge this in.

mark-petersen commented 4 years ago

Rebased. The flags were updated once in the previous commits, but Redi flags in MPAS-Ocean has been updated since then. @vanroekel you must already have an E3SM branch with updated flags and your preferred default values? Or does it make more sense for @jonbob to rerun scripts for the e3sm namelist files?

vanroekel commented 4 years ago

@mark-petersen I think we should definitely have @jonbob rerun scripts. My e3sm branch of this is hacked together and shouldn't be used for this PR.

jonbob commented 4 years ago

Agreed -- even if we got an old branch functional, I'd prefer to have the supporting scripts created by the autogeneration tools

jonbob commented 4 years ago

I'm going through the generated scripts and have just a couple of questions. There's a point where we resolve the differences between default namelist values between what's in the Registry and what's previously been the default. For config_Redi_kappa, Registry has 600 and of course it's been 0 by default. I'm guessing we want to set it to 600 now? Or should that be resolution-dependent? Redi is still off by default, but it would seem better if our best guess at the correct values were there so turning it on is simply a matter of flipping config_use_Redi...

jonbob commented 4 years ago

A similar question for config_GM_kappa. Currently the defaults look like:

<config_GM_kappa>1800.0</config_GM_kappa>
<config_GM_kappa ocn_forcing="datm_forced_restoring" ocn_grid="oEC60to30v3wLI">600.0</config_GM_kappa>
jonbob commented 4 years ago

And config_eddying_resolution_taper -- "ramp" or "none"?