E3SM-Project / E3SM

Energy Exascale Earth System Model source code. NOTE: use "maint" branches for your work. Head of master is not validated.
https://docs.e3sm.org/E3SM
Other
348 stars 358 forks source link

Mass conservation with pg2 grids #3588

Closed beharrop closed 4 years ago

beharrop commented 4 years ago

While testing CO2 conservation with the v2alpha-clubbv2 branch, I noticed that the mass conservation is not maintained with the ne4pg2 grid, but is with ne4np4. I have been using CO2_FFF (fossil fuel flux CO2 tracer) to test for mass conservation because when there is no emissions file prescribed, the fluxes are exactly zero and conservation should be maintained (to machine level round off). To demonstrate the conservation issue, I have attached a figure showing the relative error in the global mean CO2_FFF from its starting value using ne4pg2 and ne4np4 grids. I have also copied the printouts of the global dry mass of that tracer from the gmean_mass calls for the two grids.

I am tagging a few people that I think might be interested @wlin7 , @susburrows , @kvcalvin , @kaizhangpnl , @whannah1 , @jonbob . These runs were done on Compy, intel compiler, and FC5AV1C-04P2 compset (with CAM_CONFIG_OPTS="-co2_cycle" added to enable CO2).

image

            ne4np4              ne4pg2

m=42name=CO2_FFF 5.6285784223748E+00 5.6282525870713E+00 m=42name=CO2_FFF 5.6285784223748E+00 5.6282525870669E+00 m=42name=CO2_FFF 5.6285784223748E+00 5.6282525870615E+00 m=42name=CO2_FFF 5.6285784223748E+00 5.6282525870628E+00 m=42name=CO2_FFF 5.6285784223748E+00 5.6282525870583E+00 m=42name=CO2_FFF 5.6285784223748E+00 5.6282525870570E+00 m=42name=CO2_FFF 5.6285784223748E+00 5.6282525870590E+00 m=42name=CO2_FFF 5.6285784223748E+00 5.6282525870572E+00 m=42name=CO2_FFF 5.6285784223748E+00 5.6282525870523E+00 m=42name=CO2_FFF 5.6285784223748E+00 5.6282525870430E+00

whannah1 commented 4 years ago

@ambrad will also be interested in this.

whannah1 commented 4 years ago

@beharrop I assume the CO2 data you are looking at is on the physics grid? It might be more appropriate to check for conservation on the dynamics grid. I put in some dynamics grid output for state variables when using the physgrid (DYN_T, DYN_OMEGA), so maybe we should output CO2 there as well and check for conservation.

beharrop commented 4 years ago

@whannah1 Thanks for adding @ambrad .

Yes, I was checking these on the physgrid. I will see about trying to output CO2 on the dyngrid and check if it conserved there.

mt5555 commented 4 years ago

I 2nd walter's suggestion of checking CO2 on the dynamics grid. In pg2, the physics are providing a tendency to the dynamics but the dynamics "owns" the state variables. If mass is conserved on the dynamics side, that would suggest an issue with the mapping algorithm (unlikely) or more likely maybe the precision issue with pg2 areas. That precision issue was recently fixed by @ambrad (not sure if that PR is on master yet or not).

ambrad commented 4 years ago

Thanks for the report, @beharrop. The area imprecision fix is now on master, as of Tuesday. But for ne4, I'm dubious that the imprecision would cause a relative error of ~1e-10. I'll analyze this simulation configuration. I might contact you for further information if I don't see how to reproduce this. Edit: Also, I'll report here when I have some findings.

ambrad commented 4 years ago

@beharrop one thing I see immediately, although it is orthogonal to this issue. v2 uses the theta-l dycore. That is not the default dycore. So you should run with ./xmlchange CAM_TARGET=theta-l to do the full analysis of the v2 configuration.

ambrad commented 4 years ago

I can reproduce the issue. The area imprecision fix does not change the ~1e-10 relative error. So now I'll investigate further.

susburrows commented 4 years ago

Thanks for looking into this, @ambrad

ambrad commented 4 years ago

I see the problem and will fix it. Currently, the remap algorithm is not formulated to conserve pdeldry*q; rather, it conserves pdel*q. One might then think that the wet column of gmean_mass should show exact conservation. But elsewhere in the dycore, a moisture adjustment is made. So neither column can currently show exact conservation.

The fix is to reformulate the conservation problem in the remap algorithm to conserve pdeldry*q rather than pdel*q. I will work on this with highest priority. Thanks again for the report, @beharrop.

ambrad commented 4 years ago

By the way, @beharrop, don't worry about measuring the tracer on the dynamics grid. Now that I understand the cause -- the remap algorithm is erroneously constraining pdel*q rather than pdeldry*q -- it's clear that neither grid will show conservation of pdeldry*q.

beharrop commented 4 years ago

Thanks @ambrad . I was just starting to look into the adding CO2 output for the dyngrid, but I guess I don't need to anymore :). I'm glad this is getting sorted before any v2 simulations get going.

ambrad commented 4 years ago

@beharrop, agreed. Please continue to report anything else you find amiss in the theta-sl-pg2 v2 configuration to me. Don't worry about making a tidy report.

Again, don't forget to include ./xmlchange CAM_TARGET=theta-l in your future simulations. To summarize, there are two changes to make to get the v2 config: Add pg2 to the resolution, and switch CAM_TARGET to theta-l.

beharrop commented 4 years ago

@ambrad thanks again. I'll let you know if I find anything.

susburrows commented 4 years ago

Awesome, thanks so much for the quick follow-up @ambrad !

ambrad commented 4 years ago

I found the problem and have a provisional fix, but I want to think about it more over the weekend and do more testing.

My initial hypothesis concerning constraining the wrong mass was incorrect: wet and dry q and dp are always consistent since their transformations end up canceling out in the products dp_k q_k, with k = wet or dry.

Instead, one step of a remap algorithm is inconsistent in the final 4-5 digits for two related quantities due to finite precision. I need to study further why this step was losing so many digits. In any case, I'm now computing one of the two quantities differently to get them to agree to ~1 ULP.

I'm working on a v2 theta-sl-pg2 performance PR and will add the fix to this problem to it. In the validation for that PR, I'll include CO2_FFF data for ne30pg2. Once I know why so many digits were lost, I'll recreate the problem in my physgrid remap unit tests so I catch this in the future; right now the mass conservation tests don't see this problem.

susburrows commented 4 years ago

Hi @ambrad , thanks for the update. Do you already have mass conservation unit testing implemented for the dynamics code?

ambrad commented 4 years ago

Yes, there are tests for conservation and other things in SL transport (dynamics) and physgrid. Some of SL transport's are in one of the HOMME standalone tests on the dashboard. Physgrid's internal tests are not currently on the dashboard, but I'm going to add that in this PR.

For physgrid remap, restating my previous comment, the internal test is currently insensitive to this problem because the internal dp on the FV grid is computed (in finite precision) differently in gllfvremap_*_mod and derived_physics in dp_coupling, and the test does not know this. So internally it sees conservation. The numerical difference leads to an artificial source or sink in CAM physics -- i.e., between remap steps -- in this case of relative magnitude ~1e-10. I'll fix this.

Both SL and physgrid remap tests look at mass conservation and othe properties within a transport or remap step, not across; that's what I mean by "internal." That's because in general there's no reason to expect conservation across steps because a tracer needs to be inert and have no source for that to be true. It would be good to put together a compset, perhaps based around this CO2_FFF-containing one, with special-cased code to track an intert tracer's properties across steps. I don't think I want to take that on in this PR, as that would delay getting things done, and there would likely be controversial code additions. Instead, I propose to start a new PR later where we can work through putting something like this together. Again, the high-level idea is to run an F-case (or maybe eventually fully coupled) with one or more inert tracers, and run special-case code on these inert tracers to test that certain properties hold over time.

susburrows commented 4 years ago

@ambrad that sounds fantastic. We have wanted to have testing along these lines for a while but nobody had the bandwidth to make this a priority so far. @jonbob has plans to implement some carbon mass budget diagnostics into the coupler, and @beharrop has worked on testing the atmospheric CO2 transport capability and on the composers related to that. There might be some value in a small meeting to coordinate efforts here.

ambrad commented 4 years ago

In the PR I'm working on, I improved state output in Homme so that we can get a good read on mass. Once the PR goes in, you will see output like this:

 qv( 42)=   0.560314161430643E-03  0.560314161430643E-03  0.693557414260850E+03

Tracer 42 in this case is CO2_FFF, and the numbers are tracer wet mixing ratio minimum and maximum and tracer global mass. The third number can be related to gmean_mass output, which looks like this:

 0:    before tphysbc DRY m=42name=CO2_FFF gavg dry, wet, min, max      5.6282525880491E+00      5.6415403802548E+00      5.6031416701634E-04      5.7091898125000E-04

as follows:

    (homme value) = 4 pi gravit (gmean_mass value)

Homme output frequency is controlled by the namelist variable statefreq. (Expensive) measurements are made and output every statefreq dynamics time steps.

ambrad commented 4 years ago

Here are some results from the PR in progress, comparing against theta with SL but np4 physics grid (theta-sl-np4) and preqx with Eulerian transport and np4 physics grid (preqx-eul-np4, the v1 config). Summary: theta-sl-np4 conserves ~10x better than preqx-eul-np4, and theta-sl-pg2 will conserve ~30x better than preqx-eul-np4. I'm going to propose to use the -co2_cycle build with a script based on the new qv( 42) output as the basic pieces for an EAM mass conservation test.

compset FC5AV1C-04P2
resolution ne4 or ne4pg2
./xmlchange --append CAM_CONFIG_OPTS="-co2_cycle"
./xmlchange RUN_STARTDATE=1950-02-14
1 year run

preqx-eul-np4
                        q min                   q max                 q mass
  first qv( 42)=   0.560314161430643E-03  0.560314161430643E-03  0.693597566121295E+03
  last  qv( 42)=   0.548138784095999E-03  0.564978645578975E-03  0.693597566121070E+03
  first 0:    before tphysbc DRY m=42name=CO2_FFF 5.6285784223749E+00
  last  0:    before tphysbc DRY m=42name=CO2_FFF 5.6285784223731E+00
  relative change: 3.24e-13

theta-sl-np4
  first qv( 42)=   0.560314161430643E-03  0.560314161430643E-03  0.693597566121295E+03
  last  qv( 42)=   0.540371116772462E-03  0.572181840098967E-03  0.693597566121319E+03
  first  0:    before tphysbc DRY m=42name=CO2_FFF 5.6285784223749E+00
  last   0:    before tphysbc DRY m=42name=CO2_FFF 5.6285784223751E+00
  relative change: -3.458478508424584e-14

theta-sl-pg2 PR in progress
  first qv( 42)=   0.560314161430643E-03  0.560314161430643E-03  0.693557414260850E+03
  last  qv( 42)=   0.550888508602139E-03  0.567021256587854E-03  0.693557414260843E+03
  first 0:    before tphysbc DRY m=42name=CO2_FFF 5.6282525880491E+00
  last  0:    before tphysbc DRY m=42name=CO2_FFF 5.6282525880491E+00
  relative change: 1.0162942236371494e-14