E3SM-Project / ACME-ECP

E3SM MMF for DoE ECP project

Fix FV physgrid for CRM #100

Closed whannah1 closed 5 years ago

whannah1 commented 5 years ago

The initial FV physgrid implementation mapped tendencies to dynamics (except for tracers) and states to physics. This is fine for the traditional parameterizations, but it causes a "checkerboard" bias pattern to emerge in the precipitation field when using the CRM. Mapping tendencies from dynamics solves the problem, but it also required a lot of infrastructure changes to be robust. The model should still be BFB when run with the default np4 grid.

ambrad commented 5 years ago

Random, but I happened to notice that this line:

call initEdgeBuffer(par, edgebuf, elem, (3+pcnst)*nlev)

in fv_phys_to_dyn_topo should probably have just 1 (edited from nlev since this is for a surface variable only) as the final argument, and since edgebuf is local, there should be a freeEdgeBuffer call at the end.
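
A minimal sketch of the suggested shape, using the names already in fv_phys_to_dyn_topo (the pack/exchange/unpack in between is elided here):

    call initEdgeBuffer(par, edgebuf, elem, 1) ! 1 level: phis is a surface field
    ! ... edgeVpack / bndry_exchangeV / edgeVunpack for phis ...
    call freeEdgeBuffer(edgebuf)               ! edgebuf is local, so free it before returning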

mt5555 commented 5 years ago

this might be leftover from a merge conflict - I recently tried to get rid of all the calls to initedgebuffer() and freeedgebuffer(), replacing them with the newer interface that uses a single global edge buffer, "edge_g"

ambrad commented 5 years ago

Another point re: the topo routine. Why is par%dynproc a condition to do the DSS? I think, though am far from sure, that par%dynproc has to do with @AaronDonahue's parallel-in-time physics time-stepping approach. I believe that in most situations, dynproc is .false., implying that currently phis is not DSSed. Sorry if I'm totally wrong about all of this.

whannah1 commented 5 years ago

@ambrad, I often set dyn_npes to be less than the total number of MPI tasks to speed things up. The MMF scales pretty well past this limit, whereas the standard E3SM does not. This probably won't be as beneficial with pg2, especially when using the GPU, but I think we still need the check for par%dynproc to support those instances where the number of tasks is larger than dyn_npes.

ambrad commented 5 years ago

What happens if dynproc = .false., though? Will phis get DSSed elsewhere?

Edit: inidat.F90 does a DSS after topo is init'ed. But the DSS in the physgrid code is there to bring rspheremp and spheremp into the DSS. I'm still confused why this depends on dynproc.

whannah1 commented 5 years ago

If dynproc is false then any dyn structure, such as "elem", on that MPI task is not used by dynamics. Since the topography is loaded onto the physics grid and passed to the dynamics grid we only need the boundary exchange to operate on the MPI tasks that have valid dynamics grid data.

I guess you're implying that the DSS'd topography might need to be mapped back to the physics grid for the two to be consistent? I hadn't thought of that, since I consider the topography field as existing on the physics grid because of the way the orographic gravity wave parameterizations work.

ambrad commented 5 years ago

I think I just don't know how dynproc works. I'll look into it more. Sorry for the noise.

AaronDonahue commented 5 years ago

@ambrad, I'm happy to explain dynproc more if you want. In summary, it came from a fix I put in before my parallel stuff. There was an option in homme, dyn_npes, to allow the number of dynamics procs to be less than the total given to the ATM, for example when we might have more procs than dynamics elements. But it was broken, and since I wanted to use it I first set out to fix it. So it's related to my work, but not something I came up with.

Before dynproc there was a series of if statements that used "if iam .lt. npes", which worked because hardly anyone ever ran with dyn_npes < npes. I decided to switch to a logical, dynproc, because there is another option, dyn_npes_stride, that lets you space the dynamics processors out over the full set of ATM procs (e.g., every other processor with stride=2), so it wouldn't work to simply change the check to "if iam .lt. dyn_npes".

For now it is an under-utilized option, but I found that using the stride bumps performance by ~10% when a lot of cores are used (npes >> dyn_npes), and cases like the one @whannah1 mentioned mean there are now real use cases for running with more procs than elements.

Even more summarized: par%dynproc just means "I am a processor that does dynamics things," so it should be checked whenever you do something that needs dynamics information, because when par%dynproc=.false. some of the dynamics fields may not be populated or defined and/or may have nonsense values.
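
A minimal sketch of that guard pattern, using names that appear elsewhere in this thread (the loop body is only a placeholder):

    if (par%dynproc) then
       ! This task owns valid dynamics data, so it is safe to touch elem
       ! and to participate in boundary exchanges.
       do ie = nets, nete
          ! ... operate on elem(ie) ...
       end do
    end if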

ambrad commented 5 years ago

Thanks @AaronDonahue. I think I resolved my confusion. I have standalone-Homme tests for the high-order remap routines, and they were failing when I used par%dynproc in the topo routine. But I realized that's just because dynproc is uninitialized in standalone Homme. This commit https://github.com/ambrad/E3SM/commit/81d9ceeeadc6b0927e9b9ba52c2dba042fbb2abb fixes the issue, so now the high-order-remap topo routine matches what Walter does.

whannah1 commented 5 years ago

@mt5555, I'm looking back at this edge buffer question, and I still see initEdgeBuffer used in stepon.F90. Was your change to use a global edge buffer part of a recent E3SM PR? And will that break things when we bring it down into E3SM?

mt5555 commented 5 years ago

if you merge in from E3SM, won't it pick up all these changes? Both interfaces work, so you can use the old interface, which requires a separate edge buffer sized for the total amount of data that will be transferred, or the new interface, where any pack/exchange/unpack can use the global "edge_g" buffer.

If the code uses an edge buffer that is too small, it should detect the problem and abort. If it uses an edge buffer that is too large (with the old interface), it will work, but it will be a little inefficient (extra communication).
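
As a rough sketch of the newer pattern (assuming the same pack/exchange/unpack entry points used elsewhere in this code; the single-level phis exchange is just for illustration):

    do ie = nets, nete
       call edgeVpack(edge_g, elem(ie)%state%phis, 1, 0, ie)   ! pack 1 level into the global buffer
    end do
    call bndry_exchangeV(par, edge_g)
    do ie = nets, nete
       call edgeVunpack(edge_g, elem(ie)%state%phis, 1, 0, ie) ! unpack the exchanged values
    end do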

whannah1 commented 5 years ago

@mt5555, ok, I'll switch to the new way of using the global buffer.

ambrad commented 5 years ago

I mocked up the current and proposed formulas to look at mass conservation in an element:

ie = 1
! Fill q_fv with arbitrary, spatially varying test values.
do j = 1,nf
   do i = 1,nf
      q_fv(i,j) = one + half*(3*i + 1.7*j + 0.2*i*j)
   end do
end do
! Fill dp_gll with arbitrary, spatially varying test values.
do j = 1,np
   do i = 1,np
      dp_gll(i,j) = one + 0.1*cos(real(i))*sin(real(j))
   end do
end do
! Map dp_gll to dp_fv by the usual subcell integration procedure.
call gfr_g2f_remapd(gfr, elem(ie)%metdet, gfr%fv_metdet(:,:,ie), dp_gll, dp_fv)
! Mimic current low-order pg2 procedure.
do j = 1,np
   j1 = 1
   if (j > 2) j1 = 2
   do i = 1,np
      i1 = 1
      if (i > 2) i1 = 2
      q_gll(i,j) = q_fv(i1,j1)
   end do
end do
a = sum(dp_fv(:nf,:nf)*gfr%w_ff*gfr%fv_metdet(:,:,ie)*q_fv(:nf,:nf)) ! Qdp_fv element mass
b = sum(dp_gll*elem(ie)%spheremp*q_gll) ! Qdp_gll element mass
print *,'current formula>',a,b
! Mimic proposed mass-conserving low-order pg2 procedure.
do j = 1,np
   j1 = 1
   if (j > 2) j1 = 2
   do i = 1,np
      i1 = 1
      if (i > 2) i1 = 2
      q_gll(i,j) = gfr%fv_metdet(i1,j1,ie)*gfr%w_ff(i1,j1)* & ! area of FV subcell (i1,j1)
           dp_fv(i1,j1)*q_fv(i1,j1)/ & ! Qdp_fv
           ! GLL air mass attributed to GLL node (i,j). Divide by 4 since the FV
           ! mass is split into 4 pieces, one for each of the 4 GLL nodes in
           ! this quadrant.
           (4*dp_gll(i,j)*elem(ie)%spheremp(i,j))
   end do
end do
a = sum(dp_fv(:nf,:nf)*gfr%w_ff*gfr%fv_metdet(:,:,ie)*q_fv(:nf,:nf)) ! Qdp_fv element mass
b = sum(dp_gll*elem(ie)%spheremp*q_gll) ! Qdp_gll element mass
print *,'mass conserving formula>',a,b

For the particular grid I used, I got

current formula>  6.129520379326387E-002  6.130064993699184E-002
mass conserving formula>  6.129520379326387E-002  6.129520379326387E-002

whannah1 commented 5 years ago

@ambrad, can you explain fv_metdet and w_ff and how they are calculated?

ambrad commented 5 years ago

fv_metdet is the Jacobian determinant of the reference-element-to-sphere map, modified so that the spherical areas of the FV subcells sum to the spherical area of the element according to the GLL data (see below for more on this). To be precise, gfr%fv_metdet(:,:,ie) is elem(ie)%metdet remapped to the FV grid:

call gfr_g2f_remapd(gfr, elem(ie)%metdet, ones, ones, gfr%fv_metdet(:,:,ie))

Thus, we can compute the element area in two ways:

a = sum(elem(ie)%metdet * gfr%w_gg)
b = sum(gfr%fv_metdet(:,:,ie) * gfr%w_ff(:nf, :nf))

and a = b. Here, gfr%w_gg(i,j) is the product of the GLL weights and w_ff is the FV subcell area in the reference element.

In your code, I believe the expression inside the sum in the calculation of a above is equivalent to your area variables.

Tangential note: There are two reasons we don't compute the Jacobian determinant directly (the determinant of the Jacobian matrix computed by Dmap, evaluated at the FV reference subcell centroid). First, for the reference-to-sphere map, this would not result in a conserved area. Second, even if it did, HOMME modifies metdet (see alpha in cube_mod.F90) so that the sum of b above over all elements gives the area of the sphere exactly. To get around these two problems, we remap elem(ie)%metdet, giving a spherical-area-conserving Jacobian determinant.

whannah1 commented 5 years ago

ok, that makes sense, thanks. So I think the only difference here compared to what I tried recently is that factor of 4. I think I understand why my attempt failed, but I'm not sure I understand why adding the 4 works. That seems to suggest that 4*spheremp(i,j) for any GLL node is equal to the area of the overlying FV subcell? Or am I thinking about it wrong?

whannah1 commented 5 years ago

@ambrad, one other question. In your code above, is "dp_fv" the sum or the average of dp_gll over the subcell?

ambrad commented 5 years ago

  1. Re: 4: The formula first determines the mass in an FV subcell. In the current remap procedure, one FV subcell's mass is partitioned into mass for the four GLL nodes in the quadrant corresponding to that subcell. So there are two parts to the formula: a weighting factor to go from FV to GLL, and a factor of 1/4 that splits the mass resulting from the first part among the four GLL nodes (see the sketch after this list).

  2. Average. dp_gll and dp_fv have the same dimensions, pressure.
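
To make the bookkeeping concrete, here is a small check written against the mock-up above; the per-quadrant loop and the mass_fv/mass_gll variables are added here just for illustration:

    ! Verify, quadrant by quadrant, that the mass assigned to the four GLL
    ! nodes sums to the mass of the corresponding FV subcell.
    do j1 = 1,nf
       do i1 = 1,nf
          mass_fv = gfr%w_ff(i1,j1)*gfr%fv_metdet(i1,j1,ie)*dp_fv(i1,j1)*q_fv(i1,j1)
          mass_gll = 0
          do j = 2*j1-1, 2*j1
             do i = 2*i1-1, 2*i1
                mass_gll = mass_gll + q_gll(i,j)*dp_gll(i,j)*elem(ie)%spheremp(i,j)
             end do
          end do
          print *,'quadrant',i1,j1,mass_fv,mass_gll ! the two should agree
       end do
    end do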

ambrad commented 5 years ago

Two things:

First: Re: the ostensible DSS in fv_phys_to_dyn_topo: I realized why I couldn't get mine to work. There's a very subtle thing that happens in your code: the boundary exchange effectively does not happen. That's because in the line

    call edgeVpack(edgebuf,elem(ie)%state%phis(:,:),0,0,ie)

the first 0 in 0,0,ie means vlyr=0, which means no data are packed. What this section of code actually does is weight phis by spheremp*rspheremp. This weighting then gets used in inidat.F90's DSS, which does not itself apply mass-matrix weights. The bottom line is that the MPI-machinery part of that section of code should be removed, leaving only this, which is what I'm now doing in my version:

    do ie = nets,nete
       elem(ie)%state%phis = elem(ie)%state%phis*elem(ie)%spheremp*elem(ie)%rspheremp
    end do

With this change to my code, I can now use the high-order remap on the FV topo file as we want.

Second: In dyn_to_fv_phys, zs_tmp, which ultimately becomes phys_state(lchnk)%phis(icol), is remapped from the GLL values by subcell_integration. Are we sure we want this? Why should zs_tmp not actually be the raw data from the FV topo file, rather than that data remapped to GLL and then back to FV?

whannah1 commented 5 years ago

@ambrad, I'm not sure I understand your point about #1. Since the topo data is copied to all GLL "duplicate" edge nodes, the boundary exchange needs to happen to average these edge nodes to get a smooth topo field, and DSS is the only way to do this. Otherwise the topo field is not consistent at the edges. I remember you saying that you are loading the topo onto the dyn grid instead of the phys grid, so maybe something you did there avoids this problem?

As for zs_tmp, I'm not even sure why this variable gets passed back and forth, because it's not traditionally a prognostic quantity. Maybe this is important for when the coupled ocean has a variable volume and sea surface height? Either way, that's not for us to decide, so we should just keep mimicking what's done for the np4 grid.

ambrad commented 5 years ago

Re: 1. Yes, agreed. I'm just pointing out that your calls to edgeV(un)pack and bndry_exchangeV do nothing. Only the spheremp*rspheremp scaling matters, in preparation for the unweighted DSS in inidat. I solved my FV topo problem by recognizing you're not actually doing a DSS in that routine because vlyr=0.

Re: 2. We should provide phis, yes. But might it not be best to provide the phis values from the FV topo file, rather than the values that result from mapping from FV topo file values to GLL and then back to FV?

mt5555 commented 5 years ago

regarding PHIS, here's some comments which I've already shared with everyone, but just writing them down for the record:

The optimal solution would be to provide a physgrid PHIS and a GLL-node PHIS, each individually optimized. But this seems like too much work.

2nd best, I thought, would be to provide PHIS on the physgrid and then map it to the GLL nodes. But Andrew suspects these maps may not be that great (for example, they require a DSS, which can introduce high frequencies).

3rd best: provide PHIS on the GLL nodes and then map it to the physgrid during initialization. To me this has a minor drawback: the dycore requires smooth topography, and that smooth topography would be smoothed further when mapped to the physgrid. With this approach the topography on the physgrid might end up slightly smoother than necessary.

mt5555 commented 5 years ago

Regarding the physgrid FV weights: everyone is probably aware of this, but for the record:

Ultimately I think this has to be the FV spherical area, since that is what is used for conservative mapping to/from the other components. With cubed_sphere_map=2, the GLL weights will sum to the spherical element area, so the area of the 4 physgrid volumes will also sum to the correct area.

With cubed_sphere_map=0, the GLL weights unfortunately do not sum to the spherical area, and getting conservation in coupled mode will be tricky, so we plan to transition to cubed_sphere_map=2 for v2 (in v1, RRM compsets already use cubed_sphere_map=2).

ambrad commented 5 years ago

@mt5555, re: PHIS, 2nd best now works for me. This came down to the vlyr=0 confusion. The high-order remap is now working with PHIS on the physgrid.