negative weights in cube_to_target

jsbamboo commented 1 month ago

Description of the problem

In the experience of using the cube_to_target (mapping terr_height/PHIS from the base cube to target) for regionally refined meshes (RRM), we've encountered issues with large negative values for terr_target. => related to negative weights_all / weights_out / wt in cube_to_target.F90. => resulting from some dominant negative weights in remap.F90 (corresponding to the overlap area “omega_kl” between the target grid and the base cube grid in Lauritzen et al. 2015).

Lauritzen, P. H., Bacmeister, J. T., Callaghan, P. F., & Taylor, M. A. (2015). NCAR_Topo (v1. 0): NCAR global model topography generation software for unstructured grids. Geoscientific Model Development, 8(12), 3975-3986.

This generally happens when the dx of the target grid is comparable/smaller than the dx of the base cube:

cube3000 (3km) to a 800m California RRM
cube12000 (800m) to a 800m California RRM
cube12000 (800m) to a 100m California RRM

Temporary solutions / limiters

Initially, my stopgap solution to fix the negative PHIS was just to to write an NCL script to manually “correct” these large negative values by averaging the PHIS values of the surrounding grid points.

@mt5555 pointed out that we can add a simple non-negative limiter. Two options documented in the email: 1) ignore the negative weights but use the other weights (renormalized so they sum to 1). 2) if there are negative weights, just replace all weights by 1/N (where N is the number of source cells contributing).

I’ve tested the two methods by modifying remap.F90 and they both worked well.

Underlying reasons and solution

Finally, @PeterHjortLauritzen helped us to identify the underlying reasons for the negative weights and suggested the solution. The discussion in email is documented here:

Turn on the ldbg flag in cube_to_target.F90
When we test the problematic point or target index, it prints out each weights of weights_out/omega_kl and writes out there debug files: side_integral.dat, inner_nonconvex.dat, inner_integral.dat
We can plot the line integral from these debug files to see which line segment caused the problem and what’s its geometric characteristics are (see below).
It turns out that the exact integration in the epsilon case (line segment parallel to latitude - compute weights exactly) caused the problem. It thinks the line segment is parallel to the latitude (judging by dy<fuzzy_width), but it's not (just because the segment is too short). see more details in the discussion pasted below.
My tests using the solution of deactivating the exact integration for cube12000 -> 800m/100m CARRM worked well.

Summary of solutions

get rid of the exact integration branches
add a simple non-negative limiter for weights

jsbamboo commented 1 month ago

Discussion in the email:

===weights debug prints:

weights_out( 1 ,1), weights( 5 ,1) = 1.0509980834232285E-007 1.0509980834232285E-007 weights_out( 1 ,1), weights( 6 ,1) = -8.8581656182158517E-006 -8.9632654265581743E-006 weights_out( 1 ,1), weights( 14 ,1) = 7.0516063422007840E-011 8.8582361342792737E-006 weights_out( 2 ,1), weights( 1 ,1) = 8.9649366631406596E-007 8.9649366631406596E-007 weights_out( 2 ,1), weights( 9 ,1) = -1.4116530880303253E-005 -1.5013024546617319E-005 weights_out( 2 ,1), weights( 10 ,1) = -1.4568151528446161E-005 -4.5162064814290763E-007 weights_out( 2 ,1), weights( 11 ,1) = -1.5019816238143562E-005 -4.5166470969740055E-007 weights_out( 2 ,1), weights( 12 ,1) = 2.0893794756200756E-009 1.5021905617619182E-005 weights_out( 3 ,1), weights( 2 ,1) = 1.4125446408897337E-005 1.4125446408897337E-005 weights_out( 3 ,1), weights( 3 ,1) = 1.4576740853993954E-005 4.5129444509661786E-007 weights_out( 3 ,1), weights( 4 ,1) = 1.4922891465757215E-005 3.4615061176326182E-007 weights_out( 3 ,1), weights( 7 ,1) = 8.8666163256907899E-006 -6.0562751400664260E-006 weights_out( 3 ,1), weights( 8 ,1) = 8.8560315021176001E-006 -1.0584823573189794E-008 weights_out( 3 ,1), weights( 13 ,1) = -2.2046321616736293E-009 -8.8582361342792737E-006

@PeterHjortLauritzen ’s explanation and our tests for the calculation of weights: ===Reply 1 from PeterL The algorithm is illustrated in Figures 3,4,5,6 in attached paper (Lauritzen et al. 2010) and equation 9 is the line-integral which should just end up being the area of the overlap. Please note that the line-integrals for lines parallel to y-axis (in gnomonic projection) are zero.

Lauritzen, P. H., Nair, R. D., & Ullrich, P. A. (2010). A conservative semi-Lagrangian multi-tracer transport scheme (CSLAM) on the cubed-sphere grid. Journal of Computational Physics, 229(5), 1401-1424.

===Reply 2 from PeterL 0.The code is correctly identifying the line-segments ... but one of the line-segments is so short that the weights are likely not correctly calculated ... 1.Please note that area 1 is approximately 2E-9 and area 3 (the small area that is yellow in your plot) is 7E-11. So area one is about 30 times larger than area 1. Seems reasonable. Area 2, however, looks a little smaller than area 1 but the sum of the weights is negative. 2.Note that the weights are of order 1E-6-ish so we are adding up a bunch of larger numbers to get a small number! 3.I carefully looked through all the line integrals and the algorithm captures all of them correctly but there is a very short line-segment near vertex 5 (see "problem" next to the arrow on my photograph - green lines are the source grid cell sides). So what is the problem: The problem is that vertex 5 is almost on top of a grid line. So when we start side integral 5 the algorithm includes a segment in the cell to the right of area (area 3) and computes weight zhang73 31: weights_out( 3 ,1), weights( 8 ,1) = 8.8560315021176001E-006 -1.0584823573189794E-008 that is added to area 3. Note that the segment is so short that the code thinks that the segment is parallel to the coordinate line which it is not: look for line segment parallel to latitude - compute weights exactly  zhang73 22: weights( 8 ,1) = -1.0584823573189794E-008 in the logfile. Since the area is on the order of 2E-9 adding a weight of -1E-8 drives the integral negative. If I omit the weight the area becomes 8.38E-9 which is way too large (but positive). Could you try and run the code without using analytic integration of line-segments that the code thinks are horizontal? In remamp.F90 change: else if (abs(yseg(1) -yseg(2))<fuzzy_width) then to else if (.false.) then … I am not sure this will solve the problem but I am pretty sure that if you manually remove vertex 5 (or displaced it) then you would not get a negative weight. This algorithm does not like vertices on top of grid lines (epsilon cases!).

===Results after deactivating the exact integration I changed “abs(yseg(1) -yseg(2))<fuzzy_width” with “.false.” in two places, one calculating “slope” in subroutine , another calculating “weights” in subroutine .

The results look reasonble! The weights_out (ind=3) changed from -2.2046321616736293E-009 to 1.8986300128590624E-009, by changing the weights of S8 (please see figures below) from -1.0584823573189794E-008 to -6.4815613463673699E-009.

The new results seem to be reasonable, which seems to be proportional to the line intergal from the figure below: the 1st weights is the yellow line integrals, 7.1e-11 the 2nd weights is the light blue line integrals, 2.1e-9 the 3rd weights is the dark blue line integrals, 1.9e-9

cube_to_target_debug

So, this line “else if (abs(yseg(1) -yseg(2))<fuzzy_width) then” in subroutine is unimportant as slope is not a return value. The same line in takes effect. But seems that for the real cases of “line segment parallel to latitude”, the way that don’t “compute weights exactly” won’t affect much in my test—as shown in the inner line from point “Add 2” to “Add 3” (S13, S14), right? The weights(13) is -8.8582361343315627E-006 vs. -8.8582361342792737E-006 for the previous vs. modified remap.F90. Have you seen any cases for “line segment parallel to latitude” that “compute weights exactly” and “do the general way—calculate slope and b” will give a notable different answers (and that’s why you have to keep this branch)?

===Reply 3 from PeterL My intention with computing line integrals using exact integration for segments parallel (within an epsilon) to the y-coordinate lines was that all the exact integrals should add up to the exact area of the source cubed-sphere grid. This is unimportant so I think we should get rid of the exact integration branches.

mt5555 commented 1 month ago

Excellent work! So did I understand correctly that the removal of the exact integration branches fix the problem? If that's the case, do you think we should still add the limiter for increased robustness in future cases? Or maybe it's better to fail in those possible future cases for force further more rigorous improvements.

jsbamboo commented 1 month ago

Thanks Mark! yes, so far, the removal of the exact integration branches fix the problem. I think it's fine not to add a limiter anymore.

PeterHjortLauritzen commented 1 month ago

@jsbamboo Thank you for adding all this info to the Github repo.

Would you be willing to open a PR for removing the exact integration in this code base?

jsbamboo commented 1 month ago

no problem! please check this PR #63

NCAR / Topo