Closed sgdecker closed 1 year ago
So while the docs for vorticity
currently say that dx/dy have to be float
(oops, that needs fixing), with the new finite differencing in 0.7, we can now handle vorticity for arbitrarily-spaced points. If you've tried to pass an array in for dx/dy, and it failed, that's a bug.
As far as the mapping/projection issues are concerned, that's still an open issue (covered in #174). So I'm a little unclear--is the map factor there to handle irregularly-spaced points or to handle issues from projection? or both?
OK, I see that vorticity
does compute something when I provide arrays for dx/dy.
Regarding the map factors, they are fundamentally there to handle issues from projection. One of those issues is that the data array is irregularly spaced, but another issue is that computing vorticity (more generally, the curl of a vector) in a quantitatively correct manner requires the map factors.
The current implementation of vorticity
is correct if the underlying data is truly in Cartesian coordinates (even if unequally spaced). I suppose output from a cloud model would fit this description.
However, when the data is not truly Cartesian (e.g., NWP output on some map projection), the third and fourth terms above are necessary for a quantitatively correct calculation. In many (most?) instances, the third and fourth terms will be quite small, certainly not noticeable qualitatively (say, when looking at contours), but one case where the difference is noticeable is when vorticity is computed from data on a lat/lon grid and examined in polar regions (where, due to convergence of the meridians, the dmx/dy term gets large).
For instance, compare the GEMPAK and MetPy-derived plots below (Python code is at https://gist.github.com/sgdecker/496f7ea7edd98b428ac3adab0b5e0842; GEMPAK is at https://gist.github.com/sgdecker/1a0bbf2e18dd0a2f9b4b02c83e07509b): Look closely and you'll see, for instance, that the 4 contour to the left (south, technically) of the vort max in the Arctic Ocean differs noticeably.
A mathematical way to see this is to note that vorticity in spherical coordinates is given by (with theta as colatitude): which expands to: The final term (which corresponds to the dmx/dy term in GEMPAK's general expression) blows up at the pole, but if we are 1 degree away, the cotangent is order 10^2, so using U ~ 10 m/s, and r ~ 10^7 m, the term is on the order of 10^-4 s^(-1).
The divergence and laplacian operators will have the same issue.
Thanks for the detailed analysis. It's what I figured, and definitely on our roadmap. Due to the nicely detailed analysis, let's leave this open for addressing the map factors, and we can close out #174.
(Putting the link to the NOAA writeup about map projections here: https://www.arl.noaa.gov/hysplit/cmapf-mapping-routines/)
Nice resource in the ARL guide. I do want to suggest caution in retaining 3D coordinates as much as possible, as I advocated in the Cartopy PlateCaree saga - while the ARL guide has nice routines for 2D, and that's fine for maps, it can make forward/inverse and mixed-coordinate plotting difficult.
Anyway, proj.4 has a 3D-aware forward / inverse, and a quick search turns up partial derivative support through the proj_factors function - maybe that's useful in ensuring we account for what @sgdecker points out above. This issue on OSGeo also seems related.
Since I mentioned divergence and the Laplacian operator, I went ahead and made plots of those as well to show the difference between GEMPAK and MetPy. Here is 500-mb divergence: Here is the Laplacian of 500-mb height:
Perusing the GEMPAK documentation, I found other calculations that also require taking derivatives of map factors to get absolutely correct results:
Okay, so I've been looking into this and have the following to report on absolute vorticity...
Here is the difference between GEMPAK and MetPy v0.10; note the very small range of color values
The average difference is 0.0051 /s. If the following term is added + (uwnd / (6375000 * units.meter)) * np.tan(np.deg2rad(lats[:, None]))
the difference is reduced by an order of magnitude to essentially zero.
The average difference is -0.00042. Note: The single strip of color at the poles is because MetPy's derivative does not assess vorticity at the top and bottom of the array. Excluding those rows does not appreciably affect the mean difference calculations.
The additive factor depends on the u-component of the wind and latitude and it is not a universal or 3D solution to the issue as it will has the potential to be slightly different for some of the calculations (e.g., laplacian). Since the list is "relatively" small, this can be done for the various cases and added, however with our current implementation it would require the addition of potentially two arguments. One argument signaling to use (or not use) the spherical calculation and another that contains the latitude array. This would not be a problem for absolute_vorticity
since it already requires the latitudes for the calculation of coriolis_parameter
, therefore we would only need to add the spherical designation. I would additionally promote the default use of spherical to beTrue
since that would be the much large use case for our current user base.
I haven't tried implementing this yet, but I think the general solution would be to accept the data CRS object as an optional argument, rather than more specific things such as latitude. If that argument is not provided, assume the grid is Cartesian, and set the map factors to one. Otherwise, use the CRS object to obtain the map factors via proj4 (as discussed by @deeplycloudy). In either case, use the formula in my original comment as the basis for the computation. The part that is unclear to me without actually trying this yet is how difficult obtaining the map factors from the CRS is.
This question is more related to the information needed for projection-aware deriviative-based calculations, rather than the implementation, but I still wanted to ask, since it may end up being useful for the implementation.
I've recently been planning out a collection of "dataset helpers" to flesh out any missing CF metadata in xarray Datasets, so that all of the MetPy-xarray integration functionality (projections, coordinates, units, and hopefully soon variable identification and the field solver) works as smoothly as possible. WRF output has been the motivating example (see https://github.com/Unidata/MetPy/issues/1004).
With that in mind, what would be the best set of information to standardize having in a DataArray and/or Dataset to ensure that these projection-aware calculations can be carried out? Is it sufficient to have
crs
?or am I missing something else?
The goal, I would think, would be to be able to calculate something like vorticity with a call such as
vorticity = mpcalc.vorticity(data['u_wind'], data['v_wind'])
and have everything else be obtained automatically from the DataArrays. This, if combined with some way of annotating the function and its arguments, would also really set the stage for the field solver.
Correct, my ideal is to have the CRS information, as well as coordinates like x or lon, attached to the data arrays. If we can't get map factors from the CRS...that would be bad.
While I had hoped that #1260 (which allows for easy calculation of both nominal and actual grid deltas from a parsed DataArray) would be enough to make this feasible for quick implementation for v1.0, #1275 raised the point about grid-relative vs. earth-relative winds that I had not fully considered with respect to this issue. So,
With all this extra complexity, though, I unfortunately don't see a way that this can feasibly be resolved with a stable API for v1.0 in the immediate future. So, will this issue need to be punted from v1.0, and instead, for now, just use a simple xarray input/output wrapper that fills in the default grid spacing (dx
and dy
) and coordinate-dependent parameters (f
and latitude
), and calculates these kinematic functions as they are now?
The equation way back in the first comment is valid for grid-relative u
and v
. If you used earth-relative winds on, say, a polar stereographic projection where North America is right-side up but Asia is upside down (so that grid-relative and earth-relative winds are nearly equal and opposite), I'd suspect adding the map factor terms would actually degrade the computation.
@jthielen I agree about punting from 1.0. I'll do some milestone work later, but it's pretty clear we're not going to get the 1.0 we want, but the one we deserve. :wink: We can talk more when you get into AMS. Let me know.
@jthielen I agree about punting from 1.0. I'll do some milestone work later, but it's pretty clear we're not going to get the 1.0 we want, but the one we deserve. We can talk more when you get into AMS. Let me know.
Sounds good. I'll be getting in tomorrow afternoon. We can chat via email if you want to find a more specific time, otherwise I'll plan on stopping by the Unidata table at the career fair that evening.
This notebook from @deeplycloudy may help shed some light on using PyProj for some complicated reprojections.
@sgdecker would you be available to join our biweekly collaborators call (#1684) for some live exploration of potential solutions we have been discussing lately surrounding this issue?
@kgoebber shared this notebook with us. We'll be using these two notebooks as a starting point for this discussion at our next dev call!
@dcamron Yes, I can join in on July 8.
For reference in advance of our discussion, I worked a bit more with @kgoebber's notebook and added a bunch of notes while trying to reconcile the two approaches to calculating curvilinear gradients. I wasn't entirely successful but hopefully my prose can help someone spot the way forward. I've forked and updated the notebook uploaded by @dcamron.
Using GEMPAK's value for the radius of the Earth marginally improves (by 0.02% or so, consistent with the radius differing in the fifth significant digit) the correspondence between MetPy and GEMPAK values in method one. I though switching to 32-bit arithmetic might further increase the match, but it does not. I am guessing the remaining difference comes from the loss of precision that occurred when generating the .out files.
Calculating the map factors as
m_x = 1 + (factors.parallel_scale - 1) / mpconst.earth_avg_radius.m
m_y = 1 + (factors.meridional_scale - 1) / mpconst.earth_avg_radius.m
brings things closer to GEMPAK in method two (RMS reduced from 0.03303 to 0.00108), but not nearly as close as method one (0.00002). This was a hunch on my part, but not having a good feel for what the pyproj meridional_scale
and parallel_scale
actually are yet, this transformation could be off base.
So based on today's dev call, here's the plan of action:
add_grid_arguments_from_xarray
(cc @jthielen @deeplycloudy @kgoebber )
Okay, so I've created an initial PR, please feel free to take it from there.
As I was working on this I delved into some NAM data and the vorticity calculation. Interestingly, the calculation matches what GEMPAK produces when no correction is made. The NAM data that I am using to test has grid-relative winds and does need to be converted to plot correctly, however, if you do the conversion for the calculation, it degrades the calculation. I haven't been able to unlock any combination of correction that improves over the base calculation that is already in MetPy. I haven't had time yet to create a refined notebook - I'll do that in the next day or two and update with the NAM example for others to peruse.
Here is a cleaned up notebook that I created to demonstrate what I have tried: https://gist.github.com/kgoebber/a5482e8e8c51396e80ffe3293696f57a
Makes sense that the spherical correction is no good for NAM 218, as that isn't a spherical grid. The fact that the last approach based on the map factors doesn't work so well is the mystery. I will be investigating further, but there is a nice big picture overview at https://gis.stackexchange.com/questions/122703/influence-of-the-scale-factor-on-the-projection (first answer) that I think has important info, but it is a matter of translating it into proj4 language to determine what is missing. My current hunch is that something involving the "nominal scale" needs to be accounted for.
My investigations are cataloged here: https://github.com/sgdecker/testing/blob/188ed819b4b464d5a3c1b83034435f0e6b9d800c/MapProjectionExploration.ipynb
I can now reproduce the GEMPAK calculation on Grid 218 using the proj4 map factors. I am not sure what I am doing differently from @kgoebber actually, but it works! Details are in this notebook, and the data files are in the repository.
Summary notebook demonstrating a working vorticity function for Grid 218 is here. Still need to revisit GFS global data and look at a stereographic grid.
Just a side question, since such good progress is being made on the implementation side: for my own understanding's sake (and probably also documentation in the eventual implementation), is there a good reference for how these map-factor-based formulations come about? The GEMPAK appendix's "left as an exercise to the reader" is not very satisfactory, especially since it requires a working knowledge of differential geometry.
It is at heart differential geometry, but you can see the general expression in the table here (good luck going through the derivations to get to that table!), where h1 is mx, h2 is my, h3 is 1 (no transformation in the vertical), v1 is u, v2 is v, and we only care about the vertical component (i = 3). The expression for vorticity that the GEMPAK documentation shows should emerge from the expression in the table corresponding to the curl of a vector field with the aforementioned substitutions.
Edit: Actually, we have orthogonal coordinates, so the info above, while correct, is more general than it needs to be. This page is better.
Thanks for this additional investigation @sgdecker! I was able to match your calculation by using just the parallel_scale
and meridional_scale
for the map factors. I was originally using the modification that you had explored in a previous notebook.
OK, good to hear. So now it is a matter of determining why my modifications are necessary for the GFS grid...
On a hunch I tried the following for the GFS output and got a great match with the GEMPAK calculation
xx, yy = np.meshgrid(gfs_data.lon.values, gfs_data.lat.values)
factors = Proj(data_var_u.metpy.pyproj_crs).get_factors(xx, yy)
m_x = factors.parallel_scale
m_y = factors.meridional_scale
dx, dy = mpcalc.lat_lon_grid_deltas(gfs_data.lon, gfs_data.lat)
dy_grid = dy[0, 0]
dx_grid = -dy_grid
dudy = mpcalc.first_derivative(data_var_u, delta=dy_grid, axis=-2)
dvdx = mpcalc.first_derivative(data_var_v, delta=dx_grid, axis=-1)
The key was in setting the "grid spacing" as the value at the equator (which I just based off of the latitude difference, which doesn't change, and then setting the dx value based on that). See the notebook below...so I think we might have a single way to deal with this now!
https://gist.github.com/kgoebber/8f855516115fa8cad74e957d2c440e92
Ahh, that makes sense, and I suppose using the GEMPAK radius would add a few more 9s to the correlation. So now the question is whether this will work just as seamlessly with a stereographic grid, and I think we'll have all the common cases covered.
This notebook has been expanded to include a polar stereographic case (Grid 104). New content begins roughly halfway through. Unfortunately, I can't make sense out of the proj4 map factors in this case, and they don't come close to reproducing the GEMPAK calculation. Interestingly, both the native GEMPAK output and the MetPy-based calculations break down in the upper-left and upper-right corners of this grid, so at least that part is being reproduced!
Please take a look and see if I am missing something. I intend to try a netCDF file next (as @kgoebber did for Grid 218).
I'm capturing the equations needed for the map factor calculations. These are from the GEMPAK appendix B2. The only one that is not directly in the appendix is the Q-vector, which I computed from the partial x and y vectors. May want to have someone check my "derivation".
Partial x-vector
Partial y-vector
Divergence
Inertial Advection
Shearing Deformation
Stretching Deformation
Vorticity
Q-Vector
Okay...had a few cycles this weekend and I think I have the spherical corrections working for GFS, NAM, NAM Stereographic, and GFS Stereographic projections. All but the global GFS work with the same code.
Process:
# Read NAM Data
nam_data = xr.open_dataset('namanl_218_20180308_0000_000.nc').metpy.parse_cf()
nam_data = nam_data.metpy.assign_latitude_longitude()
# Get u/v component data
data_var_u = nam_data['u-component_of_wind_isobaric'].metpy.sel(time=datetime(2018, 3, 8, 0),
vertical=500*units.hPa).metpy.quantify()
data_var_v = nam_data['v-component_of_wind_isobaric'].metpy.sel(time=datetime(2018, 3, 8, 0),
vertical=500*units.hPa).metpy.quantify()
# Start Spherical Correction for NAM Vorticity
# Get lat/lon values from dataset
xx, yy = (nam_data.longitude.values, nam_data.latitude.values)
# Use Proj object to get map factors
factors = Proj(data_var_u.metpy.pyproj_crs).get_factors(xx, yy)
# Isolate needed map factors
m_x = factors.parallel_scale
m_y = factors.meridional_scale
# Determine dx and dy grid spacing from x/y values
dx_grid = (data_var_u.x[1].values - data_var_u.x[0].values) * units.meters
dy_grid = (data_var_u.y[1].values - data_var_u.y[0].values) * units.meters
print(dy_grid)
# Compute main function derivatives
dudy = mpcalc.first_derivative(data_var_u, delta=dy_grid, axis=-2)
dvdx = mpcalc.first_derivative(data_var_v, delta=dx_grid, axis=-1)
# Use main function derivatives to calculate using Map Factors
zeta = (m_x * dvdx - m_y * dudy
- data_var_v * (m_x/m_y) * mpcalc.first_derivative(m_y, delta=dx_grid, axis=-1)
+ data_var_u * (m_y/m_x) * mpcalc.first_derivative(m_x, delta=dy_grid, axis=-2))
To see full details and some figures, take a look at the following gist notebook: https://gist.github.com/kgoebber/f479ee45ae83c42456721169804c82ec
In looking at @sgdecker previous notebook, I think the error that he was seeing occurs in the cell that was run 37th (e.g., In[37]
) as there was a call to an lcc_proj
variable where that should have been a stereographic projection. Looking at the Map Factors graphic then makes a little sense as it appears to make a cone with the highest map factors at the center top position.
Anyway, I think we have it solved (for vorticity at the least), now we have to decide how this would be best implemented within MetPy. Currently, we do assign lat/lon values to all data arrays before calculation, but then we also add the dx/dy based on the lat_lon_grid_deltas
, which we don't want to do in this case. There is also the difference between calculating the grid dx (dx_grid
) between the GFS, that natively brings lat/lon and everything else since it doesn't have projection x and y coordinates, and all of the others. As I get a few more cycles I will attempt to get versions of all of the equations affected (listed in a previous comment) working with map factors, unless someone else wants to take the baton from here.
The PR that I had previously started on this topic, #1963, should be closed and a new one completed as we'll need to go in a completely different direction.
In looking at @sgdecker previous notebook, I think the error that he was seeing occurs in the cell that was run 37th (e.g.,
In[37]
) as there was a call to anlcc_proj
variable where that should have been a stereographic projection. Looking at the Map Factors graphic then makes a little sense as it appears to make a cone with the highest map factors at the center top position.
Thanks, @kgoebber ! That was a bug for sure. Indeed, when I put grid104
there instead (which should have been the case all along), all the discrepancies in my notebook disappear.
I guess these functions will need conditionals that determine when to apply the map factors and when not to. Or, perhaps they set the map factors to 1 in the cases where they aren't needed so that the actual calculation is the same in all cases.
Okay, so I have compiled the vast majority of calculations affected by the spherical calculations. What this really all comes down to is whether we are doing a derivative on a vector quantity...so I have created a notebook that contains a vector_gradient function that computes the appropriate vector derivatives and then I use that throughout the notebook to calculate the various quantities.
I've primarily focused on the GFS calculations since I had them all done for the comparisons to GEMPAK calculations. I have also included the vorticity calculation for the NAM projection, NAM Stereographic projection, and GFS stereographic projection. There is a slight nuance to calculating the grid dx and dy between the lat/lon grid of the GFS and the other projected grids, which is captured in the vector_gradient function.
We can discuss potential implementation options on the next dev call.
Notebook is hosted at: https://gist.github.com/kgoebber/b3546977e55fef96d4b36f4ef573a788
@kgoebber Question on the most recent notebook: In the dx and dy calculation for the latitude_longitude grid, you have
earth_radius = 6371200 * units.meter
dx = 2*np.pi*earth_radius / u.lon.size
dy = -dx
However, this looks to only be valid when assuming a spherical earth with radius 6371.2 km, longitude with global extent, and equal spacing in latitude and longitude. Would it be reasonable to generalize this to the following?
geod = u.metpy.pyproj_crs.get_geod()
dx = units.Quantity(
geod.a * np.diff(u.metpy.longitude.metpy.unit_array.m_as('radian')),
'meter'
)
lat = u.metpy.latitude.metpy.unit_array.m_as('radian')
lon_meridian_diff = np.zeros(len(lat) - 1)
forward_az, _, dy = geod.inv(
lon_meridian_diff, lat[:-1], lon_meridian_diff, lat[1:], radians=True
)
dy[(forward_az < -90.) | (forward_az > 90.)] *= -1
dy = units.Quantity(dy, 'meter')
Essentially, no matter what subset or spacing of longitudes and latitudes, this calculates the dx
from longitude on the equator of the ellipsoid and dy
from latitude on the 0 degree meridian of the ellipsoid.
Ah yes, I have only been working with the global GFS, so the logic would break down for smaller subsets.
A quick test of your generalization was good except for the dy were not negative as they needed to be. All forward azimuths were 3.1415926 (Pi because of 1 degree spacing?), so don't know if there is some different logic with the forward_az or just multiplying dy by -1 is best.
I don't match the GEMPAK values as well, but that would be expected because we are not using the radius that they define. Despite that, it is different by only a very very small amount (what would be expected by using a different radius value.
A quick test of your generalization was good except for the dy were not negative as they needed to be. All forward azimuths were 3.1415926 (Pi because of 1 degree spacing?), so don't know if there is some different logic with the forward_az or just multiplying dy by -1 is best.
Yeah, not sure where that sign/direction error would be coming from. I'd be tempted to just reverse the directionality in the inv
call (swap lat[:-1]
and lat[1:]
) to take the latitude differences backwards, but that feels wrong without understanding why the forward differences are giving the wrong result.
I don't match the GEMPAK values as well, but that would be expected because we are not using the radius that they define. Despite that, it is different by only a very very small amount (what would be expected by using a different radius value.
Does specifying the matching ellipsoid fix it? I.e., replace
gfs_data = xr.open_dataset('gfs_test_data.nc').metpy.parse_cf()
with
gfs_data = xr.open_dataset('gfs_test_data.nc').metpy.assign_crs({
'grid_mapping_name': 'latitude_longitude',
'earth_radius': 6371200
})
Also, I hacked away at a possible implementation: https://github.com/jthielen/MetPy/commit/aac4e8e0454f229a4c1dcd4cc186057447d0b8fe. Right now,
vorticity
is the only modified calculation,But, it's at least getting the ideas I had down in code, so if anyone wanted to take advantage of it as a head start towards an actual PR, go ahead! If not, I can try further iterating on it, but not sure when I'd be able to do so next.
All checks out with the specific radius. Actually improves it marginally!
@kgoebber Looks great! Thanks for putting all this together. It looks like there is some difference between the GEMPAK and MetPy PV calculations that isn't attributable to the map projection. Is that a known discrepancy (however minor it may be)?
@dopplershift and I chat about this earlier today; I'm going to spin off @jthielen's commit above into a draft PR this week and begin the process of implementing it across functions, examining existing tests, and adding new tests. If you are already working on a PR, let me know!
Not sure if this was the impression of other folks or not, but I previously conceptualized these map factor corrections as only occurring in derivatives on vector quantities. https://github.com/Unidata/MetPy/issues/2356#issuecomment-1048324949 made me realize this does not seem to be the case; map factors need to be considered in any vector operation (so, also gradients of scalar fields, dot products, cross products, etc.). However, these corrections are much simpler (just coefficients on terms and they sometimes cancel out).
I keep on getting nerd sniped with this topic of "vector calculus on orthogonal coordinates," and while I don't have much new to show for it yet, one particular concern came up:
Do we need to delineate between vector component arguments that are scaled according to the covariant basis versus the normalized basis? (See https://en.wikipedia.org/wiki/Orthogonal_coordinates#Covariant_basis for discussion.) Is it safe to assume we will always have things in the normalized basis?
Do we need to delineate between vector component arguments that are scaled according to the covariant basis versus the normalized basis? (See https://en.wikipedia.org/wiki/Orthogonal_coordinates#Covariant_basis for discussion.) Is it safe to assume we will always have things in the normalized basis?
For what it's worth, the Pielke textbook says (p. 129 of the second edition) "By convention, the equations [the set of equations solved in an NWP model] are written in the contravariant form, using the covariant differentiation operation..." but that is in the context of a whole bunch of derivations. I think in practice all of the vector quantities MetPy would be dealing with are normalized (e.g., if the u-component of the wind is given as 20 m/s, it really is 20 m/s at that point).
On the topic of frequently having this problem in the back of my mind, I ended up writing up a rough proposal for addressing this "grid-correct vector calculus" problem on the data model level: https://github.com/pydata/xarray/discussions/6331. I doubt that all would be practical on the timeline we want to have this functionality implemented in MetPy, but, it could be a cool future goal?
I doubt that all would be practical on the timeline we want to have this functionality implemented in MetPy, but, it could be a cool future goal?
👍 to all of that
Some MetPy functions, such as gradient and advection, allow the user to supply arrays of deltas rather than a single value, but vorticity (and its relatives) does not. Vorticity is more complicated since derivatives of the "map factors" are also required, but it should be doable.
Here is the necessary computation according to the GEMPAK documentation (Appendix B2):
Since the map factors (m_x and m_y) are equivalent to the "nominal" grid spacing (\partial x and \partial y in the above equation) divided by the actual delta, an additional "nominal_delta" argument to vorticity would allow for the map factors to be computed. Since the WRF model provides the map factors in its output, allowing the map factors (and the nominal grid spacing) as optional arguments to vorticity in lieu of deltas would be useful as well.