Closed RondeauG closed 1 year ago
This is a partial fix to #94. Fixing xesmf.SpatialAverager
is harder than expected and might require a PR in xESMF
itself, but I didn't want to wait that long to implement cos-lat averaging.
Je me reprends, cos-lat
as implemented is indeed not usable with curvilinear grids. The thing is, cos(lat)
is a weight that varies in latitude and thus assumes all cells have the same area in °² (degrees squared). Which is not true for rotated pole for example.
I found this blog : https://towardsdatascience.com/the-correct-way-to-average-the-globe-92ceecd172b7
It shows a method approximating the cell area with a lat-dependent earth radius. It however uses a cell area approximation that only works for non-curvilinear grids. You could use cf_xarray
0.7.6 to compute 2D bounds and then cell areas (in °).
@RondeauG your code was indeed correct. But I think curvilinear grid support is essential as one of the principal use case is with MRCC5 data.
@RondeauG I found the challenge interesting and spent a bit too much time on this, but here are 3 ways of computing cell areas for the weighted average: https://gist.github.com/aulemahal/13c907cf2f7ccd3a84be34a7a009516c
We might not have pygeos
already, but it's an optional dependency of clisops
, so it might be worth adding it explicitly, it is much faster!
However, there's another issue with cf-xarray
... Computing the 2D bounds yields strange results when the cell crosses the first lon meridian. In the MRCC5 case, the domain corner crosses the +180 / -180 line and cell areas are super large because the corners are wrong.
See:
As a reference to compare to, I implemented the cos-lat method for 2D grids using ds.cf['latitude']
instead of ds.cf['X']
. Here's an example for RegCM4 / NAM-22:
I'll look into the links provided, though I feel like this might be another method entirely? As in, we'd have cos-lat
or SpatialAverager
or pygeos/ESMF/whatever
?
Also, with the changes, get_spatial_dims(ds)
is pretty useless. I could restore it within method=="mean"
as it was before, but I'm tempted to just get rid of mean
entirely since it does the wrong thing to begin with.
this might be another method entirely?
My "shapely" and "pygeos" functions are both based on the cos-lat approximation, so I would argue it's just a curvilinear-compatible version of cos-lat
.
The ESMF version is indeed another technique because it computes the cell area using other approximations.
Note for shapely: the newest version (2.0) now merges the functionality of pygeos
so there might be a way to code this idea without adding a new dependency.
I updated my gist with a function that works with shapely 2.0. It is indeed the exact same as with pygeos
and it's as fast.
@aulemahal @mccrayc This might be a lack of comprehension from my part on rotated grids, but let's say that I have this region:
The first part of Pascal's code (using the pygeos
version for example) produces this:
xr.DataArray(pygeos.area(pygeos.polygons(pygeos.linearrings(ds.lon_bounds, ds.lat_bounds))), dims=ds.lon.dims, coords=ds.lon.coords).plot()
np.cos(np.deg2rad(ds.lat))
creates more or less the opposite:
Meaning that when you multiply both, you get:
...which makes no intuitive sense to me? I would have expected weights to be curved, not linear like that?
No I think this is ok! You are in a Oblique Mercator projection, which projects the globe unto a cylinder. The projection being centered at y = 0, in the center of your plot, it makes sense to me that the cell area decreases with abs(y)
.
https://en.wikipedia.org/wiki/Oblique_Mercator_projection
Also, I found a way to compute areas (in meters) with pyproj
from lat and lon bounds. This was added to the gist, and gives the following:
Semi-related: As we currently have no bulletproof method, the choice makes sense. However, it seems to me that there's no reason to have such an array of methods for such a simple task. I guess that when we have a simple, light area-weighted mean method, we could remove everything else? Is there a use for a collection of imperfect algos ?
That's why I deprecate mean
, since cos-lat
is a superior version of the same thing. I could see an argument to drop interp_centroid
in favor of always using xesmf
too.
However, I think that cos-lat
and xesmf
(SpatialAverager) fundamentaly do different things. cos-lat
keeps the grid cells as is, while SpatialAverager
will break pixels into smaller parts to closely fit to a given polygon.
@mccrayc When you have time, a second review is needed since you "requested changes".
Pull Request Checklist:
pre-commit
hooks are installed/active in my local clone ($ pre-commit install
)number
) and pull request (:pull:number
) has been addedWhat kind of change does this PR introduce?
cos-lat
method toaggregate.spatial_mean()
mean
and added warnings toxesmf
_subset_file_coverage
to be more explicitcompute_indicators
when less than 3 timesteps were produced.Does this PR introduce a breaking change?
mean
and changes the name ofinterp_coord
tointerp_centroid
(still supportsinterp_coord
with a deprecation warning)Other information: