Ouranosinc / xscen

A climate change scenario-building analysis framework.
https://xscen.readthedocs.io/
Apache License 2.0
15 stars 2 forks source link

cos-lat averaging #125

Closed RondeauG closed 1 year ago

RondeauG commented 1 year ago

Pull Request Checklist:

What kind of change does this PR introduce?

Does this PR introduce a breaking change?

Other information:

RondeauG commented 1 year ago

This is a partial fix to #94. Fixing xesmf.SpatialAverager is harder than expected and might require a PR in xESMF itself, but I didn't want to wait that long to implement cos-lat averaging.

aulemahal commented 1 year ago

Je me reprends, cos-lat as implemented is indeed not usable with curvilinear grids. The thing is, cos(lat) is a weight that varies in latitude and thus assumes all cells have the same area in °² (degrees squared). Which is not true for rotated pole for example.

I found this blog : https://towardsdatascience.com/the-correct-way-to-average-the-globe-92ceecd172b7 It shows a method approximating the cell area with a lat-dependent earth radius. It however uses a cell area approximation that only works for non-curvilinear grids. You could use cf_xarray 0.7.6 to compute 2D bounds and then cell areas (in °).

@RondeauG your code was indeed correct. But I think curvilinear grid support is essential as one of the principal use case is with MRCC5 data.

aulemahal commented 1 year ago

@RondeauG I found the challenge interesting and spent a bit too much time on this, but here are 3 ways of computing cell areas for the weighted average: https://gist.github.com/aulemahal/13c907cf2f7ccd3a84be34a7a009516c

We might not have pygeos already, but it's an optional dependency of clisops, so it might be worth adding it explicitly, it is much faster!

However, there's another issue with cf-xarray... Computing the 2D bounds yields strange results when the cell crosses the first lon meridian. In the MRCC5 case, the domain corner crosses the +180 / -180 line and cell areas are super large because the corners are wrong.

See: image

RondeauG commented 1 year ago

As a reference to compare to, I implemented the cos-lat method for 2D grids using ds.cf['latitude'] instead of ds.cf['X']. Here's an example for RegCM4 / NAM-22:

image

RondeauG commented 1 year ago

I'll look into the links provided, though I feel like this might be another method entirely? As in, we'd have cos-lat or SpatialAverager or pygeos/ESMF/whatever?

RondeauG commented 1 year ago

Also, with the changes, get_spatial_dims(ds) is pretty useless. I could restore it within method=="mean" as it was before, but I'm tempted to just get rid of mean entirely since it does the wrong thing to begin with.

aulemahal commented 1 year ago

this might be another method entirely?

My "shapely" and "pygeos" functions are both based on the cos-lat approximation, so I would argue it's just a curvilinear-compatible version of cos-lat.

The ESMF version is indeed another technique because it computes the cell area using other approximations.

Note for shapely: the newest version (2.0) now merges the functionality of pygeos so there might be a way to code this idea without adding a new dependency.

aulemahal commented 1 year ago

I updated my gist with a function that works with shapely 2.0. It is indeed the exact same as with pygeos and it's as fast.

RondeauG commented 1 year ago

@aulemahal @mccrayc This might be a lack of comprehension from my part on rotated grids, but let's say that I have this region:

image

The first part of Pascal's code (using the pygeos version for example) produces this: xr.DataArray(pygeos.area(pygeos.polygons(pygeos.linearrings(ds.lon_bounds, ds.lat_bounds))), dims=ds.lon.dims, coords=ds.lon.coords).plot()

image

np.cos(np.deg2rad(ds.lat)) creates more or less the opposite:

image

Meaning that when you multiply both, you get:

image

...which makes no intuitive sense to me? I would have expected weights to be curved, not linear like that?

aulemahal commented 1 year ago

No I think this is ok! You are in a Oblique Mercator projection, which projects the globe unto a cylinder. The projection being centered at y = 0, in the center of your plot, it makes sense to me that the cell area decreases with abs(y).

https://en.wikipedia.org/wiki/Oblique_Mercator_projection

Also, I found a way to compute areas (in meters) with pyproj from lat and lon bounds. This was added to the gist, and gives the following: image

RondeauG commented 1 year ago

Semi-related: As we currently have no bulletproof method, the choice makes sense. However, it seems to me that there's no reason to have such an array of methods for such a simple task. I guess that when we have a simple, light area-weighted mean method, we could remove everything else? Is there a use for a collection of imperfect algos ?

That's why I deprecate mean, since cos-lat is a superior version of the same thing. I could see an argument to drop interp_centroid in favor of always using xesmf too.

However, I think that cos-lat and xesmf (SpatialAverager) fundamentaly do different things. cos-lat keeps the grid cells as is, while SpatialAverager will break pixels into smaller parts to closely fit to a given polygon.

aulemahal commented 1 year ago

@mccrayc When you have time, a second review is needed since you "requested changes".