JiaweiZhuang / xESMF

Universal Regridder for Geospatial Data
http://xesmf.readthedocs.io/
MIT License
269 stars 49 forks source link

NaN handling with nearest_s2d #65

Open Plantain opened 4 years ago

Plantain commented 4 years ago

I'm using nearest_s2d in combination with the add_matrix_NaNs' 'hack' documented in #15 , but I'm getting 'smearing' of the nearest value all the way to the border with nearest_s2d when I would expect it to behave like bilinear with the 'outside' values instead missing. Is this the intended behaviour? Can I work around this somehow by masking the output again?

ex left: nearest_s2d, right: bilinear

def add_matrix_NaNs(regridder): X = regridder.weights M = scipy.sparse.csr_matrix(X) num_nonzeros = np.diff(M.indptr) M[num_nonzeros == 0, 0] = np.NaN regridder.weights = scipy.sparse.coo_matrix(M)

JiaweiZhuang commented 4 years ago

This is an expected behavior for nearest neighbor look-up. The "smearing" you see is because outer boxes are mapped to their nearest inner cells (the shortest path is perpendicular to the inner domain boundary).

There is no clear definition of "valid domain" for nearest neighbor methods. It is more like a "look-up" than an "interpolation", as the destination cell doesn't need to be surrounded by ~4 source cells.

The easiest way to get the mask you want is probably using the masks from bilinear interpolation, which has the notation of "inner" and "outer" cells. Something like

outdata_nearest_s2d[np.isnan(outdata_bilinear)] = np.nan

here the bilinear regridder needs to have the nan hack in #15

Plantain commented 4 years ago

Thanks for the suggestion, I will try that. It will be quite an overhead in my use cas though as I have very large grids.

I wonder is the current behaviour actually useful for any usecase? I would suggest given xESMF only supports regular structured grids(?) the correct behaviour would instead be to match the nearest source if and only if the source point distance is less than the grid spacing of the source - else it should be set to NaN.

On Mon, 30 Sep 2019, 12:59 pm Jiawei Zhuang, notifications@github.com wrote:

This is expected behavior for nearest neighbor look-up. The "smearing" you see is because outer boxes are mapped to their nearest inner cells (the shortest path is perpendicular to the inner domain boundary).

There is no clear definition of "valid domain" for nearest neighbor methods. It is more like a "look-up" than a "interpolation", as the destination cell doesn't need to be surrounded by ~4 source cells.

The easiest way to get the mask you want is probably using the masks from bilinear interpolation, which has the notation of "inner" and "outer" cells. Something like

outdata_from_nearest_s2d[np.isnan(outdata_from_bilinear)] == np.nan

here the bilinear regridder needs to have the nan hack in #15 https://github.com/JiaweiZhuang/xESMF/issues/15

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/JiaweiZhuang/xESMF/issues/65?email_source=notifications&email_token=AACODFZIPVU6JDQOMKJIQZTQMIO5PA5CNFSM4I34LTPKYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD76FDUY#issuecomment-536629715, or mute the thread https://github.com/notifications/unsubscribe-auth/AACODF5MBCDBORSHTGBXJU3QMIO5PANCNFSM4I34LTPA .

JiaweiZhuang commented 4 years ago

I have very large grids.

You might estimate a mask manually based on lat & lon values. For example, data[a * lon + b * lat + c >= 0] = np.nan would mask out one side of a boundary (assume straight line) depending on the choice of a, b, c.

if and only if the source point distance is less than the grid spacing of the source - else it should be set to NaN.

The vanilla nearest-neighbor search almost has no notion of "grid spacing", but I agree that this a useful feature. PRs are welcome & appreciated.

JiaweiZhuang commented 4 years ago

A more general case is "nearest neighbor with distance threshold". A grid point will be marked as nan if the shortest-path-to-source is longer than user-specified parameter max_distance. I don't think ESMF/ESMPy provide this kind of feature though. Not sure what's the best way to implement this without hacking the Fortran code...

JiaweiZhuang commented 4 years ago

As a marginally-related note, ESMF 8.0.0 has a NEAREST_IDAVG (inverse distance weighted average) extrapolation method that uses distance-to-source to scale output values. Ref: http://www.earthsystemmodeling.org/esmf_releases/last_built/esmpy_doc/html/regrid.html