Open delgadom opened 6 years ago
@kemccusker here's my implementation of a spatial nearest neighbor function. It assumes that the spatial pattern of nearest neighbors shouldn't change with time: cells that are always NaN are filled from the nearest cell that has at least one non-NaN value.
thinking about putting this in climate toolbox. any requests?
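For readers without access to the prototype, here is a minimal sketch of the idea described above. It is not the implementation from this issue: the function name `fill_always_nan_cells`, the `(time, lat, lon)` array layout, and the use of plain NumPy arrays instead of xarray objects are all assumptions made for illustration. Distances are Euclidean in degrees.

```python
import numpy as np
from scipy.spatial import cKDTree


def fill_always_nan_cells(data, lat, lon):
    """Fill cells that are NaN at every time step from the nearest valid cell.

    data: array shaped (time, lat, lon); lat, lon: 1-d coordinate arrays.
    Hypothetical helper; distances are Euclidean in degrees.
    """
    ntime, nlat, nlon = data.shape
    flat = data.reshape(ntime, nlat * nlon)

    # Targets are NaN at every time step; every other cell is a source.
    always_nan = np.isnan(flat).all(axis=0)
    notnull = ~always_nan

    lat2d, lon2d = np.meshgrid(lat, lon, indexing="ij")
    points = np.column_stack([lat2d.ravel(), lon2d.ravel()])

    # Build the tree once and reuse the same mapping for all time steps.
    tree = cKDTree(points[notnull])
    _, nearest = tree.query(points[always_nan])

    filled = flat.copy()
    filled[:, always_nan] = flat[:, notnull][:, nearest]
    return filled.reshape(data.shape)
```

Cells that are NaN only at some time steps count as sources here, so they are left untouched and can pass their NaNs on to a neighbor at those time steps.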
@delgadom I think it would be great if we could add this to the climate toolbox. One edit I needed to make for this function to run:
notnull_flag = (~stacked_isnull_flag).values
Cool!
If used ~globally, do we care about bias from using rectangular, latlon grids at higher |lat|? (And maybe I'm being too nerdy and this isn't really an issue for target use cases)
The goal is just to interpolate-out NaNs, right?
I think for now we don't need to worry too much about bias at higher latitudes.
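To put a rough number on the latitude question, the sketch below compares the ground distance of a one-degree step east with a one-degree step north at a few latitudes. It uses only the standard library; the helper name `haversine_km` and the mean Earth radius of 6371 km are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


for lat in (0.0, 45.0, 70.0):
    east = haversine_km(lat, 0.0, lat, 1.0)         # one cell to the east
    north = haversine_km(lat, 0.0, lat + 1.0, 0.0)  # one cell to the north
    print(f"lat={lat:4.0f}  1 deg east = {east:6.1f} km  1 deg north = {north:6.1f} km")
```

In degree space both neighbors are exactly one unit away, but on the ground the east-west neighbor gets closer as |lat| grows. A degree-based search therefore treats north-south cells as nearer than they really are relative to east-west cells.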
Yep - the goal is just to interpolate out NaNs, because the interpolate_na function in xarray only interpolates along a single dimension and doesn't like a MultiIndex if you stack dims, so functionality outside of xarray is necessary. SciPy's cKDTree has a tolerance option (distance_upper_bound) that its interpolate module doesn't have, so that's the motivation for using it. I also considered switching from cKDTree to BallTree but decided that was too nerdy (unless you have thoughts on this @brews?)
Do we need it to run fast/small and be super awesome? I wouldn't be surprised if someone already has balltree with haversine distance or something.
Edit: Mature, Grown-up Brewster says: Seriously though, Mike's solution might be good enough. I wouldn't sweat it unless it's been shown to be a bottleneck.
That's actually why I considered switching to BallTree: it has the haversine option. But then I got pressed for time, so I stuck with this. This actually is pretty fast; I used it on global data and it was significantly faster than my 1-d solution that involved stacking and then interpolating.
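For reference, the BallTree alternative mentioned here could look roughly like the sketch below. It assumes scikit-learn is installed and uses made-up coordinates; it is not part of the prototype. `sklearn.neighbors.BallTree` with `metric="haversine"` expects `[lat, lon]` in radians and returns distances in radians on the unit sphere.

```python
import numpy as np
from sklearn.neighbors import BallTree

# (lat, lon) in degrees for cells with valid data (made-up values).
sources_deg = np.array([[70.0, 0.0], [71.0, 2.0]])
# Target at 70N, 2E: 2 degrees from the first source, 1 degree from the second.
target_deg = np.array([[70.0, 2.0]])

# The haversine metric expects [lat, lon] in radians.
tree = BallTree(np.radians(sources_deg), metric="haversine")
dist_rad, idx = tree.query(np.radians(target_deg), k=1)

# Distances come back in radians; scale by an approximate Earth radius in km.
dist_km = dist_rad * 6371.0

# For comparison: the nearest source by Euclidean distance in degrees.
nearest_in_degrees = int(np.argmin(np.linalg.norm(sources_deg - target_deg, axis=1)))
```

At this latitude the two metrics disagree. The haversine query picks the first source, which is two degrees of longitude away, while the degree-based distance picks the second, which is one degree of latitude away.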
Yeah these are all great caveats for this function for sure. We currently just use it to map near-coastal pixels to coastal pixels for areas with a landmask mismatch (e.g. comparing NASA/NEX and BEST). I'm not super worried about the error introduced from slightly too frequently grabbing values from cells above/below rather than left/right, and the intention is to explicitly prevent interpolating over large distances with the distance_upper_bound argument, so I think we're good. Nearest neighbor is waaay faster than anything messing with haversine distance.
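The effect of `distance_upper_bound` can be seen in a small sketch with made-up coordinates and values, which is not taken from the prototype. When no neighbor lies within the bound, `cKDTree.query` reports an infinite distance and an index equal to the number of points in the tree, so those targets can simply be left as NaN.

```python
import numpy as np
from scipy.spatial import cKDTree

# Source cells with valid data, as (lat, lon) in degrees (made-up values).
sources = np.array([[10.0, 20.0], [10.0, 21.0]])
values = np.array([1.5, 2.5])

# Targets: one next to a source cell, one far away from both.
targets = np.array([[10.0, 19.0], [40.0, 60.0]])

tree = cKDTree(sources)
dist, idx = tree.query(targets, distance_upper_bound=1.5)

# Misses come back with dist == inf and idx == tree.n; keep them as NaN.
found = np.isfinite(dist)
filled = np.full(len(targets), np.nan)
filled[found] = values[idx[found]]
```

Here the bound is in degrees because the tree was built on degree coordinates, so the maximum ground distance it allows also varies with latitude.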
prototype implementation