corteva / rioxarray

geospatial xarray extension powered by rasterio
https://corteva.github.io/rioxarray
Other
504 stars 80 forks source link

Add a rio.fill.fillnodata-based nodata interpolation in addition to existing scipy-based #733

Open simonreise opened 5 months ago

simonreise commented 5 months ago

The issue is interpolate_na can be much slower than rio.fill.fillnodata

Simple example:

Reading B9 Landsat band

with rioxarray.open_rasterio(r"path/to/file/B9.tif", chunks = True, lock = True) as tif:
    b9 = tif.load().chunk('auto')

nodata = b9.rio.nodata

Then trying to fill nodata using pure rasterio

%%time
mask = np.where(b9.data == nodata, 0, 1)
b9 = b9.compute()
b9.data = rio.fill.fillnodata(b9.data, mask, max_search_distance=250)
b9 = b9.chunk('auto')

CPU times: total: 22.9 s Wall time: 1min 33s

1 min is a pretty reasonable time

But if we try to fill nodata using rioxarray interpolate_na

%%time
b9 = b9.persist()
b9 = b9.rio.interpolate_na()

CPU times: total: 11min 2s Wall time: 27min 25s

Computation is extremely slow (maybe I am doing something wrong?)

The first idea how rio.fill.fillnodata can work with xarray / dask arrays is using xarray.apply_ufunc.

%%time
mask = xarray.where(b9.data == nodata, 0, 1)
b9 = xarray.apply_ufunc(rio.fill.fillnodata, b9, mask, dask = 'allowed', keep_attrs = 'override', kwargs = {'max_search_distance': 250})

CPU times: total: 29.3 s Wall time: 1min 40s

%%time
mask = xarray.where(b9.data == nodata, 0, 1)
b9 = xarray.apply_ufunc(rio.fill.fillnodata, b9, mask, dask = 'parallelized', keep_attrs = 'override', kwargs = {'max_search_distance': 250})
b9 = b9.persist()

CPU times: total: 38 s Wall time: 1min 25s

But maybe there are better solutions.

snowman2 commented 5 months ago

I had thought about using that. Not opposed to using that as an engine for interpolation.