developmentseed / titiler-xarray

[DEMO] TiTiler extension for xarray
MIT License
28 stars 9 forks source link

Add zonal statistics #44

Open abarciauskas-bgse opened 1 year ago

j08lue commented 2 months ago

rio-tiler has methods to extract subsets of pixel values from a dataset (part and point).

TiTiler exposes aggregations (mean, sum, min, max, median, quantiles, ...) of such parts as /statistics (see docs) and the point data extraction as /point.

We should offer the same API in TiTiler-Xarray. As @vincentsarago pointed out below, rio-tiler's XarrayReader already has part.

Better time series extraction?

However, we should keep in mind that Xarray datasets are often in fact a lot better suited to timeseries extraction, when data is already stored in contiguous chunks along the time dimension or can be virtualized as such with Zarr-ish solutions, such that a single request (to TiTiler) could in principle pull an entire timeseries. Can we leverage that or should we mimmic the low-level single-timestep solution of TiTiler?

Special treatment for grid cell area?

Another thing to consider: When datasets are stored on regular lat/lon grids, there are simple formulae for calculating grid cell area. This can be useful for

  1. calculating area-weighted statistics without reprojecting the data (TiTiler area-weighted stats involve reprojection into an equal-area coordinate reference system)
  2. converting per-area properties into actual quantity - for example a flux per unit area into a total flux

which are common operations done on (global) climate datasets.

Implementing accurate flux formulae could be a feature of statistics - but perhaps it is also better implemented in a separate algorithm / extension.

vincentsarago commented 2 months ago

rio-tiler has methods to extract subsets of pixel values from a dataset (part and point).

@j08lue I think you're linking the wrong methods but XarrayReader also have the part method https://github.com/cogeotiff/rio-tiler/blob/main/rio_tiler/io/xarray.py#L277-L352 but is missing the point one.

FYI this is how we do zonal-stats in titiler: https://github.com/developmentseed/titiler/blob/dd773230e1d906625f3cdf12933dfe699e952919/src/titiler/core/titiler/core/factory.py#L471-L525