Closed PeterDSteinberg closed 7 years ago
Worth noting that once PR https://github.com/bokeh/datashader/pull/375 is merged, datashader will rely on xarray's new rasterio backend. Since I just did the integration work there, I'll start adding comments to this issue to summarize my observations.
My high-level observations are summarized below:
[earthio/tif.py]() | xarray/backends/rasterio_.py | |
---|---|---|
Returns | Returns ElmStore, which references xr.DataArray objects | Returns xr.DataArray |
Inputs | Works for single files or directory of TIFs | Works for single files |
Operation | ElmStore wraps xr.DataArray objects which are created from rasterio.Dataset.read(.., window=..) |
xr.DataArray created from lazy-loaded RasterioArrayWrapper which uses rasterio.Dataset.read(..., window=..) |
Metadata | Preserves most of the TIF's metadata | Only stores crs attribute |
Memory usage | Reads data into memory depending on window kwarg |
Reads data into memory depending on chunks kwarg (see LazyIndexedArray on top of RasterioArrayWrapper) |
Additional features | Supports geotransform via earthio , and anything exposed by rasterio.open and rasterio.read methods |
Supports chunking, caching, and concurrent access |
Based on this, I think there's a lot of overlap - both are essentially achieving the same goal, although xarray seems to already be using dask to do a subset of what earthio does now without dask. Rather than add another xarray backend I think it would save us time to update tif.py
to use the xarray rasterio backend instead of the rasterio API (similar to what was done with datashader), but make improvements to the xarray backend so that more metadata attributes are exposed (notice that currently we can only see the crs
). This has the added benefit that we could keep the earthio geotransform logic decoupled from xarray, although there's also the drawback that we won't be able to use the rasterio API functions (namely the index
, and read(..., window=..)
functionality). The directory-reading functionality could be included in the logic layer above the xarray backend. What do you think @PeterDSteinberg ?
@gbrener Thank you for the summary - that helps. I agree this issue should become a modification of the xarray
backend for tif
. Can you raise an issue there, pointing back to your table of advantages/disadvantages above and the team there will probably go for the xarray changes you describe. I agree on your idea for separation of concerns (spatial logic in this repo but still making some changes in xarray
). The xarray
backend can stay geared towards 1 Tif file at a time, returning a DataArray
and the earthio
wrapper around it can call that on a directory of Tif files as is done now.
@PeterDSteinberg I just created an issue on xarray which requests the additional attributes alluded to in the table above (https://github.com/pydata/xarray/issues/1456) and opened a PR to address it.
The PR to add additional metadata attributes was merged into xarray. I created an interim dev release of xarray and hosted the packages on my channel, to tie over datashader development until xarray v0.9.7 comes out. The dev release can be installed with:
conda install -c gbrener xarray
The interim dev package version should be 0.9.6.dev2
.
Closing this as we are moving it to a documentation need (#36)
This PR 1260 in xarray added a rasterio backend. We need to see how this is similar or not to earthio/tif.py which uses
rasterio
for reading GeoTiff files.