gjoseph92 / stackstac

Turn a STAC catalog into a dask-based xarray
https://stackstac.readthedocs.io
MIT License
232 stars 49 forks source link
cog dask geospatial rasterio stac xarray

StackSTAC

Documentation Status Binder

Turn a list of STAC items into a 4D xarray DataArray (dims: time, band, y, x), including reprojection to a common grid. The array is a lazy Dask array, so loading and processing the data in parallel—locally or on a cluster—is just a compute() call away.

For more information and examples, please see the documentation.

import stackstac
import pystac_client

URL = "https://earth-search.aws.element84.com/v1"
catalog = pystac_client.Client.open(URL)

stac_items = catalog.search(
    intersects=dict(type="Point", coordinates=[-105.78, 35.79]),
    collections=["sentinel-2-l2a"],
    datetime="2020-04-01/2020-05-01"
).get_all_items()

stack = stackstac.stack(stac_items)
print(stack)
<xarray.DataArray 'stackstac-fefccf3d6b2f9922dc658c114e79865b' (time: 13, band: 17, y: 10980, x: 10980)>
dask.array<fetch_raster_window, shape=(13, 17, 10980, 10980), dtype=float64, chunksize=(1, 1, 1024, 1024), chunktype=numpy.ndarray>
Coordinates: (12/24)
  * time                        (time) datetime64[ns] 2020-04-01T18:04:04 ......
    id                          (time) <U24 'S2B_13SDV_20200401_0_L2A' ... 'S...
  * band                        (band) <U8 'overview' 'visual' ... 'WVP' 'SCL'
  * x                           (x) float64 4e+05 4e+05 ... 5.097e+05 5.098e+05
  * y                           (y) float64 4e+06 4e+06 ... 3.89e+06 3.89e+06
    view:off_nadir              int64 0
    ...                          ...
    instruments                 <U3 'msi'
    created                     (time) <U24 '2020-09-05T06:23:47.836Z' ... '2...
    sentinel:sequence           <U1 '0'
    sentinel:grid_square        <U2 'DV'
    title                       (band) object None ... 'Scene Classification ...
    epsg                        int64 32613
Attributes:
    spec:        RasterSpec(epsg=32613, bounds=(399960.0, 3890220.0, 509760.0...
    crs:         epsg:32613
    transform:   | 10.00, 0.00, 399960.00|\n| 0.00,-10.00, 4000020.00|\n| 0.0...
    resolution:  10.0

Once in xarray form, many operations become easy. For example, we can compute a low-cloud weekly mean-NDVI timeseries:

lowcloud = stack[stack["eo:cloud_cover"] < 40]
nir, red = lowcloud.sel(band="B08"), lowcloud.sel(band="B04")
ndvi = (nir - red) / (nir + red)
weekly_ndvi = ndvi.resample(time="1w").mean(dim=("time", "x", "y")).rename("NDVI")
# Call `weekly_ndvi.compute()` to process ~25GiB of raster data in parallel. Might want a dask cluster for that!

Installation

pip install stackstac

Windows notes:

It's a good idea to use conda to handle installing rasterio on Windows. There's considerably more pain involved with GDAL-type installations using pip. Then pip install stackstac.

Things stackstac does for you:

Limitations:

Roadmap:

Short-term:

Long term (if anyone uses this thing):