gjoseph92 / stackstac

Turn a STAC catalog into a dask-based xarray
https://stackstac.readthedocs.io
MIT License

Dimension names from cube:dimensions #236

Open clausmichele opened 7 months ago

clausmichele commented 7 months ago

It would be nice if, when the datacube extension is present at the Collection or Item level, the dimension names it provides were reflected in the final xarray object. Currently, the dimension names are always the default ones:

Sample STAC Collection with datacube extension:

import json

from pystac_client.stac_api_io import StacApiIO

url = "https://stac.eurac.edu/collections/SENTINEL2_L2A_SAMPLE"

# Fetch the raw Collection JSON so the datacube extension fields are visible
stac_api = StacApiIO()
stac_dict = json.loads(stac_api.read_text(url))

b_dim = t_dim = x_dim = y_dim = z_dim = None
# Identify each dimension by its datacube "type" (and "axis" for spatial ones)
for dim, props in stac_dict.get("cube:dimensions", {}).items():
    if props["type"] == "bands":
        b_dim = dim
    elif props["type"] == "temporal":
        t_dim = dim
    elif props["type"] == "spatial":
        axis = props.get("axis")
        if axis == "x":
            x_dim = dim
        elif axis == "y":
            y_dim = dim
        elif axis == "z":
            z_dim = dim
print(b_dim, t_dim, x_dim, y_dim, z_dim)

>>> bands t x y None

Result from stackstac:

import pystac_client
import stackstac

catalog_url = "https://stac.eurac.edu/"
collection = "SENTINEL2_L2A_SAMPLE"

catalog = pystac_client.Client.open(catalog_url)
query_params = {"collections": [collection]}

items = catalog.search(**query_params).item_collection()
data = stackstac.stack(items)
print(data.dims)

>>> ('time', 'band', 'y', 'x')

I understand that in the above example I'm passing STAC Items that do not contain the cube:dimensions field, which is provided only at the Collection level. Would it make sense to add an option to use the dimension names defined in the STAC metadata itself?
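Until such an option exists, one possible stopgap is to rename the dimensions after stacking. The sketch below is not part of stackstac; `cube_dims_to_rename_map` and `STACKSTAC_DEFAULTS` are hypothetical names. It assumes stackstac always returns the default names shown above (`time`, `band`, `y`, `x`) and builds a mapping from those defaults to the names declared in `cube:dimensions`, using the same type/axis logic as the snippet above.

```python
from typing import Any, Dict, Mapping

# stackstac's default dimension names (as printed above), keyed by the
# datacube role they correspond to: the "type" for bands/temporal, the
# "axis" for spatial dimensions. Hypothetical helper, not a stackstac API.
STACKSTAC_DEFAULTS = {
    "bands": "band",
    "temporal": "time",
    "x": "x",
    "y": "y",
}


def cube_dims_to_rename_map(
    cube_dimensions: Mapping[str, Mapping[str, Any]]
) -> Dict[str, str]:
    """Map stackstac default dimension names to names from cube:dimensions.

    Only names that actually differ are returned, so the result can be
    passed directly to xarray's DataArray.rename. If several dimensions
    share a role, the last one wins.
    """
    rename: Dict[str, str] = {}
    for name, dim in cube_dimensions.items():
        dim_type = dim.get("type")
        if dim_type == "spatial":
            # Spatial dimensions are distinguished by their axis; stackstac
            # has no z dimension, so "z" finds no default and is skipped.
            role = dim.get("axis")
        else:
            role = dim_type
        default = STACKSTAC_DEFAULTS.get(role)
        if default is not None and default != name:
            rename[default] = name
    return rename
```

With the objects from the example above, something like `cube = catalog.get_collection(collection).extra_fields.get("cube:dimensions", {})` followed by `data = data.rename(cube_dims_to_rename_map(cube))` should yield `('time', 'band', 'y', 'x')` renamed to `('t', 'bands', 'y', 'x')`, assuming pystac keeps the `cube:dimensions` field in the Collection's `extra_fields`. Renaming is a metadata-only operation on the lazy array, so no data is loaded.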

I opened the same issue for odc-stac, which also uses default names: https://github.com/opendatacube/odc-stac/issues/136

Berhinj commented 7 months ago

@clausmichele I believe stackstac and odc-stac use "time", "band", "y", "x" because these are the rasterio conventions, aren't they? And I'm afraid stackstac is designed around rasterio-supported file formats.