gjoseph92 / stackstac

Turn a STAC catalog into a dask-based xarray
https://stackstac.readthedocs.io
MIT License
238 stars 49 forks source link

stackstac.stack returns empty array when items have different CRS #150

Closed xjtang closed 2 years ago

xjtang commented 2 years ago

I'm trying to stack all the Landsat 8 images for a site (-67.008753, -9.96445, -65.615556, -8.57408) on Planetary Computer. The site happens to be in between two Landsat paths. As a result, when I search the images in the Planetary Computer catalog, I get images in different UTM zones, about half the images in epsg:32720 and about half in epsg:32719. I just picked 32720 but then an empty was returned.

bbox = (-67.008753, -9.96445, -65.615556, -8.57408)

search = catalog.search(
    intersects=tile,
    datetime="2020-01-01/2020-12-31",
    collections=["landsat-8-c2-l2"],
    limit=1000,
    query={'landsat:collection_category': {'eq': 'T1'}}
)

signed_items = [planetary_computer.sign(item).to_dict() for item in items]

data = (
    stackstac.stack(
        signed_items,
        chunksize=4096,
        resolution=30,
        epsg=32720,
        bounds_latlon=bbox
    )
)
Array   Chunk

Bytes 0 B 0 B Shape (0, 0, 5181, 5157) (0, 0, 4096, 4096) Count 10 Tasks 4 Chunks Type float64 numpy.ndarray

TomAugspurger commented 2 years ago

Here's a reproducer:

import pystac_client
import shapely.geometry
import stackstac

catalog = pystac_client.Client.open("https://planetarycomputer.microsoft.com/api/stac/v1")

bbox = (-67.008753, -9.96445, -65.615556, -8.57408)

tile = shapely.geometry.mapping(shapely.geometry.box(*bbox))

search = catalog.search(
    intersects=tile,
    datetime="2020-01-01/2020-12-31",
    collections=["landsat-8-c2-l2"],
    limit=1000,
    query={'landsat:collection_category': {'eq': 'T1'}}
)
items = search.get_all_items()

data = (
    stackstac.stack(
        items,
        chunksize=4096,
        resolution=30,
        epsg=32719,
        bounds_latlon=bbox
    )
)

data.shape

Which outputs (0, 0, 5170, 5145).

Here's a plot of the items and the bbox:

import geopandas
import folium

df = geopandas.GeoDataFrame.from_features(items.to_dict(), crs="epsg:4326")[["geometry", "proj:epsg"]]
m = df.explore(style_kwds={"fill": False}, column="proj:epsg")
layer = folium.GeoJson(tile)
layer.add_to(m)
m
image

If you leave off the bounds_latlon then things seem OK, so possibly some poor interaction between bounds_latlon and epsg when it needs to reproject.

xjtang commented 2 years ago

I think it's more of an issue with bounds_latlon, If I add query={'proj:epsg': {'eq': 32720}} to the code and limit the image s to one UTM zone, I still get an empty array whenever bounds_latlon is specified, even without specifying epsg.

xjtang commented 2 years ago

A quick update, seems like the problem is not really related to multiple UTM zones. The problem is unique to UTM South Zones.

TomAugspurger commented 2 years ago

Ah, that's likely https://github.com/stactools-packages/landsat/issues/12 then.

That's fixed in an upcoming release (it's deployed to our staging STAC API at planetarycomputer-staging.microsoft.com)

import pystac_client
import shapely.geometry
import stackstac

catalog = pystac_client.Client.open("https://planetarycomputer-staging.microsoft.com/api/stac/v1")

bbox = (-67.008753, -9.96445, -65.615556, -8.57408)

tile = shapely.geometry.mapping(shapely.geometry.box(*bbox))

search = catalog.search(
    intersects=tile,
    datetime="2020-01-01/2020-12-31",
    collections=["landsat-c2-l2"],
    limit=1000,
    query={'landsat:collection_category': {'eq': 'T1'}}
)
items = search.get_all_items()

data = (
    stackstac.stack(
        items,
        chunksize=4096,
        resolution=30,
        epsg=32719,
        # bounds_latlon=bbox
    )
)

data.shape

Note that the name of the collection is landsat-c2-l2, rather than landsat-8-c2-l2 since this includes results from all of Landsat Collection 2 (including Landsat 9).

That should be released to production later this week.

@gjoseph92 you can safely close this. The issues just with the STAC metadata (again)

gjoseph92 commented 2 years ago

Thanks for doing the research @TomAugspurger and @xjtang!

Just to clarify, the problem is that according to the STAC metadata, none of those items actually overlap with the bounding box?

I'm wondering if there should be any behavior change in stackstac to make this clearer?