gjoseph92 / stackstac

Turn a STAC catalog into a dask-based xarray
https://stackstac.readthedocs.io
MIT License
238 stars 49 forks source link

Enable opening datasets with gcps even if no valid crs is present #182

Open weiji14 opened 1 year ago

weiji14 commented 1 year ago

Some datasets like Sentinel-1 GRD files don't have a coordinate reference system (crs), but they may have ground control points (gcps) and can still be opened by rasterio.

Test this branch using pip install git+https://github.com/weiji14/stackstac.git@open_vrt_with_gcps

Fixes #181.

weiji14 commented 1 year ago

Probably should write a regression test. @scottyhq, any good sample Sentinel-1 GRD files with a stable link that you know of?

scottyhq commented 1 year ago

Probably should write a regression test. @scottyhq, any good sample Sentinel-1 GRD files with a stable link that you know of?

GRDs are really big. xarray-sentinel has a minimized one here for testing https://github.com/bopen/xarray-sentinel/tree/main/tests/data/S1B_IW_GRDH_1SDV_20210401T052623_20210401T052648_026269_032297_ECC8.SAFE.

Or rioxarray just constructs a synthetic dataset with GCPs for a test: https://github.com/corteva/rioxarray/blob/bfd616594bb82886133b5f4b9356ed16d39014fc/test/integration/test_integration_rioxarray.py#L1053

gjoseph92 commented 1 year ago

@weiji14 did you get any further with this? Is this actually working as is now?

weiji14 commented 1 year ago

@weiji14 did you get any further with this? Is this actually working as is now?

Technically, I think the implementation is sound. But the unit test could be much improved. Ok if you want to merge this first and improve the unit test at a later date.

weiji14 commented 1 year ago

@weiji14 I pushed a change here to avoid mutating spec.vrt_params, since that object could possibly be shared with other dask tasks.

I'm not seeing any new commits in this PR. Could you try again?

To be clear, in your example in #181, you're still getting the purple/green/yellow output? That makes me think this might not be working correctly yet?

If I recall correctly, there's actually 2 things needed to fix the purple/green/yellow output.

  1. Getting the image plotted at (roughly) the correct location, and this requires reading the coordinate reference system (CRS) and ground control points (GCP) which this Pull Request has fixed (at least I think it's implemented correctly).
  2. Resampling/Interpolation - this is something this PR won't be handling, but which the user would still need to set to get a proper Sentinel-1 GRD image plotted (that is not purple/green/yellow).

I'm having trouble with the second part (resampling), which would require someone a bit smarter on the SAR data structure to get right (e.g. @scottyhq). I wish I had a proper end-to-end validated example where someone has taken a Sentinel-1 GRD image (from Microsoft Planetary Computer), read it properly with rasterio (with some set of parameters) and plotted it, but I'm missing that picture right now.

gjoseph92 commented 1 year ago

Oops maybe I forgot to push. I see them now in https://github.com/gjoseph92/stackstac/pull/182/files/1b3c47c6bd0d3ec9dd17b25dc8cb1c8496a131a7..d7b08203ab754295bf5644fa0a661df9fa94e8b9.

I'd be a little surprised to find that it's just a resampling issue. As long as the data types are right and we're not truncating floats to ints or something, I don't quite see why a different resampling method would change meaningful data into something completely meaningless-looking.

Furthermore, if you look at the example scene from https://github.com/gjoseph92/stackstac/issues/181 in PC explorer, you can see that what we're seeing from stackstac.show also has a different spatial extent, as well as looking meaningless. I suppose that could be a STAC metadata issue like https://github.com/gjoseph92/stackstac/issues/152 though (I haven't looked at the metadata at all).

I guess where I'm leaning right now is that if we can't get this to seem to work properly and return meaningful data, I'd rather raise a NotImplementedError if ds.crs is None than try to use GCPs but possibly return bad values.