gjoseph92 / stackstac

Turn a STAC catalog into a dask-based xarray
https://stackstac.readthedocs.io
MIT License

Scale offset from item asset #202

Closed: RSchueder closed this 1 year ago

RSchueder commented 1 year ago

This MR enables users to set rescale=True in stackstac.stack and have the scale and offset values be taken from the STAC Item with priority over the COG.
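For context, a rough sketch of the intended usage (the item file and asset metadata below are illustrative, not from a real catalog):

import pystac
import stackstac

# Illustrative item: its "red" asset carries scale/offset via the raster:bands
# extension (values here are made up for the example).
item = pystac.Item.from_file("example-item.json")
print(item.assets["red"].extra_fields.get("raster:bands"))
# -> [{"data_type": "uint16", "scale": 0.0001, "offset": -0.1}]

# With rescale=True, the scale/offset from the item metadata are applied,
# taking priority over any values embedded in the COG itself.
stack = stackstac.stack([item], assets=["red"], rescale=True, dtype="float64")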

RSchueder commented 1 year ago

@gjoseph92 Thanks very much for the initial review. I was wondering if you had any additional suggestions for this MR? Thanks!

RSchueder commented 1 year ago

@gjoseph92 Round 2 of comments has been addressed, except for the conflict with #208, which I will probably resolve after that PR is merged to master (assuming you would like to do #208 first).

RSchueder commented 1 year ago

@gjoseph92 Sounds like a good plan. I have a functional test in my company repo that should validate this use case. I will install from main, demonstrate, and get back to you.

RSchueder commented 1 year ago

@gjoseph92 With this branch installed on my system, the following test passes:

from common.eo.catalog.eo_catalog import EOCatalog
from common.eo.catalog.stac.config import Catalog
from common.geometry import polygon_from_bounds
from common.types import EOProductType, TimeRange

def test_scale_offset():
    # Area of interest (lon/lat bounds) and time of interest
    aoi = [-97.8028, 49.8776, -97.1300, 50.0845]
    toi = TimeRange(value='2021-12-01/2022-03-01')
    bands = ["red"]
    aoi = polygon_from_bounds(aoi)

    # Search for Sentinel-2 L2A items covering the AOI and TOI
    client = EOCatalog(product_type=EOProductType('sentinel-2-l2a'), source=Catalog('skyfox'))
    stac_items = client.search(
        aoi=aoi,
        toi=toi,
        cloud_cover=100,
    )

    # Without rescaling, the raw digital number is returned
    x_idx = 500
    y_idx = 500
    datacube = client.stack(stac_items[:1], bands, rescale=False)
    unscaled_pixel = datacube[0, 0, y_idx, x_idx].values
    assert unscaled_pixel == 9288

    # With rescaling, the item-level scale (0.0001) and offset (-0.1) are applied
    datacube = client.stack(stac_items[:1], bands, rescale=True)
    scaled_pixel = datacube[0, 0, y_idx, x_idx].values
    assert scaled_pixel == (unscaled_pixel * 0.0001) - 0.1

You can ignore the method used to obtain the items for the test; what matters is that they are sentinel-2-l2a items with scale and offset values in their metadata.

stackstac.stack is called inside client.stack() as follows:

stack = stackstac.stack(
    list(items),
    assets=query_band_names,
    epsg=modal_epsg_code,
    resolution=product_type.RESOLUTION,
    # rescaling produces floating-point values, so promote the dtype
    dtype=product_type.DTYPE if not rescale else 'float64',
    fill_value=product_type.FILL_VALUE,
    bounds_latlon=bounds_latlon,
    rescale=rescale
)

If you want to run it yourself or encode it in the repo somehow, I can serialize the items so the test does not depend on the search component.
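For example (assuming the search returns pystac Items; the file name is illustrative):

import json
import pystac

# Dump the searched items to JSON so the test no longer depends on the
# catalog search step (assumes each item is a pystac.Item).
with open("test_items.json", "w") as f:
    json.dump([item.to_dict() for item in stac_items], f)

# ...and in the test, reload them instead of searching:
with open("test_items.json") as f:
    stac_items = [pystac.Item.from_dict(d) for d in json.load(f)]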

What do you think of this?

RSchueder commented 1 year ago

@gjoseph92 I also wanted to point out that this is a breaking change for my existing codebase:

  File "/usr/local/lib/python3.10/dist-packages/stackstac/rio_reader.py", line 408, in read
    result *= scale
  File "/usr/local/lib/python3.10/dist-packages/numpy/ma/core.py", line 4285, in __imul__
    self._data.__imul__(np.where(self._mask, self.dtype.type(1),
numpy.core._exceptions._UFuncOutputCastingError: Cannot cast ufunc 'multiply' output from dtype('float64') to dtype('uint16') with casting rule 'same_kind'

I suspect this is caused by my specifying dtype='uint16' in stackstac.stack: since rescale defaults to True, the float scale gets multiplied into the integer result. What do you think about this behaviour?
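For reference, the failure can be reproduced with plain NumPy alone (values are illustrative):

import numpy as np

# In-place multiply of a uint16 masked array by a float scale: the float64
# result cannot be cast back into the uint16 buffer under the default
# "same_kind" casting rule, which is exactly the error above.
data = np.ma.masked_array(np.array([9288], dtype="uint16"), mask=[False])
data *= 0.0001  # raises UFuncOutputCastingError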

gjoseph92 commented 1 year ago

> What do you think of this?

Seems good! Thanks for testing it out.

> I suspect this is caused by my specifying dtype='uint16' in stackstac.stack: since rescale defaults to True, the float scale gets multiplied into the integer result. What do you think about this behaviour?

Interesting. So right now, the docstring for rescale says: https://github.com/gjoseph92/stackstac/blob/7836a363b2609216890b0e20e289f8ff71cf320d/stackstac/stack.py#L207-L209

However, now that we know the scaling factors ahead of time, I think we can actually do better than this. If you specify dtype=uint16, and we know any scale or offset factor is floating-point, we could raise an error during stack that explains the problem and tells you to either set rescale=False or change the dtype. That would be much more helpful behavior, especially since this error from NumPy is hard to interpret.

Any interest in opening a PR to do that?
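Roughly something like this, just as a sketch (names are made up, not the actual stackstac internals):

import numpy as np

def check_rescale_dtype(dtype, scales_and_offsets):
    # Hypothetical validation: fail fast at stack() time when any scale or
    # offset is fractional but the requested output dtype is not floating-point.
    dtype = np.dtype(dtype)
    fractional = any(
        v is not None and float(v) != int(v)
        for pair in scales_and_offsets
        for v in pair
    )
    if fractional and not np.issubdtype(dtype, np.floating):
        raise ValueError(
            "Assets have floating-point scale/offset values, which cannot be "
            f"represented in dtype={dtype}. Pass rescale=False, or use a "
            "floating-point dtype such as float64."
        )

# check_rescale_dtype("uint16", [(0.0001, -0.1)])  -> ValueError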

RSchueder commented 1 year ago

@gjoseph92 the PR for the fix you describe above has been created here: https://github.com/gjoseph92/stackstac/pull/209

gjoseph92 commented 10 months ago

FYI @RSchueder, this is now released in 0.5.0. Thanks for your contribution!