gjoseph92 / stackstac

Turn a STAC catalog into a dask-based xarray
https://stackstac.readthedocs.io
MIT License
232 stars · 49 forks

Stackstac 0.5.0 produces bad pixel values for Sentinel-2 #241

Closed · waltersdan closed this issue 5 months ago

waltersdan commented 5 months ago

When querying Sentinel-2 images from AWS (Element-84), stackstac 0.5.0 produces pixel values significantly different from those of 0.4.3. I believe 0.4.3 gives the correct values. Inspecting some values, it looks like 0.5.0 is subtracting 1000 (perhaps incorrectly applying a PB4 correction?) and then dividing by 10,000.

Minimal reproducible example:

import stackstac
from pystac_client import Client
geom = {
    "type": "Polygon",
    "coordinates": [[
        (-126.37, 50.43),
        (-126.35, 50.43),
        (-126.35, 50.45),
        (-126.37, 50.45),
        (-126.37, 50.43),
    ]],
}
start_date = "2020-07-01"
end_date = "2020-07-03"
url = "https://earth-search.aws.element84.com/v1"
cat = Client.open(url)
date_query = f"{start_date}/{end_date}"
params = {
    "intersects": geom,
    "collections": ["sentinel-2-l2a"],
    "datetime": date_query,
}
search = cat.search(**params)
items = search.item_collection()
stack = stackstac.stack(
    items,
    assets=['blue'],
    bounds_latlon=[-126.37,50.43,-126.35,50.45],
    resolution=10,
    epsg=32609)
print(stack[0,0,0,0].to_numpy())

With stackstac 0.4.3, this code gives 3176.0 (a reasonable DN for Sentinel-2); with stackstac 0.5.0, it gives 0.2176.
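The arithmetic is consistent with that suspicion; a quick check in pure Python, using only the values above:

```python
# Quick check of the suspicion above: 0.5.0's output matches
# "subtract 1000, then divide by 10,000", which is the same as
# applying a 0.0001 scale and a -0.1 offset.
dn = 3176.0  # value returned by stackstac 0.4.3 (raw digital number)
via_subtraction = (dn - 1000) / 10_000
via_scale_offset = dn * 0.0001 + (-0.1)
assert via_subtraction == 0.2176                  # exactly the 0.5.0 result
assert abs(via_scale_offset - 0.2176) < 1e-12     # same thing, up to float error
```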

clausmichele commented 5 months ago

This should be because the newest version automatically applies the offset and scale to the data, so you get reflectance values rather than digital numbers. However, there seems to be an issue in this case, since the offset was only introduced for data from 2023 onwards.

Did you inspect the Items returned by your query and check the scale and offset values in them?

Related issues about the new processing baseline 4.0, which introduced the offset:

https://github.com/Element84/earth-search/issues/9
https://github.com/Element84/earth-search/issues/23
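One way to do that inspection, as a rough sketch: it operates on a plain item dict (i.e. `item.to_dict()` in pystac), and the sample values are the ones reported in this thread, not fetched live. The function name is hypothetical.

```python
# Hedged sketch: pull the scale/offset metadata discussed here out of
# a STAC item dict (the shape item.to_dict() produces in pystac).
def inspect_blue_band(item_dict):
    props = item_dict["properties"]
    # raster:bands lives on the asset; default to an empty entry if absent
    band = item_dict["assets"]["blue"].get("raster:bands", [{}])[0]
    return {
        "processing_baseline": props.get("s2:processing_baseline"),
        "boa_offset_applied": props.get("earthsearch:boa_offset_applied"),
        "scale": band.get("scale"),
        "offset": band.get("offset"),
    }

# Sample shaped like the item reported in this thread
sample = {
    "properties": {
        "s2:processing_baseline": "05.00",
        "earthsearch:boa_offset_applied": True,
    },
    "assets": {"blue": {"raster:bands": [{"scale": 0.0001, "offset": -0.1}]}},
}
print(inspect_blue_band(sample))
```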

waltersdan commented 5 months ago

Ah ok, I see! I had missed the addition of the default rescale=True to stack(). We had our own rescaling in place, so this was a breaking change for us.

Looking at the first item in the query above: it has s2:processing_baseline = 05.00, so the offset should be applied, and the property earthsearch:boa_offset_applied = True. However, raster:bands has offset = -0.1. So if stackstac is using the raster:bands property, it looks like it is wrong for this item. You could use earthsearch:boa_offset_applied, but I suppose that is unique to the Element-84 STAC.

In our own processing, we have been using earthsearch:boa_offset_applied. It does complicate things, as you have to check both the processing baseline and whether earthsearch:boa_offset_applied exists and is False for a given item. The property doesn't exist for some items, in which case it becomes None rather than False after calling stack().
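The check described above could be sketched like this (the function name is hypothetical; the baseline-4.0 threshold and property names come from this thread, and treating a missing property the same as False is an assumption on my part):

```python
def needs_boa_offset(properties):
    """Return True if the -1000 DN offset still needs to be applied.

    Per the discussion: the offset only exists from processing baseline
    04.00 onward, and Element-84 may already have baked it into the data
    (earthsearch:boa_offset_applied == True). The property can be missing,
    in which case .get() yields None; this sketch treats that like False.
    """
    baseline = float(properties.get("s2:processing_baseline") or "0")
    applied = properties.get("earthsearch:boa_offset_applied")
    return baseline >= 4.0 and applied is not True

# Item from this thread: baseline 05.00, offset already applied by Element-84
assert not needs_boa_offset(
    {"s2:processing_baseline": "05.00", "earthsearch:boa_offset_applied": True}
)
# Baseline >= 4.0 with the property missing: offset assumed still needed
assert needs_boa_offset({"s2:processing_baseline": "04.00"})
# Pre-4.0 baseline: no offset regardless
assert not needs_boa_offset({"s2:processing_baseline": "03.01"})
```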

clausmichele commented 5 months ago

Setting rescale=False gives you the correct DN value. So, as you understood, the issue is with the Element84 metadata.

import stackstac
from pystac_client import Client
geom = {
    "type": "Polygon",
    "coordinates": [[
        (-126.37, 50.43),
        (-126.35, 50.43),
        (-126.35, 50.45),
        (-126.37, 50.45),
        (-126.37, 50.43),
    ]],
}
start_date = "2020-07-01"
end_date = "2020-07-03"
url = "https://earth-search.aws.element84.com/v1"
cat = Client.open(url)
date_query = f"{start_date}/{end_date}"
params = {
    "intersects": geom,
    "collections": ["sentinel-2-l2a"],
    "datetime": date_query,
}
search = cat.search(**params)
items = search.item_collection()
stack = stackstac.stack(
    items,
    assets=['blue'],
    bounds_latlon=[-126.37,50.43,-126.35,50.45],
    resolution=10,
    epsg=32609,
    rescale=False
    )
print(stack[0,0,0,0].to_numpy())
> 3176.0
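From there, the rescale could be applied manually; a minimal sketch (the 0.0001 scale and -0.1 offset are the values discussed in this thread; the function name is hypothetical). Since it's pure arithmetic, it also applies elementwise to the DataArray that stack() returns.

```python
def dn_to_reflectance(dn, boa_offset_applied):
    """Convert a Sentinel-2 L2A digital number to surface reflectance.

    Skip the -0.1 offset when Element-84 already applied the BOA offset
    to the data (earthsearch:boa_offset_applied == True).
    """
    scale, offset = 0.0001, -0.1
    return dn * scale + (0.0 if boa_offset_applied else offset)

# The DN from the example above, with the offset already applied by Element-84:
print(dn_to_reflectance(3176.0, boa_offset_applied=True))   # ~0.3176
# What stackstac 0.5.0 computed (offset applied a second time):
print(dn_to_reflectance(3176.0, boa_offset_applied=False))  # ~0.2176
```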

Berhinj commented 5 months ago

FYI, Element 84 is aware of this confusion and has announced that it will align the processing baseline and metadata in the coming months :)

waltersdan commented 5 months ago

Great, thanks for the replies! I'll close the issue.