gjoseph92 / stackstac

Turn a STAC catalog into a dask-based xarray
https://stackstac.readthedocs.io
MIT License
238 stars 49 forks source link

x and y shifted half-a-pixel at read #168

Closed ktiits closed 2 years ago

ktiits commented 2 years ago

There is difference of half a pixel while reading data to xarray using stackstac or xarray.open_dataset.

The original file has 20m pixel size, and pixel edges follow xxx00/20/40/80 m pattern. gdalinfo /vsicurl/https://pta.data.lit.fmi.fi/sen2/s2m_b01/s2m_sdr_20210511-20210520_b01_r20m.tif

...
Origin = (50200.000000000000000,7799840.000000000000000)
Pixel Size = (20.000000000000000,-20.000000000000000)
import rioxarray
import xarray
from pystac import Item
import stackstac

item = 'https://pta.data.lit.fmi.fi/stac/items/Sentinel-2_global_mosaic_dekadi/Sentinel-2_global_mosaic_dekadi_2021-05-11_2021-05-20.json'

stac_item = Item.from_file(item)

cube = stackstac.stack(
    items=[stac_item.to_dict()],    
    assets=['b02'],
    epsg=3067,
    resolution=20,
).squeeze()
cube

Here the x and y values (which I guess represent the middle of each pixel) are -172800., -172780., -172760. Following the original edge pattern, rather than being recalculated to represent pixel centers.

Alternative read with open_dataset

cube2 = xarray.open_dataset("/vsicurl/https://pta.data.lit.fmi.fi/sen2/s2m_b01/s2m_sdr_20210511-20210520_b01_r20m.tif")
cube2

Here the x and y values are shifted with half a pixel (=10m). 240010., 240030., 240050., 240070

The main problem comes at save time, when saving to GeoTiff with rioxarray the data read with stackstac has different pixel alignment than the original data. But based on the observings above, I would believe the problem source is already at read time..

scottyhq commented 2 years ago

@ktiits It looks like your STAC does not include the PROJ extension, so the output grid is approximated based on lat/lons even though your TIFS are in a projected CRS (EPSG:3067). See https://github.com/gjoseph92/stackstac/discussions/110#discussioncomment-1838446.

You can use the rio-stac library to quickly generate STAC metadata: rio stac -n b01 https://pta.data.lit.fmi.fi/sen2/s2m_b01/s2m_sdr_20210511-20210520_b01_r20m.tif > s2m_sdr_20210511-20210520_r20m.json

Once you have PROJ metadata in your STAC you can get the same coordinates as rioxarray with a few extra stackstac settings (snap_bounds=False, xy_coords='center') , see below:

import xarray
import numpy as np
import pystac
import stackstac

item = pystac.Item.from_file('s2m_sdr_20210511-20210520_r20m.json')

cube = stackstac.stack(
    items=item,    
    assets=['b01'],
    snap_bounds=False, #default=True
    xy_coords='center', #default='topleft' 
).squeeze()

url = "https://pta.data.lit.fmi.fi/sen2/s2m_b01/s2m_sdr_20210511-20210520_b01_r20m.tif"
cube3 = xarray.open_dataarray(url,
                            engine='rasterio'
                           ).squeeze()

np.testing.assert_equal(cube.x.values, cube3.x.values)
np.testing.assert_equal(cube.y.values, cube3.y.values)
gjoseph92 commented 2 years ago

Thanks for checking on this @scottyhq. Would xy_coords='center' alone solve things? (I haven't checked myself.)

I should add a note about that to https://github.com/gjoseph92/stackstac/issues/152, which maybe this is another instance of?

ktiits commented 2 years ago

Thanks a lot. xy_coords='center' seems to fix the issue. The STAC catalogue is provided by another organization, so my best option is just to inform them about this problem.

scottyhq commented 2 years ago

Thanks for checking on this @scottyhq. Would xy_coords='center' alone solve things?

Good point! It does in this case where the origin is divisible by the pixel spacing, but I copied that over from a separate example using a COG in EPSG:4326 where the origin and pixel spacing are commonly irrational floats:

rio info https://elevationeuwest.blob.core.windows.net/copernicus-dem/COP30_hh/Copernicus_DSM_COG_10_N28_00_E088_00_DEM.tif`
# "transform": [0.0002777777777777778, 0.0, 87.99986111111112, 0.0, -0.0002777777777777778, 29.000138888888888, 0.0, 0.0, 1.0]
href = 'https://elevationeuwest.blob.core.windows.net/copernicus-dem/COP30_hh/Copernicus_DSM_COG_10_N28_00_E088_00_DEM.tif'
da = xarray.open_dataarray(href, engine='rasterio').squeeze()

item = pystac.Item.from_file('https://planetarycomputer.microsoft.com/api/stac/v1/collections/cop-dem-glo-30/items/Copernicus_DSM_COG_10_N28_00_E088_00_DEM')

cube = stackstac.stack(
    items=item,    
    assets=['data'],
    snap_bounds=False, #default=True
    xy_coords='center', #default='topleft' 
).squeeze()

np.testing.assert_equal(da.x.values, cube.x.values)
np.testing.assert_equal(da.y.values, cube.y.values)