Open-EO / openeo-geopyspark-driver

OpenEO driver for GeoPySpark (Geotrellis)
Apache License 2.0
26 stars 5 forks source link

resample spatial bug #821

Open HansVRP opened 5 months ago

HansVRP commented 5 months ago

I am running extractions of multiple patches of Sentinel1 data using sample_by_feature and a GeoParquet of samples of 640m x 640m for spatial filtering. By default, the output NetCDF file has a spatial resolution of 10m/px and a size of 64x64 pixels.

As I want to have my tiles as an output of 32x32 pixels for a spatial resolution of 20m/px, I use the process:

Python cube = cube.resample_spatial(resolution=20.0)

However, the resulting NetCDF have the dimensions of 33x33 pixels, with a row and a column of only NaN values over the whole time dimension for both bands

An example job on CDSE: j-24070359250d4b9692766f0d25103517

jdries commented 3 months ago

The resampling is probably correct, but the cropping of the samples might be off, this happens here: https://github.com/Open-EO/openeo-geotrellis-extensions/blob/30627cf3b32b27ab26588f3048fe01a3756a524d/openeo-geotrellis/src/main/scala/org/openeo/geotrellis/netcdf/NetCDFRDDWriter.scala#L457

jdries commented 3 months ago

The input extents for backscatter seem nicely aligned to the 20m utm pixels:

'spatial_extent': {'west': 501300.0, 'south': 9494580.0, 'east': 527360.0, 'north': 9542720.0, 'crs': 'EPSG:32737'}, 
'global_extent': {'west': 501300.0, 'south': 9494580.0, 'east': 527360.0, 'north': 9542720.0, 'crs': 'EPSG:32737'}

but then the input to resample_spatial is in fact not aligned:

Reprojecting datacube with crs +proj=utm +zone=37 +south +datum=WGS84 +units=m +no_defs  and 
layout LayoutDefinition(
extent=Extent(xmin=501290.0, ymin=9494090.0, xmax=529450.0, ymax=9542730.0),
tileLayout=TileLayout(layoutCols=11, layoutRows=19, tileCols=256, tileRows=256)) to None and 20

Backscatter compute also indicates misalignment between extent and layout definition:

    Constructing TiledRasterLayer from numpy rdd, with metadata Metadata(
Bounds(minKey=SpaceTimeKey(col=0, row=0, instant=datetime.datetime(2016, 6, 8, 0, 0)), maxKey=SpaceTimeKey(col=10, row=18, instant=datetime.datetime(2018, 1, 21, 0, 0))), 
float32ud0, nan, 
+proj=utm +zone=37 +south +datum=WGS84 +units=m +no_defs , 
Extent(xmin=501300.0, ymin=9494580.0, xmax=527360.0, ymax=9542720.0), 
TileLayout(layoutCols=11, layoutRows=19, tileCols=256, tileRows=256), 
LayoutDefinition(extent=Extent(xmin=501290.0, ymin=9494090.0, xmax=529450.0, ymax=9542730.0), 
tileLayout=TileLayout(layoutCols=11, layoutRows=19, tileCols=256, tileRows=256)))
jdries commented 3 months ago

The output is now the expected 32x32 pixels. When computing the layoutdefinition, there was already some code that aligns it to multiples of the resolution, so the final cropping also could happen on exact pixel boundaries.

TODO: add a test that computes backscatter at lower resolution

The logging also shows that backscatter was indeed computed at 20m now: Resampling datacube not necessary, resolution already at 20

    Constructing TiledRasterLayer from numpy rdd, with metadata 
Metadata(Bounds(minKey=SpaceTimeKey(col=0, row=0, instant=datetime.datetime(2016, 6, 8, 0, 0)), maxKey=SpaceTimeKey(col=14, row=11, instant=datetime.datetime(2018, 1, 21, 0, 0))), float32ud0, nan, +proj=utm +zone=37 +south +datum=WGS84 +units=m +no_defs ,
 Extent(xmin=141980.0, ymin=9934120.0, xmax=215480.0, ymax=9994980.0), 
TileLayout(layoutCols=15, layoutRows=12, tileCols=256, tileRows=256), 
LayoutDefinition(
extent=Extent(xmin=141960.0, ymin=9933560.0, xmax=218760.0, ymax=9995000.0),
 tileLayout=TileLayout(layoutCols=15, layoutRows=12, tileCols=256, tileRows=256)))