Open HansVRP opened 5 months ago
The resampling is probably correct, but the cropping of the samples might be off, this happens here: https://github.com/Open-EO/openeo-geotrellis-extensions/blob/30627cf3b32b27ab26588f3048fe01a3756a524d/openeo-geotrellis/src/main/scala/org/openeo/geotrellis/netcdf/NetCDFRDDWriter.scala#L457
The input extents for backscatter seem nicely aligned to the 20m utm pixels:
'spatial_extent': {'west': 501300.0, 'south': 9494580.0, 'east': 527360.0, 'north': 9542720.0, 'crs': 'EPSG:32737'},
'global_extent': {'west': 501300.0, 'south': 9494580.0, 'east': 527360.0, 'north': 9542720.0, 'crs': 'EPSG:32737'}
but then the input to resample_spatial is in fact not aligned:
Reprojecting datacube with crs +proj=utm +zone=37 +south +datum=WGS84 +units=m +no_defs and
layout LayoutDefinition(
extent=Extent(xmin=501290.0, ymin=9494090.0, xmax=529450.0, ymax=9542730.0),
tileLayout=TileLayout(layoutCols=11, layoutRows=19, tileCols=256, tileRows=256)) to None and 20
Backscatter compute also indicates misalignment between extent and layout definition:
Constructing TiledRasterLayer from numpy rdd, with metadata Metadata(
Bounds(minKey=SpaceTimeKey(col=0, row=0, instant=datetime.datetime(2016, 6, 8, 0, 0)), maxKey=SpaceTimeKey(col=10, row=18, instant=datetime.datetime(2018, 1, 21, 0, 0))),
float32ud0, nan,
+proj=utm +zone=37 +south +datum=WGS84 +units=m +no_defs ,
Extent(xmin=501300.0, ymin=9494580.0, xmax=527360.0, ymax=9542720.0),
TileLayout(layoutCols=11, layoutRows=19, tileCols=256, tileRows=256),
LayoutDefinition(extent=Extent(xmin=501290.0, ymin=9494090.0, xmax=529450.0, ymax=9542730.0),
tileLayout=TileLayout(layoutCols=11, layoutRows=19, tileCols=256, tileRows=256)))
The output is now the expected 32x32 pixels. When computing the layoutdefinition, there was already some code that aligns it to multiples of the resolution, so the final cropping also could happen on exact pixel boundaries.
TODO: add a test that computes backscatter at lower resolution
The logging also shows that backscatter was indeed computed at 20m now:
Resampling datacube not necessary, resolution already at 20
Constructing TiledRasterLayer from numpy rdd, with metadata
Metadata(Bounds(minKey=SpaceTimeKey(col=0, row=0, instant=datetime.datetime(2016, 6, 8, 0, 0)), maxKey=SpaceTimeKey(col=14, row=11, instant=datetime.datetime(2018, 1, 21, 0, 0))), float32ud0, nan, +proj=utm +zone=37 +south +datum=WGS84 +units=m +no_defs ,
Extent(xmin=141980.0, ymin=9934120.0, xmax=215480.0, ymax=9994980.0),
TileLayout(layoutCols=15, layoutRows=12, tileCols=256, tileRows=256),
LayoutDefinition(
extent=Extent(xmin=141960.0, ymin=9933560.0, xmax=218760.0, ymax=9995000.0),
tileLayout=TileLayout(layoutCols=15, layoutRows=12, tileCols=256, tileRows=256)))
I am running extractions of multiple patches of Sentinel1 data using sample_by_feature and a GeoParquet of samples of 640m x 640m for spatial filtering. By default, the output NetCDF file has a spatial resolution of 10m/px and a size of 64x64 pixels.
As I want to have my tiles as an output of 32x32 pixels for a spatial resolution of 20m/px, I use the process:
Python cube = cube.resample_spatial(resolution=20.0)
However, the resulting NetCDF have the dimensions of 33x33 pixels, with a row and a column of only NaN values over the whole time dimension for both bands
An example job on CDSE: j-24070359250d4b9692766f0d25103517