Open-EO / openeo-geopyspark-driver

OpenEO driver for GeoPySpark (Geotrellis)
Apache License 2.0
25 stars 4 forks source link

NaN-padding with AGERA5 collection #754

Open soxofaan opened 2 months ago

soxofaan commented 2 months ago

While playing with AGERA5 data for https://github.com/Open-EO/openeo-gfmap/issues/90 I noticed weird NaN padding in some cases

reproduction code:

connection = openeo.connect("openeo.vito.be").authenticate_oidc()

# Download data
for (x, dx, y, dy) in [
    (4.00, 0.2, 51.00, 0.2),
    (4.00, 0.3, 51.00, 0.3),
    (4.00, 0.4, 51.00, 0.4),
    (4.00, 0.5, 51.00, 0.5),
    (4.00, 0.6, 51.00, 0.6),
]:
    path = f"result-{x:.2f}_{x+dx:.2f}-{y:.2f}_{y+dy:.2f}.nc"
    if not path.exists():
        cube = connection.load_collection(
            "AGERA5",
            temporal_extent=["2023-09-10", "2023-09-12"],
            spatial_extent={"west": x, "south": y, "east": x + dx, "north": y + dy},
            bands=["temperature-mean"]
        )
        print("downloading", path)
        cube.download(path)

# Visualize
for path in sorted(result_dir.glob("*4.00*.nc")):
    # Extract bbox from path
    m = re.search(r"([\d.]+)_([\d.]+)-([\d.]+)_([\d.]+)\.nc", path.name)
    x1, x2, y1, y2 = m.groups()

    ds = xarray.load_dataset(path)
    da = ds.drop_vars(k for k, v in ds.data_vars.items() if not v.coords).to_array("bands")

    da = da.isel(bands=0, t=0)

    fig, ax = plt.subplots(figsize=(2, 2), dpi=60)
    da.plot.imshow(ax=ax, vmin=29700, vmax=30000, cmap="plasma")
    ax.set_title(f"w={x1} e={x2} s={y1} n={y2}")
    ax.set_xlim(3.9, 4.6)
    ax.set_ylim(50.8, 51.6)

result: image

Notice how there is some NaN-padding (yellow stripes) for some extents

soxofaan commented 2 months ago

(note that the NaN is actually value 65535 in the uint16 data)

jdries commented 2 weeks ago

The resolution of the data is 0.1 degree, so the returned number of pixel corresponds with the requested size. The issue occurs due to a misalignment of 0.5 pixel between requested bbox and native pixel grid.

The logging does show that the input extent is realigned, and that the global extent is also correct, but the datacube extent is wrong in the sense that it is not aligned with the layoutdefinition.

On priority: less critical because it's an edge case, but relevant to check where this invalid cube extent comes from.

Some part of the logging that tells the story of how this happens:

Realigned input extent {'west': 4.0, 'south': 51.0, 'east': 4.2, 'north': 51.2, 'crs': 'EPSG:4326'} into {'west': 3.9499999999999886, 'east': 4.25, 'south': 50.95, 'north': 51.250000000000014, 'crs': 'EPSG:4326'}
Creating layer for AGERA5 with load params {'temporal_extent': ('2023-09-10', '2023-09-12'), 'spatial_extent': {'west': 4.0, 'south': 51.0, 'east': 4.2, 'north': 51.2, 'crs': 'EPSG:4326'}, 'global_extent': {'west': 3.9499999999999886, 'east': 4.25, 'south': 50.95, 'north': 51.250000000000014, 'crs': 'EPSG:4326'}, 'bands': ['temperature-mean'], 'properties': {}, 'aggregate_spatial_geometries': None, 'sar_backscatter': None, 'process_types': set(), 'custom_mask': {}, 'data_mask': None, 'target_crs': None, 'target_resolution': None, 'resample_method': 'near', 'pixel_buffer': None}
Created cube for with metadata TileLayerMetadata(uint16ud65535,LayoutDefinition(Extent(3.9499999999999886, 25.650000000000013, 29.54999999999999, 51.250000000000014),CellSize(0.1,0.1),1x1 tiles,256x256 pixels),Extent(4.0, 51.0, 4.2, 51.2),EPSG:4326,KeyBounds(SpaceTimeKey(0,0,1694304000000),SpaceTimeKey(0,0,1694390400000))) and partitioner Some(SpacePartitioner(KeyBounds(SpaceTimeKey(0,0,1694304000000),SpaceTimeKey(0,0,1694390400000))))
save_result format NETCDF with bounds Extent(xmin=4.0, ymin=51.0, xmax=4.2, ymax=51.2) and options {'batch_mode': True, 'file_metadata': {'title': '', 'description': '', 'institution': 'openEO platform - Geotrellis backend: 0.38.5a1'}}