Open-EO / openeo-geopyspark-driver

OpenEO driver for GeoPySpark (Geotrellis)
Apache License 2.0

aggregate_spatial with points: falling off the edge #935

Open jdries opened 2 weeks ago

jdries commented 2 weeks ago

aggregate_spatial with points seems to allow points at the border to fall off. Job example: j-241108dc1e1848248f81c211bdb731e0

The process graph without aggregate_spatial illustrates the problem nicely: the DEM is loaded in EPSG:4326, then resampled to UTM, and appears to be clipped somewhere along the way.

Not sure yet whether the solution is simply to use larger bounding boxes, or whether something actually goes wrong when we merge the cubes...

{
  "process_graph": {
    "loadcollection1": {
      "arguments": {
        "bands": [
          "DEM"
        ],
        "id": "COPERNICUS_30",
        "spatial_extent": {
          "crs": "EPSG:4326",
          "east": 22.09127774348824,
          "north": 49.10838474348824,
          "south": 48.93044525651176,
          "west": 21.82136325651176
        },
        "temporal_extent": null
      },
      "process_id": "load_collection"
    },
    "loadstac1": {
      "arguments": {
        "spatial_extent": {
          "crs": "EPSG:4326",
          "east": 22.09127774348824,
          "north": 49.10838474348824,
          "south": 48.93044525651176,
          "west": 21.82136325651176
        },
        "url": "https://stac.openeo.vito.be/collections/wenr_features"
      },
      "process_id": "load_stac"
    },
    "mergecubes1": {
      "arguments": {
        "cube1": {
          "from_node": "resamplespatial1"
        },
        "cube2": {
          "from_node": "loadstac1"
        }
      },
      "process_id": "merge_cubes"
    },
    "reducedimension1": {
      "arguments": {
        "data": {
          "from_node": "loadcollection1"
        },
        "dimension": "t",
        "reducer": {
          "process_graph": {
            "last1": {
              "arguments": {
                "data": {
                  "from_parameter": "data"
                },
                "ignore_nodata": true
              },
              "process_id": "last",
              "result": true
            }
          }
        }
      },
      "process_id": "reduce_dimension"
    },
    "resamplespatial1": {
      "arguments": {
        "align": "upper-left",
        "data": {
          "from_node": "reducedimension1"
        },
        "method": "bilinear",
        "projection": 32634,
        "resolution": 10
      },
      "process_id": "resample_spatial"
    },
    "saveresult1": {
      "arguments": {
        "data": {
          "from_node": "mergecubes1"
        },
        "format": "GTIFF"
      },
      "process_id": "save_result",
      "result": true
    }
  }
}
jdries commented 1 week ago

A buffer of 10 m is applied to the geometry input to ensure the datacube is large enough: https://github.com/Open-EO/openeo-geopyspark-driver/blob/35b4db997fe4aa4fec49928dde0642f285d12ead/openeogeotrellis/utils.py#L231

This also seems to be confirmed by the extents in the logs: points that end up as 'null' are inside the loaded extent.
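The log check above can be reproduced locally with a quick containment test. A minimal sketch, using the requested extent from the process graph; the point coordinates are hypothetical stand-ins for a border point that aggregates to 'null':

```python
from shapely.geometry import Point, box

# Requested spatial extent from the job's process graph (EPSG:4326).
extent = box(21.82136325651176, 48.93044525651176,
             22.09127774348824, 49.10838474348824)

# Hypothetical point close to the eastern border of the extent.
p = Point(22.0912, 49.0)

# The point lies inside the requested extent, so the loaded data
# should cover it, yet aggregate_spatial returns 'null' for it.
print(extent.contains(p))
```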

Also for global_extent, a 10 m buffer is used: https://github.com/Open-EO/openeo-python-driver/blob/22f89cfaa30306115f5479bdfc8b9ea8ebd0f04e/openeo_driver/dry_run.py#L587
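The idea behind the 10 m buffer can be sketched with shapely; this is an illustration with made-up coordinates in a projected CRS, not the driver's actual code in utils.py or dry_run.py:

```python
from shapely.geometry import MultiPoint

# Hypothetical point geometries in a projected CRS (units = metres).
points = MultiPoint([(500000, 5420000), (500100, 5420050)])

# Expand the combined extent by 10 units on every side, mirroring the
# intent of the driver's 10 m buffer: make the loaded datacube slightly
# larger than the geometries so border points don't fall off.
buffered = points.envelope.buffer(10, join_style=2)  # 2 = mitre join

print(buffered.bounds)
```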

Important: the issue only occurs when merge_cubes is present! The normal aggregate_spatial with points on the DEM plus resampling works just fine.

jdries commented 1 week ago

The wenr features are not resampled at load time, because merge_cubes is not recognized as a resampling operation. Supporting that would certainly simplify matters in this case.

Confirmed this by adding an explicit resample_spatial for the wenr features: that also solves the DEM issue.
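The workaround can be sketched as an extra node in the process graph above. The node id `resamplespatial2` is hypothetical and `near` is a placeholder resampling method; the point is only that the wenr cube gets an explicit resample to the same UTM grid before the merge:

```json
"resamplespatial2": {
  "arguments": {
    "data": {"from_node": "loadstac1"},
    "method": "near",
    "projection": 32634,
    "resolution": 10
  },
  "process_id": "resample_spatial"
},
"mergecubes1": {
  "arguments": {
    "cube1": {"from_node": "resamplespatial1"},
    "cube2": {"from_node": "resamplespatial2"}
  },
  "process_id": "merge_cubes"
}
```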