Open-EO / openeo-geopyspark-driver

OpenEO driver for GeoPySpark (Geotrellis)
Apache License 2.0

catch multipolygon errors #785

Closed HansVRP closed 2 weeks ago

HansVRP commented 1 month ago

Currently, when a multipolygon job is run and one or more polygons have an invalid geometry or no data available, the entire job fails.

It would be better to catch these errors and ensure the user still gets the output for the valid polygons.

In addition, the user should get an indication of which polygons succeeded and which did not.
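A minimal sketch of the requested behaviour, assuming shapely geometries; `partition_geometries` is a hypothetical helper name, not actual driver code:

```python
from shapely.geometry import Polygon
from shapely.validation import explain_validity

def partition_geometries(geoms):
    """Split geometries into valid ones and per-index failure reasons,
    so one bad polygon does not fail the whole batch (hypothetical helper)."""
    valid, errors = [], {}
    for i, geom in enumerate(geoms):
        if geom.is_empty:
            errors[i] = "empty geometry"
        elif not geom.is_valid:
            errors[i] = explain_validity(geom)
        else:
            valid.append((i, geom))
    return valid, errors

square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])  # valid
bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])  # self-intersecting
valid, errors = partition_geometries([square, bowtie])
```

The valid polygons can then be processed as usual, while `errors` is reported back to the user per index.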

JeroenVerstraelen commented 1 month ago

This issue is related to Cropsar. Specifically when there are multiple load_collections in one process graph.

HansVRP commented 1 month ago

j-24051737210441be92441beedd8ab5ce (60 polygons) j-240517e1cd064f9f88642efee3e8343a (6 polygons)

soxofaan commented 3 weeks ago

when a multipolygon job is run, and one or more polygons have an invalid geometry, or no data available, the entire job fails

as discussed, there are actually two independent issues here: invalid input geometries on the one hand, and the "no data available" (empty data cube) failure on the other.

HansVRP commented 3 weeks ago

In terms of the feature collection, I am not sure if it is indeed the best default behavior.

If you have a feature collection consisting of 9 valid polygons and 1 invalid one, will it start processing the valid ones until it hits the invalid one and then fail the entire job?

soxofaan commented 3 weeks ago

With a feature collection, the processing of the geometries typically happens in parallel, and the failure will/should happen even before the actual processing is done, so it's not that you lose results that have already been calculated.

jdries commented 3 weeks ago

Note that for the 'empty datacube' issue, the error is thrown by the sentinelhub data loader. This was also an issue for CropSAR 2, until someone realized that it did not occur on CDSE, and that everything was actually running faster there anyway. So maybe the question is also whether we can run CropSAR 1 on CDSE as well?

jdries commented 3 weeks ago

@HansVRP do we have a job id or stack trace for the case where there's actually an invalid polygon? (The two mentioned above seem to be about sentinelhub empty datacube issue.)

HansVRP commented 3 weeks ago

It does not seem to be good practice to patch over the sentinelhub error. As for where to run it, we can discuss it with Dennis and Jurgen. Maybe only the credits will be an issue. Could we point them towards the aggregator, so that they can use the Terrascope credits for CDSE?

@Pratichhya probably has some useful job-ids

Pratichhya commented 3 weeks ago

This was the one that failed: j-240511b94de2467eb8c0555a17cf49ea

jdries commented 3 weeks ago

Stack trace indicates that it's also coming from the sentinelhub specific part, so also for multipolygon error, testing on cdse can be relevant:

```
py4j.protocol.Py4JJavaError: An error occurred while calling o1681.datacube_seq.
: org.locationtech.jts.geom.TopologyException: found non-noded intersection between LINESTRING ( 658020.0000000001 5299279.999999999, 658050.0 5299279.999999999 ) and LINESTRING ( 658040.0000000001 5299270.0, 658040.0000000002 5299280.0 ) [ (658040.0000000002, 5299279.999999999, NaN) ]
    at org.locationtech.jts.noding.FastNodingValidator.checkValid(FastNodingValidator.java:140)
    at org.locationtech.jts.geomgraph.EdgeNodingValidator.checkValid(EdgeNodingValidator.java:81)
    at org.locationtech.jts.geomgraph.EdgeNodingValidator.checkValid(EdgeNodingValidator.java:46)
    at org.locationtech.jts.operation.overlay.OverlayOp.computeOverlay(OverlayOp.java:231)
    at org.locationtech.jts.operation.overlay.OverlayOp.getResultGeometry(OverlayOp.java:183)
    at org.locationtech.jts.operation.overlay.OverlayOp.overlayOp(OverlayOp.java:86)
    at org.locationtech.jts.operation.overlay.snap.SnapIfNeededOverlayOp.getResultGeometry(SnapIfNeededOverlayOp.java:75)
    at org.locationtech.jts.operation.overlay.snap.SnapIfNeededOverlayOp.overlayOp(SnapIfNeededOverlayOp.java:37)
    at org.locationtech.jts.geom.Geometry.union(Geometry.java:1388)
    at org.locationtech.jts.operation.union.CascadedPolygonUnion.unionActual(CascadedPolygonUnion.java:386)
    at org.locationtech.jts.operation.union.CascadedPolygonUnion.unionOptimized(CascadedPolygonUnion.java:309)
    at org.locationtech.jts.operation.union.CascadedPolygonUnion.unionSafe(CascadedPolygonUnion.java:290)
    at org.locationtech.jts.operation.union.CascadedPolygonUnion.binaryUnion(CascadedPolygonUnion.java:221)
    at org.locationtech.jts.operation.union.CascadedPolygonUnion.binaryUnion(CascadedPolygonUnion.java:226)
    at org.locationtech.jts.operation.union.CascadedPolygonUnion.binaryUnion(CascadedPolygonUnion.java:202)
    at org.locationtech.jts.operation.union.CascadedPolygonUnion.unionTree(CascadedPolygonUnion.java:148)
    at org.locationtech.jts.operation.union.CascadedPolygonUnion.reduceToGeometries(CascadedPolygonUnion.java:261)
    at org.locationtech.jts.operation.union.CascadedPolygonUnion.unionTree(CascadedPolygonUnion.java:146)
    at org.locationtech.jts.operation.union.CascadedPolygonUnion.reduceToGeometries(CascadedPolygonUnion.java:261)
    at org.locationtech.jts.operation.union.CascadedPolygonUnion.unionTree(CascadedPolygonUnion.java:146)
    at org.locationtech.jts.operation.union.CascadedPolygonUnion.reduceToGeometries(CascadedPolygonUnion.java:261)
    at org.locationtech.jts.operation.union.CascadedPolygonUnion.unionTree(CascadedPolygonUnion.java:146)
    at org.locationtech.jts.operation.union.CascadedPolygonUnion.union(CascadedPolygonUnion.java:136)
    at org.locationtech.jts.operation.union.CascadedPolygonUnion.union(CascadedPolygonUnion.java:68)
    at org.locationtech.jts.operation.union.UnaryUnionOp.union(UnaryUnionOp.java:216)
    at org.locationtech.jts.operation.union.UnaryUnionOp.union(UnaryUnionOp.java:113)
    at org.locationtech.jts.geom.Geometry.union(Geometry.java:1477)
    at org.openeo.geotrellissentinelhub.package$.dissolve(package.scala:147)
    at org.openeo.geotrellissentinelhub.package$.simplify(package.scala:150)
    at org.openeo.geotrellissentinelhub.PyramidFactory.datacube_seq(PyramidFactory.scala:336)
```
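
The TopologyException is raised by JTS while unioning the geometries. On the Python side, the same kind of invalid input can be detected and repaired with shapely; a sketch with an illustrative bowtie polygon, not the actual job data:

```python
from shapely.geometry import Polygon
from shapely.validation import explain_validity, make_valid

# Self-intersecting "bowtie", the classic shape behind such topology errors.
bowtie = Polygon([(0, 0), (10, 10), (10, 0), (0, 10)])
print(bowtie.is_valid)           # False
print(explain_validity(bowtie))  # e.g. "Self-intersection[5 5]"

fixed = make_valid(bowtie)       # splits the bowtie into valid parts
print(fixed.is_valid)            # True
```

Running such a repair (or at least the validity check) before handing geometries to the JVM side would turn the opaque Py4J error into something actionable.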
EmileSonneveld commented 3 weeks ago

[image omitted] The UDF inside CropSAR applies a negative buffer (-10) on the polygons, which produces polygons that cause issues.

To avoid this, it would be nice to apply the negative buffering on the original raster using a convolution; negative buffering on polygons can be tricky.

Another way would be to apply negative buffering by converting the polygon to a signed distance field first and then taking an isoline at threshold -10.

In the meantime, I'll see if it is possible to clean the polygons, or give a clearer error.
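
The signed-distance idea can be sketched on a raster mask with scipy; the mask shape and the assumption of 1 m pixels with a 10 m buffer distance are purely illustrative:

```python
import numpy as np
from scipy import ndimage

# Boolean field mask, assuming 1 m pixels for illustration.
mask = np.zeros((40, 40), dtype=bool)
mask[5:35, 5:35] = True  # a 30x30 "field"

# Signed distance field: positive inside the mask, negative outside.
sdf = ndimage.distance_transform_edt(mask) - ndimage.distance_transform_edt(~mask)

# Negative buffer of 10 m = keep only pixels more than 10 px inside the edge,
# i.e. the isoline at threshold -10 applied on the raster instead of the polygon.
eroded = sdf > 10
```

This sidesteps polygon topology entirely: the result is always a well-defined raster mask, which can be vectorized afterwards if polygons are needed.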

soxofaan commented 3 weeks ago

@EmileSonneveld concerning negative buffering: I also remember discussions about this under internal ticket MKTP-256; you might want to check that one too for more background/context.

EmileSonneveld commented 2 weeks ago

@HansVRP Is it possible to change the negative buffering in the UDF to 10.01 or 9.99? A negative buffer of 10 (and 20) causes invalid polygons, as the input mesh consists of squares that are exactly 10 wide.
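
A sketch of the guard one could add after buffering (GEOS's own buffer is usually robust, so this mainly illustrates checking the result when the distance lands exactly on the 10 m pixel grid; the L-shaped test field is an assumption for illustration):

```python
from shapely.geometry import box
from shapely.ops import unary_union

# An L-shaped field built from 10 m pixel squares, like the mesh described above.
pixels = [box(x, y, x + 10, y + 10)
          for x in range(0, 50, 10) for y in range(0, 50, 10)
          if not (x >= 30 and y >= 30)]
field = unary_union(pixels)

for dist in (-10, -10.01, -9.99):
    shrunk = field.buffer(dist)
    # Guard against degenerate results before passing geometries on.
    print(dist, shrunk.is_valid, shrunk.is_empty)
```

Nudging the distance off the exact pixel width (10.01 or 9.99) avoids vertices coinciding exactly, which is what tends to trip up downstream union/overlay operations.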

EmileSonneveld commented 2 weeks ago

[screenshot omitted]

EmileSonneveld commented 2 weeks ago

Minimal example that triggers the same error:

process graph:

```json
{
  "process_graph": {
    "loadcollection2": {
      "process_id": "load_collection",
      "arguments": {
        "bands": ["SCL"],
        "id": "SENTINEL2_L2A_SENTINELHUB",
        "spatial_extent": null,
        "temporal_extent": null
      }
    },
    "runudf2": {
      "process_id": "run_udf",
      "arguments": {
        "context": {"distance": 10},
        "data": {"from_node": "read1"},
        "runtime": "Python",
        "udf": "from cropsar import prepare_s2_data_geometry\nfrom openeo.udf import UdfData, FeatureCollection\n \ndef fct_buffer(udf_data: UdfData):\n \n gdf = udf_data.feature_collection_list[0].data\n \n crs = 'epsg:4326'\n distance = udf_data.user_context.get('distance', 0)\n params = {'s2_clean':{'inarrowfieldbordersinmeter':distance}}\n for i,geom in gdf.iterrows():\n buffer = prepare_s2_data_geometry(geom['geometry'],params,crs)\n \n gdf.at[i,'geometry'] = buffer\n \n udf_data.feature_collection_list = [FeatureCollection(id=\"_\", data=gdf)]\n \n return udf_data"
      }
    },
    "aggregatespatial1": {
      "process_id": "aggregate_spatial",
      "arguments": {
        "data": {"from_node": "loadcollection2"},
        "geometries": {"from_node": "runudf2"},
        "reducer": {
          "process_graph": {
            "mean1": {
              "process_id": "mean",
              "arguments": {"data": {"from_parameter": "data"}},
              "result": true
            }
          }
        }
      },
      "result": true
    },
    "read1": {
      "process_id": "read_vector",
      "arguments": {
        "filename": "https://emilesonneveld.be/dropbox_proxy/work/VITO/VITO2024/invalid_multipolygon/extra_onlyDangerous.geojson"
      }
    }
  },
  "parameters": []
}
```
EmileSonneveld commented 6 days ago

https://openeo-dev.vito.be/ now shows a clearer error. Soon it will be on openeo.vito.be / openeo.cloud too: openeo.rest.OpenEoApiError: [400] InvalidGeometry: UDF returned invalid polygons. This could be due to the input or the code. Invalid index(es): [0] (ref: r-240624f31f1b48b0b08d8c259a9cb8dd)
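
The shape of that error reporting can be sketched as follows; `check_udf_polygons` and the `InvalidGeometry` class here are hypothetical stand-ins, not the actual driver code:

```python
from shapely.geometry import Polygon

class InvalidGeometry(ValueError):
    """Mirrors the InvalidGeometry API error shown above (hypothetical class)."""

def check_udf_polygons(geometries):
    """Validate UDF output and report all offending indices in one error."""
    bad = [i for i, g in enumerate(geometries) if not g.is_valid]
    if bad:
        raise InvalidGeometry(
            "UDF returned invalid polygons. This could be due to the input "
            f"or the code. Invalid index(es): {bad}")
    return geometries

bowtie = Polygon([(0, 0), (2, 2), (2, 0), (0, 2)])  # self-intersecting
square = Polygon([(0, 0), (1, 0), (1, 1), (0, 1)])  # valid
```

Collecting all bad indices before raising lets the user fix every offending polygon in one round trip instead of one at a time.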