Open-EO / openeo-geopyspark-driver

OpenEO driver for GeoPySpark (Geotrellis)
Apache License 2.0

end-to-end check validation for Sentinel-2 collections #618

Closed by jdries 5 months ago

jdries commented 6 months ago

To test via editor.openeo.cloud, so that the aggregator is used:

soxofaan commented 6 months ago

Test with large and silly large bounding boxes => should give validation error

To make this more concrete:

import openeo
con = openeo.connect("openeo-dev.vito.be").authenticate_oidc()
pg = """
{
  "process_graph": {
    "loadco1": {
      "process_id": "load_collection",
      "arguments": {
        "id": "SENTINEL2_L2A",
        "spatial_extent": {"west": 6, "east": 17, "south": 43, "north": 55},
        "temporal_extent": ["2023-05-06", "2023-12-06"]
      }
    },
    "savere1": {
      "process_id": "save_result",
      "arguments": {"data": {"from_node": "loadco1"}, "format": "GTIFF"},
      "result": true
    }
  }
}
"""
print(con.validate_process_graph(pg))

This currently doesn't give validation issues, but something like ExtentTooLarge should be thrown.

I already added more logging to is_layer_too_large in d71f56a06c6564e3341ac40b921ab731fefa3bf9 to get more insight into what's happening.
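A minimal sketch of how the expectation above could be turned into a check. It assumes the validation call returns a list of error entries, each a dict with "code" and "message" keys; the two lists below are hand-written stand-ins, not real backend output, and error_codes is a hypothetical helper.

```python
def error_codes(validation_errors):
    """Collect the 'code' field of each validation error entry."""
    return [err.get("code") for err in validation_errors]


# Hand-written stand-ins for a validation response (assumed shape, not real output).
no_issues = []
too_large = [
    {"code": "ExtentTooLarge", "message": "Requested extent is too large to process."}
]

# Current behaviour described above: no issues reported for the huge extent.
reported_now = "ExtentTooLarge" in error_codes(no_issues)
# Desired behaviour: the oversized request is flagged.
reported_wanted = "ExtentTooLarge" in error_codes(too_large)
print(reported_now, reported_wanted)
```

With a live connection, the list returned by con.validate_process_graph(pg) would take the place of the stand-ins.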

soxofaan commented 6 months ago

The added logging uncovered:

is_layer_too_large estimated_pixels=0.0 threshold_pixels=100000000000 (bbox_width=896745.7522699834 bbox_height=1358920.5804835008 cell_width=10.0 cell_height=10.0 days=214 nr_bands=0)

So if the user does not provide bands, the validation logic assumes zero bands and therefore estimates zero pixels.

I already pushed a quick fix that assumes at least one band in f8e2dc7857dc6e33ab1e928366c2fb5e8a12cf47.
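The effect of nr_bands=0 can be reproduced with the logged values. This is only a sketch: it assumes the estimate is a plain product of pixels per date, days and bands, which matches the logged estimated_pixels=0.0 but is not copied from is_layer_too_large. The function name estimate_pixels is hypothetical.

```python
from datetime import date


def estimate_pixels(bbox_width, bbox_height, cell_width, cell_height, days, nr_bands):
    """Assumed estimate: pixels per date, times number of days, times number of bands."""
    pixels_per_date = (bbox_width / cell_width) * (bbox_height / cell_height)
    return pixels_per_date * days * nr_bands


THRESHOLD_PIXELS = 100_000_000_000  # threshold_pixels from the log line

# Values taken from the log line; days is derived from the temporal extent.
logged = dict(
    bbox_width=896745.7522699834,
    bbox_height=1358920.5804835008,
    cell_width=10.0,
    cell_height=10.0,
    days=(date(2023, 12, 6) - date(2023, 5, 6)).days,
)

nr_bands = 0  # what the validation saw when the user gave no bands
without_bands = estimate_pixels(**logged, nr_bands=nr_bands)
# Quick-fix idea: never count fewer than one band.
with_one_band = estimate_pixels(**logged, nr_bands=max(nr_bands, 1))

print(without_bands > THRESHOLD_PIXELS, with_one_band > THRESHOLD_PIXELS)
```

Under this assumption, a single band is already enough to push the request over the threshold, so the zero-band case was the only thing hiding the problem.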

jdries commented 6 months ago

Thanks for the hint. Adding bands gave me another weird validation error: can't compare datetime.datetime to datetime.date
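For reference, this error is what plain Python raises when a datetime is ordered against a date. The sketch below reproduces it and shows one possible way around it (promoting dates to midnight datetimes). as_datetime is a hypothetical helper, not the driver's code, and the example uses naive datetimes only.

```python
from datetime import date, datetime, time


def as_datetime(value):
    """Promote a plain date to a datetime at midnight; leave datetimes untouched."""
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(value, datetime):
        return value
    return datetime.combine(value, time.min)


start = datetime(2023, 8, 1)
end = date(2023, 8, 15)

try:
    start < end  # mixed ordering comparison is not supported
    mixed_comparison_error = None
except TypeError as exc:
    mixed_comparison_error = str(exc)

# After normalising both sides to datetime, the comparison works.
normalized_ok = as_datetime(start) < as_datetime(end)
print(mixed_comparison_error, normalized_ok)
```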

JeroenVerstraelen commented 6 months ago

Tested validation on editor.openeo.cloud with:

All graphs triggered the correct validation response.


soxofaan commented 6 months ago

Maybe I'm misunderstanding, but what has changed since my last quick fix (https://github.com/Open-EO/openeo-geopyspark-driver/commit/f8e2dc7857dc6e33ab1e928366c2fb5e8a12cf47)? I can't find any validation-related commits after that.

e.g. I still see this todo note on master: https://github.com/Open-EO/openeo-geopyspark-driver/blob/03ebc5954acf38fa15b66d8ef9e27c3b7f1ff3dc/openeogeotrellis/backend.py#L1548-L1549

So I don't think the "can't compare datetime.datetime to datetime.date" issue is resolved yet.

JeroenVerstraelen commented 5 months ago

Bands should now be fixed and tested.

Valid process graph:

{
  "process_graph": {
    "loadcollection1": {
      "process_id": "load_collection",
      "arguments": {
        "bands": [
          "B02"
        ],
        "id": "SENTINEL2_L2A",
        "properties": {
          "eo:cloud_cover": {
            "process_graph": {
              "lte1": {
                "process_id": "lte",
                "arguments": {
                  "x": {
                    "from_parameter": "value"
                  },
                  "y": 85
                },
                "result": true
              }
            }
          }
        },
        "spatial_extent": {
          "west": 15.611572,
          "east": 20.093994,
          "north": 60.042903999999986,
          "south": 55.453481000000004
        },
        "temporal_extent": [
          "2023-08-01T00:00:00Z",
          "2023-08-15T00:00:00Z"
        ]
      }
    },
    "saveresult1": {
      "process_id": "save_result",
      "arguments": {
        "data": {
          "from_node": "loadcollection1"
        },
        "format": "netcdf",
        "options": {
          "output_format": "netcdf",
          "format": "netcdf"
        }
      },
      "result": true
    }
  }
}

When no bands are provided, it correctly assumes all bands will be used: Requested extent for collection 'SENTINEL2_L2A' is too large to process. Estimated number of pixels: 5.26e+11, threshold: 1.00e+11.

{
  "process_graph": {
    "loadcollection1": {
      "process_id": "load_collection",
      "arguments": {
        "id": "SENTINEL2_L2A",
        "properties": {
          "eo:cloud_cover": {
            "process_graph": {
              "lte1": {
                "process_id": "lte",
                "arguments": {
                  "x": {
                    "from_parameter": "value"
                  },
                  "y": 85
                },
                "result": true
              }
            }
          }
        },
        "spatial_extent": {
          "west": 15.611572,
          "east": 20.093994,
          "north": 60.042903999999986,
          "south": 55.453481000000004
        },
        "temporal_extent": [
          "2023-08-01T00:00:00Z",
          "2023-08-15T00:00:00Z"
        ]
      }
    },
    "saveresult1": {
      "process_id": "save_result",
      "arguments": {
        "data": {
          "from_node": "loadcollection1"
        },
        "format": "netcdf",
        "options": {
          "output_format": "netcdf",
          "format": "netcdf"
        }
      },
      "result": true
    }
  }
}
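The "assume all bands when none are given" behaviour described above could look roughly like this. It is a sketch only: effective_band_count and too_large_message are hypothetical helpers, the band list is a placeholder rather than the real SENTINEL2_L2A metadata, and the pixel numbers are the ones quoted in the message above, not recomputed here.

```python
def effective_band_count(requested_bands, collection_bands):
    """Use the requested bands if any were given, otherwise all bands of the collection."""
    return len(requested_bands) if requested_bands else len(collection_bands)


def too_large_message(collection_id, estimated_pixels, threshold_pixels):
    """Format a message in the same style as the one quoted above."""
    return (
        f"Requested extent for collection {collection_id!r} is too large to process. "
        f"Estimated number of pixels: {estimated_pixels:.2e}, "
        f"threshold: {threshold_pixels:.2e}."
    )


# Placeholder band list for illustration, not the actual collection metadata.
all_bands = ["B02", "B03", "B04"]

bands_given = effective_band_count(["B02"], all_bands)
bands_missing = effective_band_count(None, all_bands)

message = too_large_message("SENTINEL2_L2A", 5.26e11, 1e11)
print(bands_given, bands_missing)
print(message)
```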