Open-EO / openeo-python-driver

Common parts of a Python driver implementation for OpenEO
Apache License 2.0

raster_to_vector + apply_dimension throws an error #303

Closed: JeroenVerstraelen closed this issue 3 months ago

JeroenVerstraelen commented 3 months ago
import openeo

connection = openeo.connect("openeo.vito.be").authenticate_oidc()
bbox = [5.0, 51.2, 5.1, 51.3]
temp_ext = ["2023-01-01", "2023-01-20"]

s2_bands = connection.load_collection(
    "SENTINEL2_L2A", spatial_extent=dict(zip(["west", "south", "east", "north"], bbox)), temporal_extent=temp_ext, bands=["SCL"]
)

scl_band = s2_bands.band("SCL")
s2_cloudmask = (scl_band == 1) * 1.0

s2_cloudmask_vector = s2_cloudmask.raster_to_vector()
# s2_cloudmask_vector.download("s2_cloudmask_vector_vito_prod.geojson", format="geojson")
apply_dim_result = s2_cloudmask_vector.apply_dimension(dimension="bands", process=lambda x: x+1)
apply_dim_result.download("apply_dim_result.geojson", format="geojson")

This throws the following error:

Authenticated using refresh token.
Preflight process graph validation raised: [ProcessParameterInvalid] The value passed for parameter 'data' in process 'apply_dimension' is invalid: Expected (<class 'openeo_driver.datacube.DriverDataCube'>, <class 'openeo_driver.datacube.DriverVectorCube'>) but got <class 'openeo_driver.save_result.AggregatePolygonResult'>.
Traceback (most recent call last):
  File "raster_to_vector_geojson_output.py", line 17, in <module>
    apply_dim_result.download("apply_dim_result.geojson", format="geojson")
  File "/home/jeroen/Projects/Utils/workspace_jeroen/venv/lib/python3.8/site-packages/openeo/rest/vectorcube.py", line 263, in download
    return self._connection.download(cube.flat_graph(), outputfile=outputfile, validate=validate)
  File "/home/jeroen/Projects/Utils/workspace_jeroen/venv/lib/python3.8/site-packages/openeo/rest/connection.py", line 1647, in download
    response = self.post(
  File "/home/jeroen/Projects/Utils/workspace_jeroen/venv/lib/python3.8/site-packages/openeo/rest/connection.py", line 249, in post
    return self.request("post", path=path, json=json, allow_redirects=False, **kwargs)
  File "/home/jeroen/Projects/Utils/workspace_jeroen/venv/lib/python3.8/site-packages/openeo/rest/connection.py", line 788, in request
    return _request()
  File "/home/jeroen/Projects/Utils/workspace_jeroen/venv/lib/python3.8/site-packages/openeo/rest/connection.py", line 781, in _request
    return super(Connection, self).request(
  File "/home/jeroen/Projects/Utils/workspace_jeroen/venv/lib/python3.8/site-packages/openeo/rest/connection.py", line 187, in request
    self._raise_api_error(resp)
  File "/home/jeroen/Projects/Utils/workspace_jeroen/venv/lib/python3.8/site-packages/openeo/rest/connection.py", line 207, in _raise_api_error
    raise OpenEoApiError(
openeo.rest.OpenEoApiError: [400] ProcessParameterInvalid: The value passed for parameter 'data' in process 'apply_dimension' is invalid: Expected (<class 'openeo_driver.datacube.DriverDataCube'>, <class 'openeo_driver.datacube.DriverVectorCube'>) but got <class 'openeo_driver.save_result.AggregatePolygonResult'>. (ref: r-24080198fe9645568269f06e7e82e3e7)

So raster_to_vector returns an AggregatePolygonResult, while apply_dimension expects a DriverDataCube or a DriverVectorCube. Once that is fixed, we still have to check whether apply_dimension actually works on vector cubes.
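The preflight error is produced by a parameter type check: the value passed for `data` is compared against a tuple of accepted classes, and anything else is rejected with the "Expected (...) but got ..." message. Below is a minimal sketch of that kind of check, assuming a plain `isinstance` test. The classes are hypothetical stand-ins named after the ones in the message (not the real openeo_driver classes), and `check_data_param` is a hypothetical helper.

```python
class DriverDataCube:
    """Hypothetical stand-in for the raster cube class."""


class DriverVectorCube:
    """Hypothetical stand-in for the vector cube class."""


class AggregatePolygonResult:
    """Hypothetical stand-in for a result type that is not a cube."""


class ProcessParameterInvalid(Exception):
    pass


def check_data_param(process_id, value, expected=(DriverDataCube, DriverVectorCube)):
    # Reject any value that is not an instance of one of the accepted cube classes.
    if not isinstance(value, expected):
        raise ProcessParameterInvalid(
            f"The value passed for parameter 'data' in process '{process_id}' is invalid: "
            f"Expected {expected} but got {type(value)}."
        )
    return value


# A vector cube passes the check; the polygon aggregation result does not.
check_data_param("apply_dimension", DriverVectorCube())
try:
    check_data_param("apply_dimension", AggregatePolygonResult())
except ProcessParameterInvalid as exc:
    print(exc)
```

This is why the fix has to happen on the producing side: as long as raster_to_vector hands back a non-cube result object, no downstream process that validates `data` this way will accept it.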

jdries commented 3 months ago

Note that the error is raised in the dry-run code path. I'll commit a fix for that so we can get to the real issue.

jdries commented 3 months ago

The issue changed after switching to a vector cube: the vector cube metadata should not have x and y dimensions, but a 'geometry' dimension instead:

File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1716, in apply_process
    return process_function(args=ProcessArgs(args, process_id=process_id), env=env)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 773, in apply_dimension
    dimension = args.get_required(
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/processes.py", line 315, in get_required
    self._check_value(name=name, value=value, expected_type=expected_type, validator=validator)
  File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/processes.py", line 342, in _check_value
    raise ProcessParameterInvalidException(
openeo_driver.errors.ProcessParameterInvalidException: The value passed for parameter 'dimension' in process 'apply_dimension' is invalid: Must be one of ['x', 'y', 't'] but got 'geometry'.
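The allowed values in this error come from the cube's metadata: metadata describing a raster cube exposes x, y and t, so 'geometry' is rejected. A minimal sketch of that behaviour, assuming the metadata is reduced to a list of dimension names; `CubeMetadata` and `validate_dimension` are hypothetical names used only to illustrate why the vector cube needs its own metadata with a 'geometry' dimension.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CubeMetadata:
    # Hypothetical metadata: only the ordered dimension names of the cube.
    dimension_names: List[str] = field(default_factory=list)


def validate_dimension(metadata: CubeMetadata, dimension: str) -> str:
    # The accepted values are taken from the metadata, not from a fixed list,
    # so the check is only as correct as the metadata attached to the cube.
    allowed = metadata.dimension_names
    if dimension not in allowed:
        raise ValueError(f"Must be one of {allowed} but got {dimension!r}.")
    return dimension


# Raster-style metadata wrongly attached to a vector cube: 'geometry' is rejected.
raster_like = CubeMetadata(["x", "y", "t"])
# Vector cube metadata with 'geometry' instead of x and y (other dimensions omitted).
vector_like = CubeMetadata(["geometry"])
```

With `raster_like`, `validate_dimension(raster_like, "geometry")` raises the same message as in the traceback; with `vector_like` the call succeeds.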

jdries commented 3 months ago

After basic adjustments to the types and metadata, the request now runs without error. A unit test should be added to the geopyspark driver to ensure that this keeps working.

import textwrap

import openeo

connection = openeo.connect("openeo-staging.dataspace.copernicus.eu").authenticate_oidc()
bbox = [5.0, 51.2, 5.1, 51.3]
temp_ext = ["2023-01-01", "2023-01-20"]

s2_bands = connection.load_collection(
    "SENTINEL2_L2A", spatial_extent=dict(zip(["west", "south", "east", "north"], bbox)), temporal_extent=temp_ext,
    bands=["SCL"]
)

scl_band = s2_bands.band("SCL")
s2_cloudmask = (scl_band == 1) * 1.0

s2_cloudmask_vector = s2_cloudmask.raster_to_vector()

# UDF that buffers every polygon in the vector cube
udf = textwrap.dedent(
    """
    from openeo.udf import UdfData, FeatureCollection
    def process_vector_cube(udf_data: UdfData) -> UdfData:
        [feature_collection] = udf_data.get_feature_collection_list()
        gdf = feature_collection.data
        gdf["geometry"] = gdf["geometry"].buffer(distance=1, resolution=2)
        udf_data.set_feature_collection_list([
            FeatureCollection(id="_", data=gdf),
        ])
        # return the updated UdfData, matching the declared return type
        return udf_data
    """
)
udf_callback = openeo.UDF(code=udf, runtime="Python")
# apply the UDF along the 'geometry' dimension of the vector cube
apply_dim_result = s2_cloudmask_vector.apply_dimension(dimension="geometry", process=udf_callback)
apply_dim_result.download("apply_dim_result.geojson", format="geojson")

EmileSonneveld commented 3 months ago

Added to the integrationtests. It runs fine as a synchronous job, but as a batch job it fails with the following error:

  File "/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py", line 380, in run_job
    json.dump(metadata, f)
  File "/usr/lib64/python3.8/json/__init__.py", line 179, in dump
    for chunk in iterable:
  File "/usr/lib64/python3.8/json/encoder.py", line 431, in _iterencode
    yield from _iterencode_dict(o, _current_indent_level)
  File "/usr/lib64/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib64/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib64/python3.8/json/encoder.py", line 405, in _iterencode_dict
    yield from chunks
  File "/usr/lib64/python3.8/json/encoder.py", line 438, in _iterencode
    o = _default(o)
  File "/usr/lib64/python3.8/json/encoder.py", line 179, in default
    raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type PosixPath is not JSON serializable

Running s2_cloudmask_vector on its own as a batch job is already enough to trigger the error:

job = s2_cloudmask_vector.create_job()
job.start_and_wait()
job.get_results().download_files()
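The batch failure happens when the job metadata is written with `json.dump`: somewhere in the nested metadata dict there is a `pathlib.Path` value, and the standard JSON encoder only handles str, int, float, bool, None, list and dict. One common way to handle this is a `default=` hook that turns paths into strings, sketched below. The thread does not say whether the actual fix took this route or converted the path earlier; `json_default` is a hypothetical helper and the metadata dict is made up for illustration.

```python
import json
from pathlib import Path, PurePath
from typing import Any


def json_default(obj: Any) -> Any:
    # json calls this only for objects it cannot serialize natively.
    if isinstance(obj, PurePath):
        return str(obj)
    raise TypeError(f"Object of type {obj.__class__.__name__} is not JSON serializable")


# Hypothetical metadata with a Path nested a few dict levels deep.
metadata = {"assets": {"out": {"href": Path("/tmp/out/result.geojson")}}}

# json.dumps(metadata) alone raises TypeError; with the hook the path becomes a string.
text = json.dumps(metadata, default=json_default)
```

The same `default=` argument is accepted by `json.dump(metadata, f, default=json_default)`. Raising `TypeError` for other types keeps the encoder's normal error for anything that is not a path.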

jdries commented 3 months ago

After the last fix, it now also works in batch mode on staging.