Note that the error is in the dry-run code path; I'll commit a fix for that so we can get to the real issue.
The issue evolved after transitioning to a vector cube: vector cube metadata should not have x and y dimensions, but rather a 'geometry' dimension:
File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 1716, in apply_process
return process_function(args=ProcessArgs(args, process_id=process_id), env=env)
File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/ProcessGraphDeserializer.py", line 773, in apply_dimension
dimension = args.get_required(
File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/processes.py", line 315, in get_required
self._check_value(name=name, value=value, expected_type=expected_type, validator=validator)
File "/opt/openeo/lib/python3.8/site-packages/openeo_driver/processes.py", line 342, in _check_value
raise ProcessParameterInvalidException(
openeo_driver.errors.ProcessParameterInvalidException: The value passed for parameter 'dimension' in process 'apply_dimension' is invalid: Must be one of ['x', 'y', 't'] but got 'geometry'.
After basic adjustments to types and metadata, the request now runs without error. A unit test should be added in the geopyspark driver to ensure that this keeps working.
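A minimal pytest sketch of such a regression test (the `vector_cube` fixture and the `dimension_names()` accessor are assumptions here, not necessarily the driver's actual API):

```python
def test_vector_cube_has_geometry_dimension(vector_cube):
    # After the metadata fix, a vector cube should advertise a 'geometry'
    # dimension instead of the raster 'x'/'y' dimensions, so that
    # apply_dimension(dimension="geometry", ...) passes validation.
    dims = vector_cube.metadata.dimension_names()
    assert "geometry" in dims
    assert "x" not in dims and "y" not in dims
```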
```python
import textwrap

import openeo

connection = openeo.connect("openeo-staging.dataspace.copernicus.eu").authenticate_oidc()

bbox = [5.0, 51.2, 5.1, 51.3]
temp_ext = ["2023-01-01", "2023-01-20"]
s2_bands = connection.load_collection(
    "SENTINEL2_L2A",
    spatial_extent=dict(zip(["west", "south", "east", "north"], bbox)),
    temporal_extent=temp_ext,
    bands=["SCL"],
)
scl_band = s2_bands.band("SCL")
s2_cloudmask = (scl_band == 1) * 1.0
s2_cloudmask_vector = s2_cloudmask.raster_to_vector()

udf = textwrap.dedent(
    """
    from openeo.udf import UdfData, FeatureCollection

    def process_vector_cube(udf_data: UdfData) -> UdfData:
        [feature_collection] = udf_data.get_feature_collection_list()
        gdf = feature_collection.data
        # Buffer each geometry by 1 unit (resolution=2 keeps the buffer coarse).
        gdf["geometry"] = gdf["geometry"].buffer(distance=1, resolution=2)
        udf_data.set_feature_collection_list([
            FeatureCollection(id="_", data=gdf),
        ])
    """
)
udf_callback = openeo.UDF(code=udf, runtime="Python")
apply_dim_result = s2_cloudmask_vector.apply_dimension(dimension="geometry", process=udf_callback)
apply_dim_result.download("apply_dim_result.geojson", format="geojson")
```
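A quick local sanity check of the downloaded result (assuming geopandas is installed; this is not part of the original snippet):

```python
import geopandas as gpd

# The buffered geometries produced by the UDF should show up in the output.
gdf = gpd.read_file("apply_dim_result.geojson")
print(gdf.shape)
print(gdf.geometry.head())
```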
Added to the integration tests. It runs fine as a sync job, but as a batch job it gives the following error:
File "/opt/openeo/lib64/python3.8/site-packages/openeogeotrellis/deploy/batch_job.py", line 380, in run_job
json.dump(metadata, f)
File "/usr/lib64/python3.8/json/__init__.py", line 179, in dump
for chunk in iterable:
File "/usr/lib64/python3.8/json/encoder.py", line 431, in _iterencode
yield from _iterencode_dict(o, _current_indent_level)
File "/usr/lib64/python3.8/json/encoder.py", line 405, in _iterencode_dict
yield from chunks
File "/usr/lib64/python3.8/json/encoder.py", line 405, in _iterencode_dict
yield from chunks
File "/usr/lib64/python3.8/json/encoder.py", line 405, in _iterencode_dict
yield from chunks
File "/usr/lib64/python3.8/json/encoder.py", line 438, in _iterencode
o = _default(o)
File "/usr/lib64/python3.8/json/encoder.py", line 179, in default
raise TypeError(f'Object of type {o.__class__.__name__} '
TypeError: Object of type PosixPath is not JSON serializable
Fetching `s2_cloudmask_vector` is enough to trigger the error:
```python
job = s2_cloudmask_vector.create_job()
job.start_and_wait()
job.get_results().download_files()
```
After the last fix, it now also works in batch mode on staging.
This throws the following error:
So `raster_to_vector` returns an `AggregatePolygonResult`, but `apply_dimension` expects a `DriverVectorCube`. Once that is fixed, we will have to double-check whether `apply_dimension` actually works on vector cubes.
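A sketch of the shape of that fix (only `AggregatePolygonResult` and `DriverVectorCube` come from the comment above; the conversion helper named here is hypothetical):

```python
from openeo_driver.datacube import DriverVectorCube
from openeo_driver.save_result import AggregatePolygonResult


def ensure_vector_cube(result):
    """Sketch: coerce the output of raster_to_vector into a DriverVectorCube
    before it reaches apply_dimension. `to_driver_vector_cube()` is an
    assumed conversion helper, not a confirmed API."""
    if isinstance(result, AggregatePolygonResult):
        return result.to_driver_vector_cube()
    assert isinstance(result, DriverVectorCube)
    return result
```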