Closed by soxofaan 2 weeks ago
Did some initial digging and I think this is what's going on: there is a UDF (https://github.com/Open-EO/openeo-gfmap/blob/main/src/openeo_gfmap/preprocessing/udf_rank.py) used in apply_neighborhood that takes a t-bands-y-x array and returns a t-y-x array (eliminating the band dimension). Our code seems to expect an array with a bands dimension.
@soxofaan @jdries This seems like an openEO issue: if I add the band dimension manually to the DataCube before running the UDF, I get this error:
openeo.metadata.DimensionAlreadyExistsException: Dimension with name 'bands' already exists
Also, the following line runs before the UDF, and the job doesn't complain about it before entering the UDF:
score = score.rename_labels("bands", [BAPSCORE_HARMONIZED_NAME])
@GriffinBabe to provide some more context on the issue (e.g. full exception)
@soxofaan An example can be found on CDSE with job-id: j-240412ff075d4b6880c6f9945298349c
The full error message is:
OpenEO batch job failed: Exception during Spark execution: org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/worker.py", line 830, in main
    process()
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/worker.py", line 822, in process
    serializer.dump_stream(out_iter, outfile)
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 146, in dump_stream
    for obj in iterator:
  File "/usr/local/spark/python/lib/pyspark.zip/pyspark/util.py", line 81, in wrapper
    return f(*args, **kwargs)
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/utils.py", line 56, in memory_logging_wrapper
    return function(*args, **kwargs)
  File "/opt/openeo/lib/python3.8/site-packages/epsel.py", line 44, in wrapper
    return _FUNCTION_POINTERS[key](*args, **kwargs)
  File "/opt/openeo/lib/python3.8/site-packages/epsel.py", line 37, in first_time
    return f(*args, **kwargs)
  File "/opt/openeo/lib/python3.8/site-packages/openeogeotrellis/geopysparkdatacube.py", line 524, in tile_function
    result_array = result_array.transpose(*('t' ,'bands','y', 'x'))
  File "/opt/openeo/lib/python3.8/site-packages/xarray/core/dataarray.py", line 2154, in transpose
    dims = tuple(utils.infix_dims(dims, self.dims))
  File "/opt/openeo/lib/python3.8/site-packages/xarray/core/utils.py", line 726, in infix_dims
    raise ValueError(
ValueError: ('t', 'bands', 'y', 'x') must be a permuted list of ('t', 'y', 'x'), unless '...' is included
Basically, my datacube is expected to have the "bands" dimension when entering the UDF, but it doesn't. It's a weird result: as a first attempt to fix the issue I even tried loading more than one band (SCL and B04), but that didn't work.
I then tried to add a dimension: https://github.com/Open-EO/openeo-gfmap/blob/main/src/openeo_gfmap/preprocessing/cloudmasking.py#L186
But I get the following error from the Python client:
openeo.metadata.DimensionAlreadyExistsException: Dimension with name 'bands' already exists
It's already weird to have no band dimension on entering the UDF, but on top of that there seems to be a mismatch between the backend and client representations of the datacube.
@GriffinBabe The error that you show is on this line in the openEO backend code:
result_array = result_array.transpose(*('t' ,'bands','y', 'x'))
where result_array refers to the output of your udf_rank.py UDF.
apply_neighborhood expects your output to have the dimensions ('t', 'bands', 'y', 'x'), but this line in the UDF creates a DataArray with ('t', 'y', 'x') dimensions:
bap_score = array.sel(bands="S2-L2A-BAPSCORE")
To solve it you can use this line instead:
bap_score = array.sel(bands=["S2-L2A-BAPSCORE"])
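The difference between the two selections can be reproduced outside openEO with plain xarray. The snippet below is only a minimal sketch: the cube shape, the zero-filled data and the second band name are made up for illustration, and it does not reproduce the actual backend code path, only the `transpose` call from the traceback.

```python
import numpy as np
import xarray as xr

# Hypothetical stand-in for the array a UDF receives:
# 2 time steps, 2 bands, 3x3 pixels (shape and values are illustrative only).
array = xr.DataArray(
    np.zeros((2, 2, 3, 3)),
    dims=("t", "bands", "y", "x"),
    coords={"bands": ["S2-L2A-BAPSCORE", "B04"]},
)

# Selecting with a scalar label drops the "bands" dimension.
scalar_sel = array.sel(bands="S2-L2A-BAPSCORE")
print(scalar_sel.dims)  # ('t', 'y', 'x')

# Selecting with a one-element list keeps "bands" as a length-1 dimension.
list_sel = array.sel(bands=["S2-L2A-BAPSCORE"])
print(list_sel.dims)  # ('t', 'bands', 'y', 'x')

# Same kind of transpose as in the traceback: it fails when "bands" is missing.
try:
    scalar_sel.transpose("t", "bands", "y", "x")
    transpose_failed = False
except ValueError:
    transpose_failed = True

# With the list selection the transpose succeeds.
restored = list_sel.transpose("t", "bands", "y", "x")
```

With the list form the UDF output still has all four dimensions, so the transpose no longer raises.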
tests.test_openeo_gfmap.test_cloud_masking.test_bap_quintad[Backend.TERRASCOPE] currently fails with this in the error logs: