Open-EO / openeo-geotrellis-extensions

Java/Scala extensions for Geotrellis, for use with OpenEO GeoPySpark backend.
Apache License 2.0

bands in results out of order for PROBA-V collection(s) #211

Closed bossie closed 10 months ago

bossie commented 10 months ago

Noticed this while going over FileLayerProvider code in the context of https://github.com/Open-EO/openeo-geopyspark-driver/issues/465.

Even though the two requests below specify the bands in a different order, OpenEO returns the same result for both.

Example code:

import openeo

connection = openeo.connect("openeo.vito.be").authenticate_oidc()

def statistics(bands) -> dict:
    data_cube = (connection
                 .load_collection("PROBAV_L3_S5_TOC_100M",
                                  temporal_extent=["2020-01-01", "2020-01-02"],
                                  bands=bands)
                 .aggregate_spatial(geometries={"type": "Polygon", "coordinates": [[[2.59003, 51.069], [2.59003, 51.08], [2.602, 51.08], [2.602, 51.069], [2.59003, 51.069]]]},
                                    reducer="mean")
                 .save_result("JSON"))

    return data_cube.execute()

# same bands but in different order
statistics_a = statistics(["SWIRVAA", "NDVI", "SWIRVZA"])
statistics_b = statistics(["SWIRVAA", "SWIRVZA", "NDVI"])

# should give a different result
assert statistics_a != statistics_b  # boom!

FAILED tests/debug_local_spark.py:4707 (Test.test_probav_bands_order_statistics)
{'2020-01-01T00:00:00Z': [[78.80698529411765, 26.297794117647054, 29.112132352941178]]} != {'2020-01-01T00:00:00Z': [[78.80698529411765, 26.297794117647054, 29.112132352941178]]}

The band logic in FileLayerProvider is, IMHO, hard to follow and difficult to adapt, so this might be a good opportunity to revisit it and clean it up.

bossie commented 10 months ago

This test reproduces the problem, so it should be enabled: https://github.com/Open-EO/openeo-geotrellis-extensions/blob/b34be2e0ea5c197f98e889d6d588930d563256d4/openeo-geotrellis/src/test/scala/org/openeo/geotrellis/file/ProbaVPyramidFactoryTest.scala#L137-L170

bossie commented 10 months ago

The root cause is a groupBy in the implementation, which loses all information about the order of the requested bands.

My guess is that the problem only manifests itself when the bands in the GEOMETRY assets are involved; maybe users don't request these very often.
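
To illustrate the effect, here is a minimal Python sketch of how grouping bands by asset can drop the requested order, and one way to restore it afterwards. This is not the actual FileLayerProvider code (which is Scala): the band-to-asset mapping and both function names are hypothetical, and sorting the asset keys merely stands in for a grouping whose iteration order is unrelated to the request.

```python
from collections import defaultdict

# Hypothetical band-to-asset mapping, for illustration only.
BAND_TO_ASSET = {"SWIRVAA": "GEOMETRY", "SWIRVZA": "GEOMETRY", "NDVI": "NDVI"}


def bands_grouped_by_asset(requested):
    """Group bands per asset, then flatten: the order follows the groups."""
    groups = defaultdict(list)
    for band in requested:
        groups[BAND_TO_ASSET[band]].append(band)
    # Sorted keys stand in for a grouping that ignores the request order.
    return [band for asset in sorted(groups) for band in groups[asset]]


def bands_in_requested_order(requested):
    """Same grouping, but re-sorted by each band's position in the request."""
    position = {band: index for index, band in enumerate(requested)}
    return sorted(bands_grouped_by_asset(requested), key=position.__getitem__)


request_a = ["SWIRVAA", "NDVI", "SWIRVZA"]
request_b = ["SWIRVAA", "SWIRVZA", "NDVI"]

# Both requests collapse to the same band order after grouping.
print(bands_grouped_by_asset(request_a))
print(bands_grouped_by_asset(request_b))

# Re-sorting by requested position keeps the two requests distinct.
print(bands_in_requested_order(request_a))
print(bands_in_requested_order(request_b))
```

In this sketch, both requests come out of the grouping step with the two GEOMETRY bands adjacent, which would match the identical statistics seen above. Remembering each band's index in the request before grouping is one possible fix; the real one may well look different.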