Open-EO / openeo-geopyspark-driver

OpenEO driver for GeoPySpark (Geotrellis)
Apache License 2.0

load_per_product: bands mismatch between metadata and netCDF asset #893

Open bossie opened 5 hours ago

bossie commented 5 hours ago
{
  "process_graph": {
    "loadstac2": {
      "arguments": {
        "bands": [
          "precipitation-flux",
          "temperature-mean"
        ],
        "spatial_extent": {
          "crs": "EPSG:32631",
          "east": 398043.66962998867,
          "north": 5807663.283030516,
          "south": 5800318.038133569,
          "srs": "EPSG:32631",
          "west": 388117.96494662925
        },
        "temporal_extent": [
          "2021-12-01",
          "2022-11-30"
        ],
        "url": "https://s3.waw3-1.cloudferro.com/swift/v1/agera/stac/collection.json"
      },
      "process_id": "load_stac"
    },
    "save1": {
      "arguments": {
        "data": {
          "from_node": "loadstac2"
        },
        "format": "NETCDF"
      },
      "process_id": "save_result",
      "result": true
    }
  }
}

The resulting netCDF asset contains 4 bands (precipitation-flux, temperature-mean, unkown_band_2, unkown_band_3) instead of the expected 2 (precipitation-flux, temperature-mean).
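The mismatch can be made explicit by comparing the requested bands with the variable names found in the asset. This is only a minimal sketch: `compare_bands` is a hypothetical helper, and the `found` list is copied from the description above rather than read from the file. In practice the names could be read with something like xarray's `open_dataset(...).data_vars`, which may also list non-band variables such as a CRS variable.

```python
def compare_bands(expected, actual):
    """Return (missing, unexpected) band names, preserving input order."""
    missing = [b for b in expected if b not in actual]
    unexpected = [b for b in actual if b not in expected]
    return missing, unexpected


# Bands requested in the load_stac arguments.
requested = ["precipitation-flux", "temperature-mean"]
# Band names as reported above for the netCDF asset (not read from a file here).
found = ["precipitation-flux", "temperature-mean", "unkown_band_2", "unkown_band_3"]

missing, unexpected = compare_bands(requested, found)
print("missing:", missing)
print("unexpected:", unexpected)
```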

bossie commented 4 hours ago

The (monthly, in this case 12 in total) STAC items in the collection carry proper eo:bands (precipitation-flux, temperature-mean) and their GeoTiff assets match this as well.
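A check along these lines can be sketched for a single item. The helper `item_band_names` and the example item are hypothetical (the real items were not copied here); the only assumption about STAC itself is that `eo:bands` entries carry a `name` and may sit in the item properties or on individual assets.

```python
def item_band_names(item):
    """Collect eo:bands names from a STAC item dict: properties first, then assets."""
    names = []
    sources = [item.get("properties", {})] + list(item.get("assets", {}).values())
    for source in sources:
        for band in source.get("eo:bands", []):
            name = band.get("name")
            if name and name not in names:
                names.append(name)
    return names


# Hypothetical item, shaped like a STAC item with one GeoTiff asset per band.
example_item = {
    "properties": {
        "eo:bands": [{"name": "precipitation-flux"}, {"name": "temperature-mean"}]
    },
    "assets": {
        "precipitation-flux": {"eo:bands": [{"name": "precipitation-flux"}]},
        "temperature-mean": {"eo:bands": [{"name": "temperature-mean"}]},
    },
}

band_names = item_band_names(example_item)
print(band_names)
```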

bossie commented 4 hours ago

The regression is caused by a different default_reading_strategy in the CDSE configuration: load_per_product, as opposed to load_by_target_partition on Terrascope. It surfaced recently with the fix for #812, which made load_per_product actually apply to load_stac.

It seems load_per_product has a bug; a workaround is to disable it with a load_stac feature flag:

{
  "process_graph": {
    "loadstac2": {
      "arguments": {
        "bands": [
          "precipitation-flux",
          "temperature-mean"
        ],
        "featureflags": {
          "load_per_product": false
        },
        "spatial_extent": {
          "crs": "EPSG:32631",
          "east": 398043.66962998867,
          "north": 5807663.283030516,
          "south": 5800318.038133569,
          "srs": "EPSG:32631",
          "west": 388117.96494662925
        },
        "temporal_extent": [
          "2021-12-01",
          "2022-11-30"
        ],
        "url": "https://s3.waw3-1.cloudferro.com/swift/v1/agera/stac/collection.json"
      },
      "process_id": "load_stac"
    },
    "save1": {
      "arguments": {
        "data": {
          "from_node": "loadstac2"
        },
        "format": "NETCDF"
      },
      "process_id": "save_result",
      "result": true
    }
  }
}

This fixes the bands in the netCDF asset. Oddly, though, the result is only 2x1 pixels (regardless of the reading strategy used) and is placed in the wrong location (near the equator). It does seem to work for a larger AOI, e.g. [0, 50, 5, 55].
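For existing process graphs, the workaround amounts to adding one key to the load_stac arguments. A minimal sketch of doing that programmatically follows; `with_feature_flag` is a hypothetical helper that operates on the plain JSON structure, not part of any openEO client API, and the example graph is a trimmed version of the one above.

```python
import copy


def with_feature_flag(process_graph, flag, value, process_id="load_stac"):
    """Return a copy of the graph with featureflags[flag] = value on matching nodes."""
    patched = copy.deepcopy(process_graph)
    for node in patched["process_graph"].values():
        if node.get("process_id") == process_id:
            flags = node.setdefault("arguments", {}).setdefault("featureflags", {})
            flags[flag] = value
    return patched


# Trimmed version of the process graph from this issue.
graph = {
    "process_graph": {
        "loadstac2": {
            "arguments": {"bands": ["precipitation-flux", "temperature-mean"]},
            "process_id": "load_stac",
        },
        "save1": {
            "arguments": {"data": {"from_node": "loadstac2"}, "format": "NETCDF"},
            "process_id": "save_result",
            "result": True,
        },
    }
}

patched = with_feature_flag(graph, "load_per_product", False)
print(patched["process_graph"]["loadstac2"]["arguments"]["featureflags"])
```

The original graph is left untouched, so both variants can be submitted to compare the two reading strategies.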

bossie commented 4 hours ago

> and placed in the wrong location (near the equator)

This seems to be an issue with the netCDF output as writing to GeoTiff does not exhibit this problem. This probably means that the workaround above indeed fixes Kristof V.T.'s original process graph.

jdries commented 57 minutes ago

@bossie note that it might just be that netCDF is better able to write unknown bands, while GeoTiff relies more on cube metadata. Feel free to assign this one to me if you haven't dug into load_per_product yet.