Open-EO / openeo-geotrellis-extensions

Java/Scala extensions for Geotrellis, for use with OpenEO GeoPySpark backend.
Apache License 2.0

chunk to prevent cancellation of large batch process #145

Status: Open. bossie opened this issue 1 year ago

bossie commented 1 year ago

A batch job scheduled a single large batch process at Sentinel Hub (SHub), which Sentinel Hub cancelled after a while:

Dear Sentinel Hub user,

we would like to notify you that your batch processing request d6b8b472-3d9d-4d09-b15d-78853ae88c77 was cancelled because of failed processing due to size. The number of tiles is not an issue, but the number of input scenes and output responses is.

Could you please confirm that the intent was to download the data for 2 years? If yes, it should be divided into much smaller chunks, e.g., one month each. If you wish all data to still be delivered into a single folder, you can use a tile path such as:

"defaultTilePath": "s3://openeo-sentinelhub/my-folder//."

Do not hesitate to ask for further explanation if needed.

Best regards, Sentinel Hub Team.

Can we split it up into smaller chunks (time-wise)?
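
A minimal sketch of what such a time-wise split could look like, assuming one-month chunks as suggested in the e-mail and treating the temporal extent as a half-open interval [start, end). The function name `monthly_chunks` is hypothetical and not part of openeo-geotrellis-extensions. Mapping each chunk onto an actual Sentinel Hub batch request is left out.

```python
from datetime import date


def monthly_chunks(start: date, end: date) -> list[tuple[date, date]]:
    """Split the half-open interval [start, end) into chunks of at most one calendar month."""
    chunks = []
    chunk_start = start
    while chunk_start < end:
        # First day of the month following chunk_start.
        if chunk_start.month == 12:
            next_month = date(chunk_start.year + 1, 1, 1)
        else:
            next_month = date(chunk_start.year, chunk_start.month + 1, 1)
        # Never run past the requested end of the extent.
        chunk_end = min(next_month, end)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end
    return chunks


# The temporal extent of the job in this issue: 2020-08-01 to 2022-08-01.
chunks = monthly_chunks(date(2020, 8, 1), date(2022, 8, 1))
print(len(chunks))  # 24
```

Adjacent chunks share a boundary date but do not overlap, because each chunk excludes its end date. If the downstream API treats both ends of a time range as inclusive, the chunk end would need to be adjusted accordingly.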

The job's process graph:

{
  "process_graph": {
    "loadcollection1": {
      "process_id": "load_collection",
      "arguments": {
        "bands": [
          "B02",
          "B03",
          "B04"
        ],
        "id": "SENTINEL2_L2A_SENTINELHUB",
        "spatial_extent": {
          "west": -49.83,
          "south": 59.68,
          "east": -41.55,
          "north": 61.83,
          "crs": "EPSG:4326"
        },
        "temporal_extent": [
          "2020-08-01",
          "2022-08-01"
        ]
      }
    },
    "reducedimension1": {
      "process_id": "reduce_dimension",
      "arguments": {
        "data": {
          "from_node": "loadcollection1"
        },
        "dimension": "t",
        "reducer": {
          "process_graph": {
            "max1": {
              "process_id": "max",
              "arguments": {
                "data": {
                  "from_parameter": "data"
                }
              },
              "result": true
            }
          }
        }
      }
    },
    "saveresult1": {
      "process_id": "save_result",
      "arguments": {
        "data": {
          "from_node": "reducedimension1"
        },
        "format": "GTiff",
        "options": {}
      },
      "result": true
    }
  }
}

bossie commented 1 year ago

The batch process in question: d6b8b472-3d9d-4d09-b15d-78853ae88c77.json.txt

bossie commented 1 year ago

From the follow-up e-mail:

thank you. We may start limiting certain aspects (most probably timeRange, number of outputs and their bands) in the near future; will keep you informed.