jdries closed this issue 1 week ago.
load_collection -> featureflags -> allow_empty_cube
@m-mohr do you think it makes sense to start a discussion about the concept of an empty datacube on the level of openeo-api or openeo-processes ?
To roughly sketch the use case we are trying to address here:
A user loads S1 ASCENDING and DESCENDING separately and then wants to pick the cube with the most observations (e.g. with an if based on a count of temporal labels). Sometimes either the ASC or the DESC cube has no observations at all for the given spatio-temporal extent, which at the moment raises something like a "NoDataAvailable" error, failing the whole job. Some kind of empty cube (with zero observations or temporal labels) could be a workaround.
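The selection pattern above can be sketched in plain Python. This is purely illustrative: the function name and the use of lists of temporal labels are hypothetical stand-ins; a real openEO process graph would use load_collection, count and if instead.

```python
# Illustrative sketch only: an "empty cube" is modelled as a list with
# zero temporal labels. Instead of failing with NoDataAvailable, the
# comparison simply treats the empty side as having zero observations.

def pick_denser_cube(asc_labels, desc_labels):
    """Return whichever set of temporal labels has more observations."""
    return asc_labels if len(asc_labels) >= len(desc_labels) else desc_labels

# DESCENDING has no observations for the extent; with empty cubes
# allowed, the workflow still succeeds and picks ASCENDING.
asc = ["2023-01-03", "2023-01-09", "2023-01-15"]
desc = []  # empty cube: zero temporal labels
assert pick_denser_cube(asc, desc) == asc
```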
Update from the user: on our CDSE backend, the use case actually does work. It indeed generates an empty datacube, but the processes used in that case did not crash on it.
One aspect the spec could help with: what should we generate when writing an empty datacube to a file? Only STAC metadata and no asset, or should the job error instead? (We currently error in most cases.)
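The two options can be sketched as follows. This is a pure illustration, not an actual backend API: the function name, the NoDataAvailable message, and the minimal STAC stubs are all hypothetical.

```python
# Sketch of the two spec options for writing an empty datacube:
# either fail the job, or emit STAC metadata that simply has no assets.

def save_result(temporal_labels, error_on_empty=True):
    """Return STAC-Item-like metadata, or raise for an empty cube."""
    if not temporal_labels:
        if error_on_empty:
            # Option 1: fail the whole job.
            raise ValueError("NoDataAvailable: datacube has no observations")
        # Option 2: STAC metadata with an empty assets mapping, no file.
        return {"type": "Feature", "stac_version": "1.0.0", "assets": {}}
    return {"type": "Feature", "stac_version": "1.0.0",
            "assets": {"result": {"href": "result.nc"}}}
```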
Generally openEO doesn't disallow empty data cubes, we just felt it's more user-friendly to report no data in load_collection directly.
Another use case (related to https://github.com/Open-EO/openeo-geopyspark-driver/issues/785):
This currently is not possible because the NoDataAvailable error is thrown too eagerly from load_collection.
@Pratichhya , can you provide a minimal case that gives this error?
Apparently the graphs here are already a good example: https://github.com/Open-EO/openeo-geopyspark-driver/issues/785#issuecomment-2139048233
A job option would be best; then we don't need to adhere to the specification.
This already works on CDSE. @bossie
i.e. the SENTINEL1_GRD collection.
Based on a discussion with Dennis, we are in favor of running it on CDSE.
Yet, as the customer was attracted through Terrascope and has Terrascope credits we will need to evaluate how to handle it.
@Pratichhya Could we test cropsar1D on CDSE for these particular fields?
As discussed, the allow_empty_cube feature flag was already implemented, so ideally this is just a matter of passing it as a parameter to the CropSAR process: TBC.
@Pratichhya, @HansVRP, locally I've managed to run the process graph with the 6 polygons successfully:
Does this look like a plausible result?
{
"Field_0_cropSAR": {
"2023-01-01": null,
"2023-01-02": null,
"2023-01-03": null,
"2023-01-04": null,
"2023-01-05": null,
"2023-01-06": null,
"2023-01-07": null,
"2023-01-08": null,
"2023-01-09": null,
"2023-01-10": null,
"2023-01-11": null,
"2023-01-12": null,
"2023-01-13": null,
"2023-01-14": null,
"2023-01-15": null,
"2023-01-16": null,
"2023-01-17": null,
"2023-01-18": null,
"2023-01-19": null,
"2023-01-20": null,
"2023-01-21": null,
"2023-01-22": null,
"2023-01-23": null,
"2023-01-24": null,
"2023-01-25": null,
"2023-01-26": null,
"2023-01-27": null,
"2023-01-28": null,
"2023-01-29": null,
"2023-01-30": null,
"2023-01-31": null,
"2023-02-01": null
},
"Field_1_cropSAR": {
"2023-01-01": null,
"2023-01-02": null,
"2023-01-03": null,
"2023-01-04": null,
"2023-01-05": null,
"2023-01-06": null,
"2023-01-07": null,
"2023-01-08": null,
"2023-01-09": null,
"2023-01-10": null,
"2023-01-11": null,
"2023-01-12": null,
"2023-01-13": null,
"2023-01-14": null,
"2023-01-15": null,
"2023-01-16": null,
"2023-01-17": null,
"2023-01-18": null,
"2023-01-19": null,
"2023-01-20": null,
"2023-01-21": null,
"2023-01-22": null,
"2023-01-23": null,
"2023-01-24": null,
"2023-01-25": null,
"2023-01-26": null,
"2023-01-27": null,
"2023-01-28": null,
"2023-01-29": null,
"2023-01-30": null,
"2023-01-31": null,
"2023-02-01": null
},
"Field_2_cropSAR": {
"2023-01-01": null,
"2023-01-02": null,
"2023-01-03": null,
"2023-01-04": null,
"2023-01-05": null,
"2023-01-06": null,
"2023-01-07": null,
"2023-01-08": null,
"2023-01-09": null,
"2023-01-10": null,
"2023-01-11": null,
"2023-01-12": null,
"2023-01-13": null,
"2023-01-14": null,
"2023-01-15": null,
"2023-01-16": null,
"2023-01-17": null,
"2023-01-18": null,
"2023-01-19": null,
"2023-01-20": null,
"2023-01-21": null,
"2023-01-22": null,
"2023-01-23": null,
"2023-01-24": null,
"2023-01-25": null,
"2023-01-26": null,
"2023-01-27": null,
"2023-01-28": null,
"2023-01-29": null,
"2023-01-30": null,
"2023-01-31": null,
"2023-02-01": null
},
"Field_3_cropSAR": {
"2023-01-01": null,
"2023-01-02": null,
"2023-01-03": null,
"2023-01-04": null,
"2023-01-05": null,
"2023-01-06": null,
"2023-01-07": null,
"2023-01-08": null,
"2023-01-09": null,
"2023-01-10": null,
"2023-01-11": null,
"2023-01-12": null,
"2023-01-13": null,
"2023-01-14": null,
"2023-01-15": null,
"2023-01-16": null,
"2023-01-17": null,
"2023-01-18": null,
"2023-01-19": null,
"2023-01-20": null,
"2023-01-21": null,
"2023-01-22": null,
"2023-01-23": null,
"2023-01-24": null,
"2023-01-25": null,
"2023-01-26": null,
"2023-01-27": null,
"2023-01-28": null,
"2023-01-29": null,
"2023-01-30": null,
"2023-01-31": null,
"2023-02-01": null
},
"Field_4_cropSAR": {
"2023-01-01": null,
"2023-01-02": null,
"2023-01-03": null,
"2023-01-04": null,
"2023-01-05": null,
"2023-01-06": null,
"2023-01-07": null,
"2023-01-08": null,
"2023-01-09": null,
"2023-01-10": null,
"2023-01-11": null,
"2023-01-12": null,
"2023-01-13": null,
"2023-01-14": null,
"2023-01-15": null,
"2023-01-16": null,
"2023-01-17": null,
"2023-01-18": null,
"2023-01-19": null,
"2023-01-20": null,
"2023-01-21": null,
"2023-01-22": null,
"2023-01-23": null,
"2023-01-24": null,
"2023-01-25": null,
"2023-01-26": null,
"2023-01-27": null,
"2023-01-28": null,
"2023-01-29": null,
"2023-01-30": null,
"2023-01-31": null,
"2023-02-01": null
},
"Field_5_cropSAR": {
"2023-01-01": 0.5519999861717224,
"2023-01-02": 0.5609999895095825,
"2023-01-03": 0.5680000185966492,
"2023-01-04": 0.574999988079071,
"2023-01-05": 0.5830000042915344,
"2023-01-06": 0.5899999737739563,
"2023-01-07": 0.5960000157356262,
"2023-01-08": 0.6029999852180481,
"2023-01-09": 0.609000027179718,
"2023-01-10": 0.6140000224113464,
"2023-01-11": 0.6190000176429749,
"2023-01-12": 0.6240000128746033,
"2023-01-13": 0.628000020980835,
"2023-01-14": 0.6330000162124634,
"2023-01-15": 0.6370000243186951,
"2023-01-16": 0.6399999856948853,
"2023-01-17": 0.6430000066757202,
"2023-01-18": 0.6460000276565552,
"2023-01-19": 0.6489999890327454,
"2023-01-20": 0.6510000228881836,
"2023-01-21": 0.652999997138977,
"2023-01-22": 0.656000018119812,
"2023-01-23": 0.6579999923706055,
"2023-01-24": 0.6600000262260437,
"2023-01-25": 0.6620000004768372,
"2023-01-26": 0.6639999747276306,
"2023-01-27": 0.6660000085830688,
"2023-01-28": 0.6669999957084656,
"2023-01-29": 0.6690000295639038,
"2023-01-30": 0.6710000038146973,
"2023-01-31": 0.671999990940094,
"2023-02-01": 0.6729999780654907
}
}
CropSAR might actually produce output even though it receives an empty input cube, but it might also need a dedicated feature flag.
I guess we should be able to tie this flag in with the flag for allowing empty datacubes?
Yes, that is what this PR is about: https://git.vito.be/projects/APPL/repos/nextland/pull-requests/5/overview
This is an example process graph with the feature enabled:
Does this approach (a parameter of the CropSAR process) support your use case with regard to being able to turn the flag on and off? Instead of a parameter, a job_option is also an option, but that requires a bit of extra work.
This is the result for the 60 polygon process graph: CropSAR_60_polygons.json
Are these for the previously failed polygons of j-24051737210441be92441beedd8ab5ce (60 polygons)?
That is right.
Hi @bossie, Was the result achieved when tested in staging?
No, I ran this locally, it's not on dev or staging yet. I'm taking a different approach and the plan is to get this out shortly.
Would you consider the results I attached correct?
> No, I ran this locally, it's not on dev or staging yet. I'm taking a different approach, and the plan is to get this out shortly.
ahh ok. Thank you for confirming
> Would you consider the results I attached correct?
It looks similar to the expected result. However, for the fields with null values, I was wondering whether they had no Sentinel-1 or Sentinel-2 data at all. Also, I simply wanted to check whether they remain null when tested for the entire year.
Abandoned the PR in the nextland repo in favor of a job option:
{
"allow_empty_cubes": true
}
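For completeness, this is roughly how the job option could be passed from the openeo Python client, which accepts a job_options dictionary when submitting batch jobs. The collection id and the suggestion that staging honours the option come from this thread; treat the commented-out connection code as a sketch, not a verified recipe.

```python
# The job option from this thread, as a plain dictionary.
job_options = {"allow_empty_cubes": True}

# With a live connection, submission would look roughly like this
# (not executed here; endpoint and arguments are illustrative):
# import openeo
# connection = openeo.connect("openeo.dataspace.copernicus.eu").authenticate_oidc()
# cube = connection.load_collection("SENTINEL1_GRD", ...)
# job = cube.execute_batch(job_options=job_options)
```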
@Pratichhya this is available on staging.
We have checks like this one to avoid empty datacubes: https://github.com/Open-EO/openeo-geotrellis-extensions/blob/68143219e6882dffd0d05a46aaf29e530c0b93a5/openeo-geotrellis/src/main/scala/org/openeo/geotrellis/layers/FileLayerProvider.scala#L974
There are, however, use cases where this is not desirable, because the workflow can deal with the absence of data.
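Conceptually, gating such a check behind the job option could look like the sketch below. This is a Python illustration of the idea, not the actual Scala code in FileLayerProvider; the function and parameter names are hypothetical.

```python
class NoDataAvailable(Exception):
    """Mirrors the openEO NoDataAvailable error condition."""

def resolve_layer(observations, allow_empty_cubes=False):
    """Only raise for an empty extent when empty cubes are disallowed.

    'observations' stands in for the rasters found for the requested
    spatio-temporal extent in the backend's layer-loading check.
    """
    if not observations and not allow_empty_cubes:
        raise NoDataAvailable("No data found for the given extent")
    return observations  # may be empty if the workflow can handle it

# With the flag set, an empty extent yields an empty cube, not an error.
assert resolve_layer([], allow_empty_cubes=True) == []
```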
Stack trace in sentinelhub: