Open jdries opened 3 weeks ago
We use pystac to resolve all items, so this may be hard to parallelize within the backend. Speeding up item retrieval itself may be a better option.
I identified a method that is called once per item, but always with the same arguments (job id and user id). The method is relatively expensive, so I added caching, which should hopefully improve performance drastically.
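For illustration, a minimal sketch of that kind of caching, assuming the expensive lookup is a pure function of job id and user id. The names `get_job_metadata` and `resolve_items` are hypothetical stand-ins, not the actual driver code, and `functools.lru_cache` is just one way to do it:

```python
from functools import lru_cache

# Counts how often the expensive lookup really runs (for illustration only).
call_count = 0


@lru_cache(maxsize=128)
def get_job_metadata(job_id: str, user_id: str) -> dict:
    """Hypothetical stand-in for the expensive per-item call."""
    global call_count
    call_count += 1
    return {"job_id": job_id, "user_id": user_id}


def resolve_items(item_ids, job_id, user_id):
    # Every item triggers the same lookup; only the first call misses the cache.
    # Note: all items share one cached dict, so callers should not mutate it.
    return [(item_id, get_job_metadata(job_id, user_id)) for item_id in item_ids]


items = resolve_items([f"item-{i}" for i in range(1000)], "j-123", "alice")
print(call_count)  # the expensive function ran once for 1000 items
```

A cache like this never expires on its own, so it is only safe if the result for a given (job id, user id) pair cannot change while the entry is cached.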
At which point is there a call to get_result_assets in this case?
Here: https://github.com/Open-EO/openeo-python-driver/blob/d5725229080989982436ce3986efb9b732e35792/openeo_driver/views.py#L1287 ? However, I am now wondering whether caching it is safe: the job results call here: https://github.com/Open-EO/openeo-python-driver/blob/d5725229080989982436ce3986efb9b732e35792/openeo_driver/views.py#L963 will also call get_result_assets.
~~So while job is not yet finished, assets will be incomplete, but method call gets cached because user is polling for partial results. Then job finishes, incorrect result without assets is returned because of caching...~~
Correction: there's a return statement for unfinished jobs in list_job_results, so assets will not be requested.
Not quite the same as this is about a STAC API, but while debugging Darius' AGERA5 issue locally, I noticed that the time between these logs is gradually getting longer, even for items within the same page (FeatureCollection):
2024-06-12 14:55:23,948 DEBUG [Thread-4] file.FixedFeaturesOpenSearchClient (FixedFeaturesOpenSearchClient.scala:36) - added Feature(agera520210616,Extent(-180.05, -90.05, 179.95, 90.05),2021-06-16T00:00Z,[Lorg.openeo.opensearch.OpenSearchResponses$Link;@246520d8,None,None,Some(POLYGON ((179.95 -90.05, 179.95 90.05, -180.05 90.05, -180.05 -90.05, 179.95 -90.05))),None,GeneralProperties(None,None,None,None,None),None,0.0)
2024-06-12 14:55:26,279 DEBUG [Thread-4] file.FixedFeaturesOpenSearchClient (FixedFeaturesOpenSearchClient.scala:36) - added Feature(agera520210615,Extent(-180.05, -90.05, 179.95, 90.05),2021-06-15T00:00Z,[Lorg.openeo.opensearch.OpenSearchResponses$Link;@233fe018,None,None,Some(POLYGON ((179.95 -90.05, 179.95 90.05, -180.05 90.05, -180.05 -90.05, 179.95 -90.05))),None,GeneralProperties(None,None,None,None,None),None,0.0)
This might be relevant: it spent almost 20 minutes gathering the STAC items before processing started.
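To check whether the gap between consecutive "added Feature" logs really grows, the timestamps can be diffed. A small sketch, assuming every line starts with a 23-character `YYYY-MM-DD HH:MM:SS,mmm` timestamp as in the two log lines above (abbreviated here); `log_gaps` is a hypothetical helper:

```python
from datetime import datetime


def log_gaps(lines):
    """Return the seconds elapsed between consecutive log lines."""
    stamps = [
        # The first 23 characters hold the timestamp, e.g. "2024-06-12 14:55:23,948".
        datetime.strptime(line[:23], "%Y-%m-%d %H:%M:%S,%f")
        for line in lines
    ]
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


# The two log lines from above, with the message part abbreviated.
sample = [
    "2024-06-12 14:55:23,948 DEBUG [Thread-4] ... added Feature(agera520210616, ...)",
    "2024-06-12 14:55:26,279 DEBUG [Thread-4] ... added Feature(agera520210615, ...)",
]
gaps = log_gaps(sample)
print(gaps)  # about 2.3 seconds between these two items
```

Running this over the full log and plotting `gaps` would show whether the per-item time keeps increasing within a page.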
Job id: j-2406079ce2dc4a4e863f4e4881c3778f. This job took 45+ minutes in the driver before even starting the actual processing. Batch job logging indicates that the time was spent in the 'load_stac' implementation itself. That would make sense, because retrieving metadata for all the individual items might have been slow.
Process graph was very simple: