Open anayeaye opened 1 month ago
Since opening this issue we have a new bug that requires adding `"id_template": "{}"` to the discovery config as a temporary workaround for #194.
Recent changes in dev may have already resolved this issue. I don't know which change to trace the fix to, but:
DEV: I added the test collection that should have 19 items, triggered the discovery workflow via openveda.cloud/api/workflows/discovery, and successfully ingested all 19 items: https://dev.openveda.cloud/api/stac/collections/omi-19-item-collection-deleteme/items?limit=20
SIT: I also tested the concurrency refactors airflow branch https://github.com/NASA-IMPACT/veda-data-airflow/pull/197, which is deployed to staging, by ingesting the same collection as omi-19-item-collection-deleteme-sit and executing the workflow from the SIT discovery at https://e3xr9mvkra.execute-api.us-west-2.amazonaws.com/doc. This also produced a collection of 19 items: https://dev.openveda.cloud/api/stac/collections/omi-19-item-collection-deleteme-sit/items?limit=20
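To make the "all 19 items" check repeatable, here is a minimal sketch that counts the items returned by the STAC items URL above and reports which expected ids are missing. It assumes the endpoint returns a GeoJSON FeatureCollection with an `id` on each feature, as STAC API item listings normally do. The helper names and the list of expected ids are hypothetical; the expected ids would have to come from whatever the discovery step reported.

```python
from __future__ import annotations

import json
from urllib.request import urlopen

# URL copied from the issue text; limit=20 fits all 19 expected items on one page.
ITEMS_URL = (
    "https://dev.openveda.cloud/api/stac/collections/"
    "omi-19-item-collection-deleteme/items?limit=20"
)
EXPECTED_ITEM_COUNT = 19


def published_item_ids(feature_collection: dict) -> list[str]:
    """Return the ids of the items in a STAC items response (a FeatureCollection)."""
    return [feature["id"] for feature in feature_collection.get("features", [])]


def missing_item_ids(expected_ids, feature_collection: dict) -> list[str]:
    """Return the expected ids that are absent from the response, sorted."""
    published = set(published_item_ids(feature_collection))
    return sorted(set(expected_ids) - published)


def fetch_items(url: str = ITEMS_URL) -> dict:
    """Fetch a single page of items from the STAC API (hypothetical helper)."""
    with urlopen(url, timeout=30) as response:
        return json.load(response)
```

Calling `len(published_item_ids(fetch_items()))` and comparing it with `EXPECTED_ITEM_COUNT` reproduces the manual check. `missing_item_ids` is only useful when the discovered ids are known, and it would show whether the dropped items form one contiguous batch.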
What
The `discovery/` endpoint discovers more objects than are published to STAC. Generally only 9 or 10 of the 19 items make it to the STAC catalog, which suggests that a batch may be dropped when the discovery DAG transitions from `raster_vector_branching` to `parralel_run_process_rasters`. No jobs fail in Airflow.
Note: When the same regex is supplied via `dataset/publish`, all 19 items are created.
How to reproduce
collection.json
```json
{
  "id": "omi-19-item-collection-deleteme",
  "type": "Collection",
  "links": [],
  "title": "DELETE ME 19 item collection OMI_trno2",
  "extent": {
    "spatial": {
      "bbox": [
        [-180, -90, 180, 90]
      ]
    },
    "temporal": {
      "interval": [
        [null, null]
      ]
    }
  },
  "license": "MIT",
  "description": "OMI_trno2 - 0.10 x 0.10 Annual as Cloud-Optimized GeoTIFFs (COGs)",
  "item_assets": {
    "cog_default": {
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "roles": ["data", "layer"],
      "title": "Default COG Layer",
      "description": "Cloud optimized default layer to display on map"
    }
  },
  "stac_version": "1.0.0",
  "renders": {
    "dashboard": {
      "colormap_name": "reds",
      "rescale": [
        [0, 3000000000000000.0]
      ],
      "assets": ["cog_default"],
      "title": "VEDA Dashboard Render Parameters"
    }
  },
  "providers": [
    {
      "name": "NASA VEDA",
      "url": "https://www.earthdata.nasa.gov/dashboard/",
      "roles": ["host"]
    }
  ],
  "item_assets": {
    "test_asset": {
      "title": "An item asset description for test",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "roles": ["test"]
    },
    "cog_default": {
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "roles": ["data", "layer"],
      "title": "Default COG Layer",
      "description": "Cloud optimized default layer to display on map"
    }
  },
  "assets": {
    "thumbnail": {
      "title": "Thumbnail",
      "description": "Photo by [Mick Truyts](https://unsplash.com/photos/x6WQeNYJC1w) (Power plant shooting steam at the sky)",
      "href": "https://thumbnails.openveda.cloud/no2--dataset-cover.jpg",
      "type": "image/jpeg",
      "roles": ["thumbnail"]
    }
  }
}
```
discovery-config.json
AC