NASA-IMPACT / veda-data-airflow

Airflow implementation of ingest pipeline for VEDA STAC data

Discovery endpoint should submit STAC items for all discovered S3 objects #192

Open anayeaye opened 1 month ago

anayeaye commented 1 month ago

What

The discovery/ endpoint discovers more objects than are published to STAC. Generally only 9 or 10 items make it to the STAC catalog, which suggests a batch may be dropped when the discovery DAG transitions from raster_vector_branching to parralel_run_process_rasters. No jobs fail in Airflow.

Note: When the same regex is supplied via dataset/publish, all 19 items are created.

How to reproduce

  1. POST a collection via the ingest-api/collections endpoint
    collection.json
```json
{
    "id": "omi-19-item-collection-deleteme",
    "type": "Collection",
    "links": [],
    "title": "DELETE ME 19 item collection OMI_trno2",
    "extent": {
        "spatial": {
            "bbox": [
                [-180, -90, 180, 90]
            ]
        },
        "temporal": {
            "interval": [
                [null, null]
            ]
        }
    },
    "license": "MIT",
    "description": "OMI_trno2 - 0.10 x 0.10 Annual as Cloud-Optimized GeoTIFFs (COGs)",
    "item_assets": {
        "cog_default": {
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            "roles": ["data", "layer"],
            "title": "Default COG Layer",
            "description": "Cloud optimized default layer to display on map"
        }
    },
    "stac_version": "1.0.0",
    "renders": {
        "dashboard": {
            "colormap_name": "reds",
            "rescale": [
                [0, 3000000000000000.0]
            ],
            "assets": ["cog_default"],
            "title": "VEDA Dashboard Render Parameters"
        }
    },
    "providers": [
        {
            "name": "NASA VEDA",
            "url": "https://www.earthdata.nasa.gov/dashboard/",
            "roles": ["host"]
        }
    ],
    "item_assets": {
        "test_asset": {
            "title": "An item asset description for test",
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            "roles": ["test"]
        },
        "cog_default": {
            "type": "image/tiff; application=geotiff; profile=cloud-optimized",
            "roles": ["data", "layer"],
            "title": "Default COG Layer",
            "description": "Cloud optimized default layer to display on map"
        }
    },
    "assets": {
        "thumbnail": {
            "title": "Thumbnail",
            "description": "Photo by [Mick Truyts](https://unsplash.com/photos/x6WQeNYJC1w) (Power plant shooting steam at the sky)",
            "href": "https://thumbnails.openveda.cloud/no2--dataset-cover.jpg",
            "type": "image/jpeg",
            "roles": ["thumbnail"]
        }
    }
}
```
  1. Trigger a discovery via the workflows api/discovery endpoint
    discovery-config.json
{
    "collection": "omi-19-item-collection-deleteme",
    "bucket": "veda-data-store-staging",
    "datetime_range": "year",
    "discovery": "s3",
    "filename_regex": "^(.*).tif$",
    "prefix": "OMI_trno2-COG/"
}

  1. Check number of items published vs. the number of objects discovered in the discovery DAG log. The example above should create 19 items.
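
For the comparison in the last step, a small helper along these lines may be handy. This is only a sketch: it assumes the discovery regex is applied to the full object key and that each item id equals the filename stem, neither of which is confirmed here. The function names `discovered_keys` and `missing_item_ids` are hypothetical, and the key list and published ids would have to come from the DAG log and the STAC API respectively.

```python
import re
from pathlib import PurePosixPath

# Regex copied verbatim from the discovery config above. Note that the "."
# before "tif" is unescaped, so it matches any single character.
FILENAME_REGEX = r"^(.*).tif$"


def discovered_keys(keys, pattern=FILENAME_REGEX):
    """Return the S3 keys that match the discovery regex.

    Assumption: the regex is matched against the full object key.
    """
    rx = re.compile(pattern)
    return [key for key in keys if rx.match(key)]


def missing_item_ids(keys, published_ids):
    """Return expected item ids that are absent from the published ids.

    Assumption: an item id is the filename stem of its S3 key.
    """
    expected = {PurePosixPath(key).stem for key in keys}
    return sorted(expected - set(published_ids))
```

Passing the keys listed in the discovery DAG log and the item ids returned for the collection should yield an empty list when nothing was dropped; with the behavior described in this issue it would list the 9 or 10 missing ids.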

AC

anayeaye commented 3 weeks ago

Since opening this issue, a new bug has appeared that requires adding "id_template": "{}" to the discovery config as a temporary workaround for #194.
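
As a rough sketch, the discovery config from the reproduction steps with the workaround key added could be built like this. The helper name `with_id_template_workaround` is hypothetical; only the config values and the `id_template` key come from this issue.

```python
import json

# Discovery config copied from the reproduction steps above.
base_config = {
    "collection": "omi-19-item-collection-deleteme",
    "bucket": "veda-data-store-staging",
    "datetime_range": "year",
    "discovery": "s3",
    "filename_regex": "^(.*).tif$",
    "prefix": "OMI_trno2-COG/",
}


def with_id_template_workaround(config, template="{}"):
    """Return a copy of the config with the temporary id_template key added."""
    return {**config, "id_template": template}


# JSON body to submit to the discovery endpoint while #194 is open.
payload = json.dumps(with_id_template_workaround(base_config), indent=4)
print(payload)
```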

anayeaye commented 3 weeks ago

Recent changes in dev may have already resolved this issue. I don't know which change to trace this to, but: