NASA-IMPACT / veda-data-airflow

Airflow implementation of ingest pipeline for VEDA STAC data

Discovery endpoint should not fail when id_template not in request #194

Open anayeaye opened 4 months ago

anayeaye commented 4 months ago

What

When the discovery endpoint is used to add items to an existing collection, the request fails if id_template is not provided.

Note

The id_template default value is set in the s3_discovery util.

How to reproduce

  1. POST a collection via the ingest-api/collections endpoint (or choose a test collection in the dev catalog)
collection.json

```json
{
  "id": "omi-19-item-collection-deleteme",
  "type": "Collection",
  "links": [],
  "title": "DELETE ME 19 item collection OMI_trno2",
  "extent": {
    "spatial": {
      "bbox": [[-180, -90, 180, 90]]
    },
    "temporal": {
      "interval": [[null, null]]
    }
  },
  "license": "MIT",
  "description": "OMI_trno2 - 0.10 x 0.10 Annual as Cloud-Optimized GeoTIFFs (COGs)",
  "item_assets": {
    "cog_default": {
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "roles": ["data", "layer"],
      "title": "Default COG Layer",
      "description": "Cloud optimized default layer to display on map"
    }
  },
  "stac_version": "1.0.0",
  "renders": {
    "dashboard": {
      "colormap_name": "reds",
      "rescale": [[0, 3000000000000000.0]],
      "assets": ["cog_default"],
      "title": "VEDA Dashboard Render Parameters"
    }
  },
  "providers": [
    {
      "name": "NASA VEDA",
      "url": "https://www.earthdata.nasa.gov/dashboard/",
      "roles": ["host"]
    }
  ],
  "item_assets": {
    "test_asset": {
      "title": "An item asset description for test",
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "roles": ["test"]
    },
    "cog_default": {
      "type": "image/tiff; application=geotiff; profile=cloud-optimized",
      "roles": ["data", "layer"],
      "title": "Default COG Layer",
      "description": "Cloud optimized default layer to display on map"
    }
  },
  "assets": {
    "thumbnail": {
      "title": "Thumbnail",
      "description": "Photo by [Mick Truyts](https://unsplash.com/photos/x6WQeNYJC1w) (Power plant shooting steam at the sky)",
      "href": "https://thumbnails.openveda.cloud/no2--dataset-cover.jpg",
      "type": "image/jpeg",
      "roles": ["thumbnail"]
    }
  }
}
```
  2. Submit a discovery/ request without providing id_template in the config. For the above collection:
{
  "bucket": "veda-data-store-staging",
  "collection": "omi-19-item-collection-deleteme",
  "datetime_range": "year",
  "discovery": "s3",
  "filename_regex": "^(.*).tif$",
  "prefix": "OMI_trno2-COG/"
}

Error log

AIRFLOW_CTX_DAG_OWNER=airflow
AIRFLOW_CTX_DAG_ID=veda_discover
AIRFLOW_CTX_TASK_ID=subdag_discover.discover_from_s3
AIRFLOW_CTX_EXECUTION_DATE=2024-07-19T15:53:42+00:00
AIRFLOW_CTX_TRY_NUMBER=1
AIRFLOW_CTX_DAG_RUN_ID=d222b047-d453-4980-acad-f40d473320c6
[2024-07-19, 15:53:50 UTC] {{logging_mixin.py:137}} INFO - Getting S3 response iterator for bucket: veda-data-store-staging, prefix: OMI_trno2-COG/
[2024-07-19, 15:53:50 UTC] {{taskinstance.py:1768}} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/operators/python.py", line 247, in execute
    condition = super().execute(context)
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/operators/python.py", line 175, in execute
    return_value = self.execute_callable()
  File "/usr/local/airflow/.local/lib/python3.10/site-packages/airflow/operators/python.py", line 192, in execute_callable
    return self.python_callable(*self.op_args, **self.op_kwargs)
  File "/usr/local/airflow/dags/veda_data_pipeline/groups/discover_group.py", line 36, in discover_from_s3_task
    return s3_discovery_handler(
  File "/usr/local/airflow/dags/veda_data_pipeline/utils/s3_discovery.py", line 251, in s3_discovery_handler
    item["item_id"] = id_template.format(item["item_id"])
AttributeError: 'NoneType' object has no attribute 'format'
[2024-07-19, 15:53:50 UTC] {{taskinstance.py:1318}} INFO - Marking task as FAILED. dag_id=veda_discover, task_id=subdag_discover.discover_from_s3, execution_date=20240719T155342, start_date=20240719T155349, end_date=20240719T155350
[2024-07-19, 15:53:50 UTC] {{standard_task_runner.py:100}} ERROR - Failed to execute job 2754 for task subdag_discover.discover_from_s3 ('NoneType' object has no attribute 'format'; 24409)
[2024-07-19, 15:53:50 UTC] {{local_task_job.py:208}} INFO - Task exited with return code 1
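
The traceback shows that id_template is None by the time .format() is called. One plausible cause (an assumption, not confirmed against the code in s3_discovery.py) is that the key reaches the handler with an explicit None value, so a default applied only when the key is missing never kicks in. Below is a minimal sketch of a defensive fallback. The helper names, the "{}" default, and the sample item id are hypothetical and only illustrate the idea.

```python
# Assumed default: "{}" passes the discovered item id through unchanged.
DEFAULT_ID_TEMPLATE = "{}"


def resolve_id_template(event: dict) -> str:
    """Return a usable template even when the key is missing or explicitly None."""
    # dict.get(key, default) only falls back when the key is absent, so an
    # explicit None (or empty string) is handled here with `or`.
    return event.get("id_template") or DEFAULT_ID_TEMPLATE


def apply_id_template(item_id: str, event: dict) -> str:
    """Format an item id the same way the failing line in the traceback does."""
    return resolve_id_template(event).format(item_id)


# Hypothetical event in which the unset field arrives as None.
event = {"collection": "omi-19-item-collection-deleteme", "id_template": None}

# A lookup that only guards against a missing key still yields None here.
assert event.get("id_template", DEFAULT_ID_TEMPLATE) is None

print(apply_id_template("OMI_trno2_2005", event))
print(apply_id_template("OMI_trno2_2005", {"id_template": "omi-{}"}))
```

Note that `or` also replaces an empty-string template with the default. If an empty template should be rejected instead, an explicit `is None` check would be the safer choice.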

AC

ciaransweet commented 3 months ago

@anayeaye Are you able to give me a quick TL;DR run-through of this (or rather, what I need to set up to replicate it) when you're awake? 🤞