NASA-IMPACT / veda-data


Mirror demo landsat-c2l2 collections to mcp-test and mcp-prod #134

Open anayeaye opened 1 month ago

anayeaye commented 1 month ago

What

Seven demo Landsat spotlight collections were published to the staging catalog and need to be mirrored in the production account. Because these collections refer to externally hosted data (LPDAAC) and contain custom provider metadata, we need to create (or re-use) a one-off script to mirror this metadata in the production STAC catalog.

Note: this is NOT a transfer or S3 discovery task. All we want to do is mirror the metadata from the staging catalog in the production catalog.

Suggested steps

For each collection

  1. correct or remove the item_assets (there is no cog_default but there are many band-specific assets)
  2. Keep the summaries object (we can manually create it but there isn't an automated way to generate this for items we are not ingesting via airflow)
  3. publish collection and add to veda-data/production/collections
  4. use a stac client to select all items for this collection in the staging stac catalog and for each:
    • remove any self-referential links, i.e. self, parent, collection, root
    • publish the item using the ingestions/ endpoint
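
A minimal sketch of the link-cleaning part of step 4, assuming each item is available as a plain STAC item dict. The function name is hypothetical, and fetching items from staging and posting them to the ingestions/ endpoint are left out:

```python
import copy

# Link relations that point back into the source (staging) catalog.
SELF_REFERENTIAL_RELS = {"self", "parent", "collection", "root"}


def strip_self_referential_links(item: dict) -> dict:
    """Return a copy of a STAC item dict without self-referential links.

    Hypothetical helper for illustration; links with any other rel are kept.
    """
    cleaned = copy.deepcopy(item)
    cleaned["links"] = [
        link
        for link in cleaned.get("links", [])
        if link.get("rel") not in SELF_REFERENTIAL_RELS
    ]
    return cleaned


if __name__ == "__main__":
    example = {
        "id": "example-item",
        "links": [
            {"rel": "self", "href": "https://example.com/items/example-item"},
            {"rel": "root", "href": "https://example.com/"},
            {"rel": "via", "href": "https://example.com/source"},
        ],
    }
    print(strip_self_referential_links(example)["links"])
```

Working on a copy keeps the staging item untouched, so the same dict can still be compared against what ends up in the target catalog.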

collections

'landsat-c2l2-sr-antarctic-glaciers-pine-island', 'landsat-c2l2-sr-antarctic-glaciers-thwaites', 'landsat-c2l2-sr-lakes-aral-sea', 'landsat-c2l2-sr-lakes-lake-balaton', 'landsat-c2l2-sr-lakes-lake-biwa', 'landsat-c2l2-sr-lakes-tonle-sap', 'landsat-c2l2-sr-lakes-vanern'

AC

anayeaye commented 1 month ago

Blocked by the ingest API role (possibly not assumed properly; it is not clear whether anything else differs between viewing LPDAAC data and the accessibility check). Progress is checked in at #138: collection and item metadata corrections are complete, but the ingest API currently reports that it cannot access the LPDAAC assets.

j08lue commented 4 weeks ago

Just in case someone wonders why we even have them - these collections are not featured in the VEDA Earthdata Dashboard, but in the EO Dashboard, e.g. https://eodashboard.org/story?id=nasa-thwaites.

anayeaye commented 4 weeks ago

EDIT: it is not the role, and the bucket is usgs-landsat, not LPDAAC. We were blocked because the bucket owner requires the requester pays parameter. I confirmed that we can get the head object with requester pays configured, so I am working on a PR to get the ingest API to use the requester pays configuration if it is provided in the environment.

aws s3api head-object --bucket usgs-landsat --key collection02/level-2/standard/oli-tirs/2023/001/113/LC08_L2SR_001113_20230125_20230208_02_T2/LC08_L2SR_001113_20230125_20230208_02_T2_SR_B4.TIF --request-payer requester

out>
{
   ...
    "RequestCharged": "requester"
}
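
For illustration only, the same check could look roughly like this in Python with boto3. This is a sketch, not the ingest API change itself: the `REQUESTER_PAYS` environment variable and both function names are assumptions, and only `head_object` with its `RequestPayer` parameter is real boto3 API.

```python
import os


def head_object_kwargs(bucket: str, key: str, env=None) -> dict:
    """Build head_object arguments, adding RequestPayer only when configured.

    REQUESTER_PAYS is a hypothetical variable name used for this sketch.
    """
    env = os.environ if env is None else env
    kwargs = {"Bucket": bucket, "Key": key}
    if str(env.get("REQUESTER_PAYS", "")).lower() in {"1", "true", "yes"}:
        kwargs["RequestPayer"] = "requester"
    return kwargs


def asset_exists(bucket: str, key: str) -> bool:
    """Return True if the object can be HEADed, False on a 404."""
    # Imported here so the kwargs helper can be used without boto3 installed.
    import boto3
    from botocore.exceptions import ClientError

    s3 = boto3.client("s3")
    try:
        s3.head_object(**head_object_kwargs(bucket, key))
    except ClientError as err:
        if err.response["Error"]["Code"] == "404":
            return False
        raise  # e.g. 403 when requester pays is required but not set
    return True
```

Keeping the argument building separate from the S3 call means the requester pays switch can be checked without AWS credentials.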
anayeaye commented 4 weeks ago

UPDATES

✔️ Modifications to ingest-api made it possible to test accessibility with requester pays config
✔️ Additional invalid metadata were surfaced after getting past the accessibility check; these were 'fixed' by removing the classification extension (many of the items declared the classification extension but did not conform to the spec)
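
A rough sketch of what removing the classification extension from an item dict could look like. The function name is hypothetical, and dropping the `classification:`-prefixed fields along with the schema URL is an assumption about the cleanup, not a description of what was actually run:

```python
import copy


def _drop_classification_fields(obj: dict) -> None:
    """Delete classification:* keys from a dict in place."""
    for key in [k for k in obj if k.startswith("classification:")]:
        del obj[key]


def remove_classification_extension(item: dict) -> dict:
    """Return a copy of a STAC item with the classification extension removed.

    Hypothetical helper: drops the extension schema URL from stac_extensions
    and any classification:* fields in properties, assets and raster:bands.
    """
    cleaned = copy.deepcopy(item)
    cleaned["stac_extensions"] = [
        url
        for url in cleaned.get("stac_extensions", [])
        if "/classification/" not in url
    ]
    _drop_classification_fields(cleaned.get("properties", {}))
    for asset in cleaned.get("assets", {}).values():
        _drop_classification_fields(asset)
        for band in asset.get("raster:bands", []):
            _drop_classification_fields(band)
    return cleaned
```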

The next validation blocker: Many of the items have hrefs to non-existent assets (different from the requester pays issue)

aws s3api head-object --bucket usgs-landsat --key collection02/level-2/standard/oli-tirs/2022/001/113/LC09_L2SR_001113_20221130_20221202_02_T2/LC09_L2SR_001113_20221130_20221202_02_T2_SR_B4.TIF --request-payer requester 

An error occurred (404) when calling the HeadObject operation: Not Found

Here are the currently publishable counts in test. I think we should move forward at this point and not attempt any further corrections (which means some invalid items in staging will NOT be published to production):

landsat-c2l2-sr-antarctic-glaciers-pine-island src_item_count=46 target_item_count=43 OK=False
landsat-c2l2-sr-antarctic-glaciers-thwaites src_item_count=53 target_item_count=49 OK=False
landsat-c2l2-sr-lakes-aral-sea src_item_count=1434 target_item_count=1402 OK=False
landsat-c2l2-sr-lakes-lake-balaton src_item_count=186 target_item_count=174 OK=False
landsat-c2l2-sr-lakes-lake-biwa src_item_count=72 target_item_count=70 OK=False
landsat-c2l2-sr-lakes-tonle-sap src_item_count=330 target_item_count=324 OK=False
landsat-c2l2-sr-lakes-vanern src_item_count=134 target_item_count=131 OK=False
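
To make the gap explicit, here is a small sketch that parses the report lines above and computes how many staging items are left out per collection. The parsing code is illustrative only and is not the script that produced the report:

```python
import re

REPORT = """\
landsat-c2l2-sr-antarctic-glaciers-pine-island src_item_count=46 target_item_count=43 OK=False
landsat-c2l2-sr-antarctic-glaciers-thwaites src_item_count=53 target_item_count=49 OK=False
landsat-c2l2-sr-lakes-aral-sea src_item_count=1434 target_item_count=1402 OK=False
landsat-c2l2-sr-lakes-lake-balaton src_item_count=186 target_item_count=174 OK=False
landsat-c2l2-sr-lakes-lake-biwa src_item_count=72 target_item_count=70 OK=False
landsat-c2l2-sr-lakes-tonle-sap src_item_count=330 target_item_count=324 OK=False
landsat-c2l2-sr-lakes-vanern src_item_count=134 target_item_count=131 OK=False
"""

LINE = re.compile(r"^(\S+) src_item_count=(\d+) target_item_count=(\d+)")


def missing_counts(report: str) -> dict:
    """Map collection id -> number of source items absent from the target."""
    missing = {}
    for line in report.splitlines():
        match = LINE.match(line)
        if match:
            collection, src, target = match.groups()
            missing[collection] = int(src) - int(target)
    return missing


if __name__ == "__main__":
    counts = missing_counts(REPORT)
    for collection, n in counts.items():
        print(f"{collection}: {n} items not published")
    print(f"total: {sum(counts.values())}")  # 62 of 2255 staging items
```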
anayeaye commented 3 weeks ago

https://github.com/NASA-IMPACT/veda-data/pull/138