Open anayeaye opened 1 month ago
Blocked by the ingest API role (possibly not assumed properly; it is not clear whether anything else differs between viewing LPDAAC assets and the accessibility check). Progress is checked in under #138: collection and item metadata corrections are complete, but the ingest API currently reports that it cannot access the LPDAAC assets.
Just in case someone wonders why we even have them - these collections are not featured in the VEDA Earthdata Dashboard, but in the EO Dashboard, e.g. https://eodashboard.org/story?id=nasa-thwaites.
EDIT: it is not the role, and the bucket is usgs-landsat, not LPDAAC. We were blocked because the bucket owner requires the requester pays parameter. I confirmed that head-object succeeds with requester pays configured, so I am working on a PR to have the ingest API use the requester pays configuration when it is provided in the environment.
aws s3api head-object --bucket usgs-landsat --key collection02/level-2/standard/oli-tirs/2023/001/113/LC08_L2SR_001113_20230125_20230208_02_T2/LC08_L2SR_001113_20230125_20230208_02_T2_SR_B4.TIF --request-payer requester
out>
{
...
"RequestCharged": "requester"
}
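The same opt-in could be sketched in Python roughly as follows. This is a minimal sketch, not the ingest API's actual code: the environment variable name `AWS_REQUEST_PAYER`, the helper names, and the assumption that asset hrefs use the `s3://` scheme are all illustrative. `RequestPayer="requester"` is the boto3 `head_object` counterpart of the CLI's `--request-payer requester`.

```python
import os
from typing import Mapping, Optional
from urllib.parse import urlparse


def head_object_kwargs(href: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """Build HeadObject keyword arguments from an s3:// href (assumed href form)."""
    env = os.environ if env is None else env
    parsed = urlparse(href)
    if parsed.scheme != "s3":
        raise ValueError(f"expected an s3:// href, got {href!r}")
    kwargs = {"Bucket": parsed.netloc, "Key": parsed.path.lstrip("/")}
    # Hypothetical variable name: only opt in to paying for requests when it is set.
    if env.get("AWS_REQUEST_PAYER", "").lower() == "requester":
        kwargs["RequestPayer"] = "requester"
    return kwargs


def head_asset(s3_client, href: str, env: Optional[Mapping[str, str]] = None) -> dict:
    """Call head_object on any client exposing the boto3 S3 client interface."""
    return s3_client.head_object(**head_object_kwargs(href, env))
```

With boto3 installed, `head_asset(boto3.client("s3"), href)` would issue the request; passing the client in keeps the helper easy to exercise with a stub.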
UPDATES
✔️ Modifications to ingest-api made it possible to test accessibility with the requester pays config
✔️ Additional invalid metadata surfaced after getting past the accessibility check; these were 'fixed' by removing the classification extension (many of the items declared the classification extension but did not conform to the spec)
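Removing the extension declaration could look something like the sketch below. It is an illustration only: the function name is hypothetical, it assumes the extension is declared through a schema URL containing `/classification/` (as in the published STAC classification extension schema URLs), and it leaves any `classification:*` fields in place, since the source does not say whether those were touched.

```python
import copy


def drop_classification_extension(item: dict) -> dict:
    """Return a copy of a STAC item without the classification extension declared."""
    cleaned = copy.deepcopy(item)  # never mutate the staging item in place
    cleaned["stac_extensions"] = [
        url
        for url in cleaned.get("stac_extensions", [])
        # Assumption: the extension is identified by its schema URL path segment.
        if "/classification/" not in url
    ]
    return cleaned
```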
The next validation blocker: Many of the items have hrefs to non-existent assets (different from the requester pays issue)
aws s3api head-object --bucket usgs-landsat --key collection02/level-2/standard/oli-tirs/2022/001/113/LC09_L2SR_001113_20221130_20221202_02_T2/LC09_L2SR_001113_20221130_20221202_02_T2_SR_B4.TIF --request-payer requester
An error occurred (404) when calling the HeadObject operation: Not Found
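One way to separate publishable items from those with dangling hrefs is sketched below. The names are hypothetical, and `asset_exists` stands in for whatever check is used in practice, for example a wrapper around head-object that returns False on a 404 like the one above.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def partition_items(
    items: Iterable[dict],
    asset_exists: Callable[[str], bool],
) -> Tuple[List[dict], Dict[str, List[str]]]:
    """Split STAC items into publishable ones and a map of item id -> missing hrefs."""
    publishable: List[dict] = []
    missing_by_item: Dict[str, List[str]] = {}
    for item in items:
        hrefs = [asset["href"] for asset in item.get("assets", {}).values()]
        missing = [href for href in hrefs if not asset_exists(href)]
        if missing:
            # Any missing asset makes the whole item invalid for publication.
            missing_by_item[item["id"]] = missing
        else:
            publishable.append(item)
    return publishable, missing_by_item
```

Keeping the missing hrefs per item gives a record of what was skipped, which could be useful if the source assets are restored later.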
Here are the currently publishable counts in test. I think we should move forward at this point and not attempt any further corrections, which means some invalid items in staging will NOT be published to production:
landsat-c2l2-sr-antarctic-glaciers-pine-island src_item_count=46 target_item_count=43 OK=False
landsat-c2l2-sr-antarctic-glaciers-thwaites src_item_count=53 target_item_count=49 OK=False
landsat-c2l2-sr-lakes-aral-sea src_item_count=1434 target_item_count=1402 OK=False
landsat-c2l2-sr-lakes-lake-balaton src_item_count=186 target_item_count=174 OK=False
landsat-c2l2-sr-lakes-lake-biwa src_item_count=72 target_item_count=70 OK=False
landsat-c2l2-sr-lakes-tonle-sap src_item_count=330 target_item_count=324 OK=False
landsat-c2l2-sr-lakes-vanern src_item_count=134 target_item_count=131 OK=False
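To see how many items would be left behind, the report lines above can be parsed and summed. This is a small sketch; the regular expression and function name are illustrative, and the report text is copied from the lines above.

```python
import re

REPORT = """\
landsat-c2l2-sr-antarctic-glaciers-pine-island src_item_count=46 target_item_count=43 OK=False
landsat-c2l2-sr-antarctic-glaciers-thwaites src_item_count=53 target_item_count=49 OK=False
landsat-c2l2-sr-lakes-aral-sea src_item_count=1434 target_item_count=1402 OK=False
landsat-c2l2-sr-lakes-lake-balaton src_item_count=186 target_item_count=174 OK=False
landsat-c2l2-sr-lakes-lake-biwa src_item_count=72 target_item_count=70 OK=False
landsat-c2l2-sr-lakes-tonle-sap src_item_count=330 target_item_count=324 OK=False
landsat-c2l2-sr-lakes-vanern src_item_count=134 target_item_count=131 OK=False
"""

LINE = re.compile(
    r"^(?P<collection>\S+) src_item_count=(?P<src>\d+) "
    r"target_item_count=(?P<target>\d+) OK=(?:True|False)$"
)


def parse_report(text: str) -> list:
    """Turn each report line into a dict with the per-collection shortfall."""
    rows = []
    for line in text.splitlines():
        match = LINE.match(line.strip())
        if match:
            src, target = int(match["src"]), int(match["target"])
            rows.append({
                "collection": match["collection"],
                "src": src,
                "target": target,
                "unpublished": src - target,
            })
    return rows


rows = parse_report(REPORT)
total_src = sum(row["src"] for row in rows)
total_unpublished = sum(row["unpublished"] for row in rows)
print(f"{total_unpublished} of {total_src} staging items would not be published")
# prints "62 of 2255 staging items would not be published"
```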
What
Seven demo Landsat spotlight collections were published to the staging catalog and need to be mirrored in the production account. Because these collections refer to externally hosted data (LPDAAC) and contain custom provider metadata, we need to create (or re-use) a one-off script to mirror this metadata in the production STAC catalog.
Suggested steps
For each collection
collections
'landsat-c2l2-sr-antarctic-glaciers-pine-island', 'landsat-c2l2-sr-antarctic-glaciers-thwaites', 'landsat-c2l2-sr-lakes-aral-sea', 'landsat-c2l2-sr-lakes-lake-balaton', 'landsat-c2l2-sr-lakes-lake-biwa', 'landsat-c2l2-sr-lakes-tonle-sap', 'landsat-c2l2-sr-lakes-vanern'
AC
ingestion-data/production/collections
with existing summaries preserved and with corrected/removed item_assets
transformation-scripts/
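A one-off script for the collection metadata might be shaped roughly like this. It is a sketch under assumptions: the function names, the `<collection id>.json` file naming, and the choice to remove `item_assets` outright rather than correct it are all illustrative, not the project's established conventions.

```python
import json
from pathlib import Path


def prepare_collection(collection: dict) -> dict:
    """Copy a staging collection, keeping summaries and dropping item_assets."""
    # Everything except item_assets is carried over unchanged, summaries included.
    return {key: value for key, value in collection.items() if key != "item_assets"}


def write_collection(collection: dict, out_dir: str) -> Path:
    """Write the collection as <id>.json under out_dir (assumed naming scheme)."""
    path = Path(out_dir) / f"{collection['id']}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(collection, indent=2) + "\n")
    return path
```

For example, `write_collection(prepare_collection(staging_collection), "ingestion-data/production/collections")` would place one file per collection in the directory named above, where `staging_collection` is the JSON fetched from the staging catalog.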