NASA-IMPACT / csdap-cumulus

SmallSat Cumulus Deployment

NGAP Manifest Analysis: Utility to generate lists: 'SAFE_TO_DELETE' and 'STILL_NEED_TO_INGEST' #319

Closed krisstanton closed 5 months ago

krisstanton commented 9 months ago

NGAP Manifest Lists Generation

This utility runs AFTER the ORCA Validation Utility and is meant to be used together with it (this is part 2 of 3 of the Cumulus Pipeline Manifest utilities).

The purpose of this is to generate two lists, SAFE_TO_DELETE and STILL_NEED_TO_INGEST, for NGAP (old PROD) as we complete the migration ingests.

This is done by examining the NGAP Ingested Bucket Manifest and comparing it with the outputs from the ORCA Validation Utility (https://github.com/NASA-IMPACT/csdap-cumulus/issues/270).

Note: The file names should be the same but the paths to them might be different (I believe the discrepancy is between MCP Maxar Delivery Bucket Paths and the NGAP/CBA Bucket Paths). We may need some additional logic in the utility that compares only parts of the file path (instead of the entire file path) from different manifests to generate these lists.
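
A minimal sketch of that partial-path comparison, in Python. It assumes each manifest has already been loaded into a plain list of object keys, that a file counts as SAFE_TO_DELETE when its file name also appears in the ORCA Validation Utility output, and that everything else goes to STILL_NEED_TO_INGEST. The function names and the example keys are hypothetical, not part of any existing utility.

```python
from pathlib import PurePosixPath


def file_name(key):
    """Return only the final path component of an S3-style object key."""
    return PurePosixPath(key).name


def generate_lists(ngap_keys, validated_keys):
    """Split NGAP manifest keys into (safe_to_delete, still_need_to_ingest).

    Assumption: matching is done on the file name only, because the bucket
    paths may differ between the manifests being compared.
    """
    validated_names = {file_name(k) for k in validated_keys}
    safe_to_delete = []
    still_need_to_ingest = []
    for key in ngap_keys:
        if file_name(key) in validated_names:
            safe_to_delete.append(key)
        else:
            still_need_to_ingest.append(key)
    return safe_to_delete, still_need_to_ingest


# Hypothetical keys, for illustration only.
ngap = ["ngap-bucket/a/b/file1.tif", "ngap-bucket/c/file2.xml"]
validated = ["other-bucket/x/file1.tif"]
safe, todo = generate_lists(ngap, validated)
```

Matching on the file name alone is the simplest choice; if file names turn out not to be unique across the manifests, the comparison key could be extended to the last few path components instead.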

Files in the MCP: STILL_NEED_TO_INGEST list will require a run through Airflow (restore and convert XML to CMR) and then Cumulus.

Files in the NGAP: STILL_NEED_TO_INGEST list would only require a run through Airflow from the MCP bucket IF the CMR record is missing (there were a small number of these cases). The default case for these is to just run the Cumulus ingest rule, which covers these items.
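
The routing described above could be sketched roughly as follows. This is only an illustration: the step labels and the function name are hypothetical, and it assumes that an NGAP file with a missing CMR record goes through Cumulus after the Airflow run, the same way MCP files do.

```python
from typing import List

# Hypothetical labels for the two processing stages named in this ticket.
AIRFLOW = "airflow_restore_and_convert"
CUMULUS = "cumulus_ingest"


def required_steps(source: str, cmr_record_exists: bool = True) -> List[str]:
    """Return the ordered steps for a file on a STILL_NEED_TO_INGEST list.

    source is "MCP" or "NGAP", i.e. which list the file came from.
    """
    if source == "MCP":
        # MCP files always need Airflow first, then Cumulus.
        return [AIRFLOW, CUMULUS]
    if source == "NGAP":
        if cmr_record_exists:
            # Default case: the Cumulus ingest rule covers the file.
            return [CUMULUS]
        # Rare case: CMR record missing, so rerun Airflow from the MCP bucket.
        # Assumption: Cumulus ingest still follows afterwards.
        return [AIRFLOW, CUMULUS]
    raise ValueError("unknown source: " + repr(source))
```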

krisstanton commented 5 months ago

This ticket ended up falling under the scope of another ticket, made for deleting OLD NGAP files: https://github.com/NASA-IMPACT/csdap-cumulus/issues/347

Closing this now.