NASA-IMPACT / csdap-cumulus

SmallSat Cumulus Deployment

WV02_MSI_L1B from MCP to Prod Ingestion FAILED granules -- Query, Checksum and DB insertion #363

Closed jsrikish closed 2 months ago

jsrikish commented 3 months ago

Year 2010 WV02_M1BS: Athena query to find granules that are in S3 but missing from the DB:

SELECT
    split_part(
        split_part(m.key, '.', 1), -- drop the extension
        '-BROWSE',
        1
    ) granule,
    COUNT(*) num_files_per_granule
FROM maxar_transfer m
LEFT OUTER JOIN nccs_wv02_assets n
    ON m.key = n.s3_file_path
WHERE dt LIKE '2024-06-20%'
    AND m.key LIKE 'css/nga/WV02/1B/2010/%M1BS%'
    AND n.s3_file_path IS NULL
GROUP BY 1
HAVING COUNT(*) >= 4
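For anyone checking the query output by hand, here is a rough Python mirror of the nested `split_part` grouping. It is only a sketch: the function names and the sample keys in the usage note are hypothetical, not taken from the real bucket listing.

```python
from collections import Counter


def granule_from_key(key: str) -> str:
    """Mirror of split_part(split_part(key, '.', 1), '-BROWSE', 1)."""
    stem = key.split(".", 1)[0]           # keep everything before the first '.'
    return stem.split("-BROWSE", 1)[0]    # drop a trailing '-BROWSE' suffix, if any


def complete_granules(keys, min_files=4):
    """Group keys by granule and keep granules with at least min_files objects,
    like GROUP BY 1 HAVING COUNT(*) >= 4 in the query."""
    counts = Counter(granule_from_key(k) for k in keys)
    return {granule: n for granule, n in counts.items() if n >= min_files}
```

With four hypothetical keys such as `.../GRANULE_A.ntf`, `.../GRANULE_A.xml`, `.../GRANULE_A.tar` and `.../GRANULE_A-BROWSE.jpg`, all four collapse to `.../GRANULE_A` and the granule passes the `>= 4` filter.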

2010: List copied to NCCS csda201 to calculate checksum
45672 -- granules missing in DB (45672*4 = 182688 objects)
170680 -- granules in DB

2011: List copied to NCCS csdax201 to calculate checksum
109121 -- granules missing in DB (109121*4 = 436484 objects)
325778 -- granules in DB
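A quick sanity check of the object counts above, assuming exactly four objects per granule (the `*4` used in the comment; the query itself only guarantees at least four, so these are lower bounds):

```python
# Assumption: every missing granule has exactly 4 objects, as in the "*4" above.
FILES_PER_GRANULE = 4

# Granule counts copied from the comment above.
missing_granules = {2010: 45672, 2011: 109121}

objects_to_checksum = {
    year: count * FILES_PER_GRANULE for year, count in missing_granules.items()
}

for year, count in objects_to_checksum.items():
    print(year, count)
# prints:
# 2010 182688
# 2011 436484
```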

jsrikish commented 3 months ago

CSDA201:

2014 WV02 M1BS:
No. of granules not in DB: 144922
No. of granules deleted from disk: 71715
No. of objects to calculate CS: 292825

2016 WV02 M1BS:
No. of granules not in DB: 103904
No. of granules deleted from disk: 59800
No. of objects to calculate CS: 176413

2018 WV02 M1BS:
No. of granules not in DB: 85407
No. of granules deleted from disk: 36743
No. of objects to calculate CS: 194653

2020 WV02 M1BS:
No. of granules not in DB: 93070
No. of granules deleted from disk: 24328
No. of objects to calculate CS: 274965

2022 WV02 M1BS:
No. of granules not in DB: 416747
No. of granules deleted from disk: 97282
No. of objects to calculate CS: 1277857

jsrikish commented 2 months ago

Checksum calculation is complete for 2010-2022.
Database insertion is in progress.

jsrikish commented 2 months ago

Database insertion is complete.

Checksums have been calculated for years 2009-2022 for objects on NCCS.

The lists of objects that are in S3 but deleted from NCCS have been copied to s3://jaymcps5cmd/athena_query_results/jay_WV02MIBS_NotOnDisk/

2009_missing_ondisk_butins3.csv
2010_missing_ondisk_butins3.csv
2011_missing_ondisk_butins3.csv
2012_missing_ondisk_butins3.csv
2013_missing_ondisk_butins3.csv
2014_missing_ondisk_butins3.csv
2015_missing_ondisk_butins3.csv
2016_missing_ondisk_butins3.csv
2017_missing_ondisk_butins3.csv
2018_missing_ondisk_butins3.csv
2019_missing_ondisk_butins3.csv
2020_missing_ondisk_butins3.csv
2021_missing_ondisk_butins3.csv
2022_missing_ondisk_butins3.csv
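Since the per-year files follow one naming pattern, a DAG or script could build the input URIs instead of hard-coding them. A minimal sketch, with a hypothetical helper name; the prefix and file name pattern are copied from the comment above:

```python
PREFIX = "s3://jaymcps5cmd/athena_query_results/jay_WV02MIBS_NotOnDisk/"


def missing_list_uris(first_year: int = 2009, last_year: int = 2022) -> list[str]:
    """Return one S3 URI per year, following the <year>_missing_ondisk_butins3.csv pattern."""
    return [
        f"{PREFIX}{year}_missing_ondisk_butins3.csv"
        for year in range(first_year, last_year + 1)
    ]
```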

For each year, the DAG has to read the csv, compute the checksum, and write (key,s3_file_size,local_file_size,checksum) to a folder in S3. The output csv is then copied to NCCS so the rows can be inserted into DynamoDB.
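A minimal sketch of the per-object work such a DAG task might do, written as plain Python without Airflow or boto3 so it can run anywhere. Everything here is an assumption for illustration: the function names, the idea that restored objects sit under a local root mirroring the S3 key, and MD5 as the checksum algorithm (the thread does not say which one is used).

```python
import csv
import hashlib
from pathlib import Path

CHUNK_SIZE = 1024 * 1024  # read in 1 MiB chunks to keep memory use flat


def file_checksum(path, algorithm="md5"):
    """Stream a file through hashlib; 'md5' is an assumed default."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksum_report(rows, local_root, out_csv, algorithm="md5"):
    """rows: iterable of (key, s3_file_size) pairs read from the yearly csv.

    Writes one (key, s3_file_size, local_file_size, checksum) row per object.
    """
    with open(out_csv, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["key", "s3_file_size", "local_file_size", "checksum"])
        for key, s3_size in rows:
            local_path = Path(local_root) / key  # hypothetical local layout
            writer.writerow(
                [key, s3_size, local_path.stat().st_size,
                 file_checksum(local_path, algorithm)]
            )
```

Writing both sizes side by side makes a truncated restore easy to spot: any row where `s3_file_size` and `local_file_size` differ should not be inserted.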

jsrikish commented 1 month ago

2022_missing_ondisk_butins3.csv -- Total no. of objects missing on disk: 389129

The MWAA DAG to restore objects was migrated to SM2A with Abdelhak's help; Brad explained the input parameters for the DAG.
After successful restoration, the compute checksum DAG was started with 30 celery workers and 900 threads for each.
Total time taken to compute checksum for 389129 objects: 11 hr 14 min
The DAG output, (key,s3_file_size,local_file_size).csv, was copied to NCCS.
The csv file was successfully inserted into the DB.
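For planning the remaining years, the 2022 run gives a rough throughput figure. A small sketch of the arithmetic, using only the numbers reported above; the per-worker figure assumes work was split evenly across the 30 workers, which the thread does not confirm.

```python
def throughput(objects: int, hours: int, minutes: int) -> float:
    """Average objects processed per second over the whole run."""
    elapsed_seconds = hours * 3600 + minutes * 60
    return objects / elapsed_seconds


rate = throughput(389129, 11, 14)  # 11 hr 14 min = 40440 s
print(f"{rate:.2f} objects/s overall")
# Assumption: an even split across the 30 celery workers.
print(f"{rate / 30:.3f} objects/s per worker")
```

That works out to roughly 9.6 objects per second for the run as a whole.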

The next step is to start the UMM-G creation DAG. Some granules were previously added to Cumulus and already have a cmr.json; UMM-G has to be created for the granules newly added to the DB.
Total no. of granules added (CS computed for granules on disk + CS computed for granules missing on disk): 388193 + 97282 = 485475