Closed jsrikish closed 2 months ago
Year 2010 WV02_M1BS: Athena query to find granules that are in S3 but missing from the DB:

    SELECT
        split_part(
            split_part(m.key, '.', 1),  -- drop the extension
            '-BROWSE', 1
        ) granule,
        COUNT(*) num_files_per_granule
    FROM maxar_transfer m
    LEFT OUTER JOIN nccs_wv02_assets n
        ON m.key = n.s3_file_path
    WHERE dt LIKE '2024-06-20%'
        AND m.key LIKE 'css/nga/WV02/1B/2010/%M1BS%'
        AND n.s3_file_path IS NULL
    GROUP BY 1
    HAVING COUNT(*) >= 4
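For spot-checking the query result on a small sample, the same logic can be sketched in plain Python. This is only an illustration of what the query computes, not the code that was run; the function names and the idea of passing keys as in-memory lists are assumptions made for the example.

```python
from collections import Counter


def split_part(text: str, delimiter: str, index: int) -> str:
    """Return the 1-based field of text split on delimiter (in-range indexes only)."""
    return text.split(delimiter)[index - 1]


def granule_of(key: str) -> str:
    # Keep everything before the first '.', then everything before '-BROWSE'.
    return split_part(split_part(key, ".", 1), "-BROWSE", 1)


def granules_missing_in_db(s3_keys, db_paths, min_files=4):
    """Count S3 keys absent from the DB per granule; keep granules with >= min_files."""
    db_paths = set(db_paths)
    counts = Counter(granule_of(k) for k in s3_keys if k not in db_paths)
    return {granule: n for granule, n in counts.items() if n >= min_files}
```

As in the query, a granule is reported only when at least four of its files are absent from the DB, so partially ingested granules with fewer missing files are left out.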
2010: List copied to NCCS csda201 to calculate checksum
45672 granules missing in DB (45672 * 4 = 182688 objects)
170680 granules in DB
2011: List copied to NCCS csdax201 to calculate checksum
109121 granules missing in DB (109121 * 4 = 436484 objects)
325778 granules in DB
CSDA201:
2014 WV02 M1BS:
No. of granules not in DB: 144922
No. of granules deleted from disk: 71715
No. of objects to calculate CS: 292825
2016 WV02 M1BS:
No. of granules not in DB: 103904
No. of granules deleted from disk: 59800
No. of objects to calculate CS: 176413
2018 WV02 M1BS:
No. of granules not in DB: 85407
No. of granules deleted from disk: 36743
No. of objects to calculate CS: 194653
2020 WV02 M1BS:
No. of granules not in DB: 93070
No. of granules deleted from disk: 24328
No. of objects to calculate CS: 274965
2022 WV02 M1BS:
No. of granules not in DB: 416747
No. of granules deleted from disk: 97282
No. of objects to calculate CS: 1277857
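To get combined figures for the five CSDA201 years listed above, the per-year numbers can be summed column by column. The sketch below only adds up the values reported in this thread; the dictionary layout and variable names are chosen for illustration.

```python
# Per-year CSDA201 figures copied from the comment above:
# (granules not in DB, granules deleted from disk, objects to checksum)
csda201 = {
    2014: (144922, 71715, 292825),
    2016: (103904, 59800, 176413),
    2018: (85407, 36743, 194653),
    2020: (93070, 24328, 274965),
    2022: (416747, 97282, 1277857),
}

# Sum each column across the listed years.
not_in_db, deleted_from_disk, objects_to_checksum = (
    sum(column) for column in zip(*csda201.values())
)

print(f"granules not in DB:         {not_in_db}")
print(f"granules deleted from disk: {deleted_from_disk}")
print(f"objects to checksum:        {objects_to_checksum}")
```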
Checksum calculation completed 2010-2022.
Database insertion is in progress.
Database insertion is completed.
Checksums have been calculated for years 2009-2022 for objects on NCCS
List of objects in S3 but deleted from NCCS is copied over to s3://jaymcps5cmd/athena_query_results/jay_WV02MIBS_NotOnDisk/
2009_missing_ondisk_butins3.csv
2010_missing_ondisk_butins3.csv
2011_missing_ondisk_butins3.csv
2012_missing_ondisk_butins3.csv
2013_missing_ondisk_butins3.csv
2014_missing_ondisk_butins3.csv
2015_missing_ondisk_butins3.csv
2016_missing_ondisk_butins3.csv
2017_missing_ondisk_butins3.csv
2018_missing_ondisk_butins3.csv
2019_missing_ondisk_butins3.csv
2020_missing_ondisk_butins3.csv
2021_missing_ondisk_butins3.csv
2022_missing_ondisk_butins3.csv
For each year, the DAG has to read the CSV, compute the checksum, and write the result to a folder in S3 with (key,s3_file_size,local_file_size,checksum).
Copy the resulting CSV to NCCS to insert the records into DynamoDB.
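The per-file work of that DAG step can be sketched as follows. This is a minimal illustration, not the DAG's actual code: it assumes the input CSV has two header-less columns (key, s3_file_size), that each restored object sits at `local_root / key`, and that the checksum is MD5. None of these details are confirmed in this thread, and the function names are hypothetical.

```python
import csv
import hashlib
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files are never fully loaded into memory."""
    digest = hashlib.md5()  # assumption: the checksum algorithm is MD5
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_checksums(input_csv: Path, local_root: Path, output_csv: Path) -> int:
    """Read (key, s3_file_size) rows; write (key, s3_file_size, local_file_size, checksum)."""
    written = 0
    with input_csv.open(newline="") as src, output_csv.open("w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["key", "s3_file_size", "local_file_size", "checksum"])
        for key, s3_size in csv.reader(src):
            local = local_root / key  # assumption: local path mirrors the S3 key
            writer.writerow([key, s3_size, local.stat().st_size, md5_of(local)])
            written += 1
    return written
```

Writing both sizes side by side makes it easy to flag truncated restores later by comparing `s3_file_size` with `local_file_size` before the rows are inserted.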
2022_missing_ondisk_butins3.csv: total number of objects missing on disk is 389129
The MWAA DAG to restore objects was migrated to sm2a with Abdelhak's help; Brad explained the input parameters for the DAG.
After successful restoration, the compute checksum DAG was started with 30 Celery workers and 900 threads for each.
Total time taken to compute checksum for 389129 objects: 11 hr 14 min
Output of the DAG, (key,s3_file_size,local_file_size).csv, was copied to NCCS.
The CSV file was successfully inserted into the DB.
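The reported run time translates into an average throughput that may help when estimating the remaining years. The sketch below uses only the figures given above (389129 objects, 11 hr 14 min, 30 workers); the variable names are illustrative, and the per-worker figure assumes the load was spread evenly, which is not confirmed here.

```python
from datetime import timedelta

objects = 389129
elapsed = timedelta(hours=11, minutes=14)
workers = 30  # Celery workers reported for the run

seconds = elapsed.total_seconds()
per_second = objects / seconds  # overall average rate
per_worker_per_minute = objects / workers / (seconds / 60)  # assumes even load

print(f"{per_second:.2f} objects/s overall")
print(f"{per_worker_per_minute:.2f} objects/min per worker")
```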
The next step is to start the UMM-G creation DAG. Some granules were previously added to Cumulus and already have a cmr.json; UMM-G has to be created for the new granules added to the DB.
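Selecting the input for that DAG amounts to a set difference between granules in the DB and granules that already have a cmr.json. A minimal sketch, assuming both lists are available as collections of granule IDs (the function name and inputs are hypothetical):

```python
def granules_needing_umm_g(granules_in_db, granules_with_cmr_json):
    """Return granules in the DB that do not yet have a cmr.json, sorted for stable output."""
    return sorted(set(granules_in_db) - set(granules_with_cmr_json))
```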
Total no. of granules added (CS computed for granules on disk + CS computed for granules missing on disk):
388193 + 97282 = 485475