krisstanton closed this issue 8 months ago
Before the migration ingest, Earthdata Search reports 3,086,582 granules for the inclusive date range 2009-01-01T00:00:00Z through 2023-01-01T00:00:00Z.
Ingest is running (started at around 11:00 am Central on Jan 18, 2024).
State Machine: cumulus-prod-DiscoverAndQueueGranules
Execution: 8eb32630-c25a-4d55-8aab-bb1f24846205
So far, 339 succeeded out of 5113 total, with 0 errors.
Currently waiting for some of the successful granules to be ingested so that they can be verified.
Things are chugging along very quickly and smoothly in the discovery/queue phase, which looks like it should finish in a total of under 6 hours!
The ingest/publish phase is also running very smoothly, but appears to be overly throttled, with a cap of 18K granules/hour, which is about half of what we achieved with WV03_MSI_L1B. In my effort to avoid hitting the AWS quota on concurrent Lambda executions, I made an adjustment that is causing this reduced rate because I missed changing a corresponding config at the same time.
Unfortunately, this means that ingesting/publishing ~3M granules will take ~7 days, when we'd like to see it take only ~3 days (i.e., we'd like to achieve a rate of >1M/day or >41.7K/hr, ideally more like ~60K/hr or ~1K/min).
I'm going to open a new issue to address this, which we can try with our next ingestion (in about a week, ugh!)
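For reference, the rate arithmetic above can be checked with a quick sketch. The function names are made up for illustration; the only inputs are the figures already quoted in this thread (3,086,582 granules, the 18K/hr cap, and the 1M/day and 60K/hr targets).

```python
# Back-of-the-envelope ingest duration at a given hourly rate.
# All inputs are figures quoted in this thread; the helper names are illustrative.

TOTAL_GRANULES = 3_086_582  # Earthdata Search count before migration


def days_to_ingest(total: int, per_hour: float) -> float:
    """Days needed to ingest `total` granules at `per_hour` granules/hour."""
    return total / per_hour / 24


def required_hourly_rate(total: int, days: float) -> float:
    """Hourly rate needed to ingest `total` granules in `days` days."""
    return total / (days * 24)


if __name__ == "__main__":
    rates = [
        ("current cap", 18_000),
        ("1M/day", required_hourly_rate(1_000_000, 1)),
        ("ideal", 60_000),
    ]
    for label, rate in rates:
        days = days_to_ingest(TOTAL_GRANULES, rate)
        print(f"{label:>12}: {rate:>9,.0f}/hr -> {days:.1f} days")
```

At the 18K/hr cap this works out to a bit over 7 days, and at 1M/day to a bit over 3 days, which matches the rough numbers above.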
Ingestion completed, with the following results:
It is fantastic to see no granules left in either queued or running status.
Further, the failures are broken down as follows:
2905 Error
114 MissingCmrFile
2 NoSuchKey
6 TypeError
The generic "Error" failures all seem to be CMR errors:
{
"errorType": "Error",
"errorMessage": "Failed to ingest, statusCode: 401, statusMessage: Unauthorized, CMR error message: [\"You do not have permission to perform that action.\"]",
"trace": [
"Error: Failed to ingest, statusCode: 401, statusMessage: Unauthorized, CMR error message: [\"You do not have permission to perform that action.\"]",
" at CMR.ingestUMMGranule (/var/task/webpack:/src/CMR.ts:259:13)",
" at runMicrotasks (<anonymous>)",
" at processTicksAndRejections (node:internal/process/task_queues:96:5)",
" at publishUMMGJSON2CMR (/var/task/webpack:/src/cmr-utils.js:184:15)",
" at publish2CMR (/var/task/webpack:/src/cmr-utils.js:230:12)",
" at async Promise.all (index 0)",
" at postToCMR (/var/task/webpack:/index.js:131:19)",
" at Object.runCumulusTask (/var/task/webpack:/node_modules/@cumulus/cumulus-message-adapter-js/dist/cma.js:221:1)",
" at Runtime.handler (/var/task/webpack:/index.js:159:10)"
]
}
I ran the following command in an attempt to rectify as many failures as possible:
cumulus dead-letter-archive recover-cumulus-messages
The command output the following:
{
"id": "461e38a5-4a87-4bff-ad50-8305f4233397",
"description": "Dead-Letter Processor ECS Run",
"operationType": "Dead-Letter Processing",
"status": "RUNNING",
"taskArn": "arn:aws:ecs:us-west-2:410469285047:task/cumulus-prod-CumulusECSCluster/dda8064e1d144c64b32de883d2b7a2da",
"createdAt": 1706230922687,
"updatedAt": 1706230922687
}
The async operation succeeded, but it did not affect the failure count.
Therefore, I used the same approach to reingest the failures as described in https://github.com/NASA-IMPACT/csdap-cumulus/issues/321#issuecomment-1898760714
I will report final results once all reingestions complete.
The granule status counts are now as follows:
Therefore, of the original 3027 failures, 821 were completed, 2145 became stuck in queued, and 61 remain failed.
Summary of failed statuses:
$ 2>/dev/null cumulus granules list --all -? collectionId=WV02_MSI_L1B___1 -? status=failed > WV02_MSI_L1B-failed.json
$ <WV02_MSI_L1B-failed.json jq -r '.[].error.errors | fromjson | .[0].error' | sort | uniq -c
61 MissingCmrFile
Summary of queued statuses:
$ 2>/dev/null cumulus granules list --all -? collectionId=WV02_MSI_L1B___1 -? status=queued > WV02_MSI_L1B-queued.json
$ <WV02_MSI_L1B-queued.json jq -r '.[].error.errors | fromjson | .[0].error' | sort | uniq -c
2091 Error
53 MissingCmrFile
1 NoSuchKey
This gives us a total of 114 granules with missing CMR files. The 2091 generic "Error" errors all seem to be CMR permission errors, for some odd reason:
Error: Failed to ingest, statusCode: 401, statusMessage: Unauthorized, CMR error message: ["You do not have permission to perform that action."]
at CMR.ingestUMMGranule (/var/task/webpack:/src/CMR.ts:259:13)
at runMicrotasks (<anonymous>)
at processTicksAndRejections (node:internal/process/task_queues:96:5)
at publishUMMGJSON2CMR (/var/task/webpack:/src/cmr-utils.js:184:15)
at publish2CMR (/var/task/webpack:/src/cmr-utils.js:230:12)
at async Promise.all (index 0)
at postToCMR (/var/task/webpack:/index.js:131:19)
at Object.runCumulusTask (/var/task/webpack:/node_modules/@cumulus/cumulus-message-adapter-js/dist/cma.js:221:1)
at Runtime.handler (/var/task/webpack:/index.js:159:10)
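As a cross-check on the jq summaries above, here is a minimal Python sketch of the same tally. It assumes the same record shape the jq filter relies on: each granule has an `error.errors` field holding a JSON-encoded array, and the first element's `error` field is the error type. The sample records and function names are hypothetical; the counts in the reconciliation are the ones reported in this thread.

```python
import json
from collections import Counter


def tally_error_types(granules):
    """Count granules by the type of their first recorded error.

    Mirrors: jq -r '.[].error.errors | fromjson | .[0].error' | sort | uniq -c
    """
    counts = Counter()
    for granule in granules:
        errors = json.loads(granule["error"]["errors"])  # JSON string -> list
        counts[errors[0]["error"]] += 1
    return counts


def tally_file(path):
    """Tally a file written by `cumulus granules list ... > file.json`."""
    with open(path) as f:
        return tally_error_types(json.load(f))


# Hypothetical records, shaped only as far as the jq filter implies.
SAMPLE = [
    {"error": {"errors": json.dumps([{"error": "MissingCmrFile"}])}},
    {"error": {"errors": json.dumps([{"error": "MissingCmrFile"}])}},
    {"error": {"errors": json.dumps([{"error": "NoSuchKey"}])}},
]
print(tally_error_types(SAMPLE))

# Reconcile the counts reported in this thread.
original = {"Error": 2905, "MissingCmrFile": 114, "NoSuchKey": 2, "TypeError": 6}
failed = {"MissingCmrFile": 61}
queued = {"Error": 2091, "MissingCmrFile": 53, "NoSuchKey": 1}
completed = 821

assert sum(original.values()) == 3027
assert completed + sum(queued.values()) + sum(failed.values()) == 3027
assert failed["MissingCmrFile"] + queued["MissingCmrFile"] == 114
```

The reconciliation asserts pass with the reported numbers, so the failed, queued, and completed counts account for all 3027 original failures.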
Migrate granules in collection WV02_MSI_L1B to CBA Prod by discovering/ingesting from existing prod account.
Check out main: git checkout main && git pull
git checkout -b issue323/migrate-wv02-msi
Rule file app/stacks/cumulus/resources/rules/WV02_MSI_L1B/v1/WV02_MSI_L1B___1.json, with the following values:
"WV02_MSI_L1B___1"
"cumulus"
"'WV02_MSI_L1B___1/'yyyy/DDD"
"2009-01-01T00:00:00Z"
"2023-01-01T00:00:00Z"
DOTENV=.env.cba.prod make bash
cumulus collections add --data app/stacks/cumulus/resources/collections/WV02_MSI_L1B___1.json
cumulus rules add --data app/stacks/cumulus/resources/rules/WV02_MSI_L1B/v1/WV02_MSI_L1B___1.json
cumulus rules enable --name WV02_MSI_L1B___1
cumulus rules run --name WV02_MSI_L1B___1
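To illustrate the "'WV02_MSI_L1B___1/'yyyy/DDD" value used in the rule, here is a rough sketch of how such a pattern could expand into one path per day. This assumes the pattern means a literal collection prefix followed by the 4-digit year and zero-padded day of year, and that the end date is treated as exclusive; neither is confirmed here, and the function name is made up.

```python
from datetime import date, timedelta

COLLECTION_ID = "WV02_MSI_L1B___1"
START = date(2009, 1, 1)
END = date(2023, 1, 1)  # assumption: treated as exclusive in this sketch


def daily_prefixes(start: date, end: date):
    """Yield one '<collection>/<year>/<day-of-year>' prefix per day in [start, end)."""
    day = start
    while day < end:
        # %Y = 4-digit year, %j = zero-padded day of year (001-366)
        yield f"{COLLECTION_ID}/{day:%Y}/{day:%j}"
        day += timedelta(days=1)


prefixes = list(daily_prefixes(START, END))
print(len(prefixes))  # 5113
print(prefixes[0])    # WV02_MSI_L1B___1/2009/001
print(prefixes[-1])   # WV02_MSI_L1B___1/2022/365
```

Under these assumptions the range covers 5113 days, which is consistent with the "5113 total" reported for the discovery/queue phase in this thread.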
Acceptance criteria
data.csdap.earthdata.nasa.gov [note: csdap, not csda]
(csdap, not csda) -- Cognito auth should be triggered
Granules in collection WV02_MSI_L1B have been ingested into CBA Prod, with the exception of perhaps a small percentage of errors.
To determine how many granules have been processed, first enter the Docker container:
In the container, run the following:
(note: due to a Cumulus bug, sometimes the status does not get properly updated. Try running these to match the numbers)
You should see output similar to the following:
In particular, look at the value for body and, within it, locate the value of "count". In the output above, the count should match the Earthdata Search granule count obtained in the very first step.
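The count check described above could be scripted along these lines. This is only a sketch: the response shape is an assumption (a `body` that is either a JSON string or an already-parsed object, with `"count"` somewhere inside it), the sample response is hypothetical, and the function names are made up. The expected count is the Earthdata Search figure from this thread.

```python
import json

EXPECTED_COUNT = 3_086_582  # Earthdata Search granule count before migration


def find_count(node):
    """Return the first "count" value found depth-first, or None."""
    if isinstance(node, dict):
        if "count" in node:
            return node["count"]
        children = list(node.values())
    elif isinstance(node, list):
        children = node
    else:
        return None
    for child in children:
        found = find_count(child)
        if found is not None:
            return found
    return None


def extract_count(response):
    """Pull "count" out of a response's body, parsing the body if it is a string."""
    body = response["body"]
    if isinstance(body, str):
        body = json.loads(body)
    return find_count(body)


# Hypothetical response; the real output's exact shape is not shown in this issue.
sample_response = {
    "statusCode": 200,
    "body": json.dumps({"meta": {"count": 3086582}, "results": []}),
}

count = extract_count(sample_response)
print("match" if count == EXPECTED_COUNT else f"mismatch: {count} != {EXPECTED_COUNT}")
```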