NASA-IMPACT / csdap-cumulus

SmallSat Cumulus Deployment

Migrate collection WV01_Pan_L1B to CBA Prod #324

Closed: krisstanton closed this issue 8 months ago.

krisstanton commented 9 months ago

Migrate granules in collection WV01_Pan_L1B to CBA Prod by discovering/ingesting from existing prod account.

Acceptance criteria

To determine how many granules have been processed, first enter the Docker container:

DOTENV=.env.cba-prod make bash

In the container, run the following:

DEBUG=1 cumulus granules list -? collectionId=WV01_Pan_L1B___1 --limit=0 -? status=completed

(Note: due to a Cumulus bug, a granule's status is sometimes not updated properly. Run the following commands and check that the counts for the individual statuses add up to the total count.)

DEBUG=1 cumulus granules list -? collectionId=WV01_Pan_L1B___1 --limit=0
DEBUG=1 cumulus granules list -? collectionId=WV01_Pan_L1B___1 --limit=0 -? status=queued
DEBUG=1 cumulus granules list -? collectionId=WV01_Pan_L1B___1 --limit=0 -? status=running
DEBUG=1 cumulus granules list -? collectionId=WV01_Pan_L1B___1 --limit=0 -? status=completed
DEBUG=1 cumulus granules list -? collectionId=WV01_Pan_L1B___1 --limit=0 -? status=failed

You should see output similar to the following:

...
RESPONSE: {
  statusCode: 200,
  body: '{"meta":{"name":"cumulus-api","stack":"cumulus-prod","table":"granule","limit":0,"page":1,"count":8592},"results":[]}',
  headers: {
    'x-powered-by': 'Express',
    'access-control-allow-origin': '*',
    'strict-transport-security': 'max-age=31536000; includeSubDomains',
    'content-type': 'application/json; charset=utf-8',
    'content-length': '114',
    etag: 'W/"72-O2wUXhu+Q9J1hqdDrb0fcsZeFHo"',
    date: 'Fri, 01 Dec 2023 21:29:19 GMT',
    connection: 'close'
  },
  isBase64Encoded: false
}
[]

In particular, look at the value of body and, within it, locate the value of "count". This count should match the Earthdata Search granule count obtained in the very first step.
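The per-status checks above can be scripted. This is a minimal bash sketch that assumes it runs inside the container started with make bash, that the DEBUG=1 output contains the body string with "count" as in the sample response, and that the first "count" match in the output is the one inside body. The names COLLECTION_ID, extract_count, granule_count, and report_granule_counts are hypothetical helpers, not part of the cumulus CLI.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the cumulus CLI) that wrap the
# status-count commands shown above. Run inside the container.

COLLECTION_ID="${COLLECTION_ID:-WV01_Pan_L1B___1}"

# Read DEBUG=1 output on stdin and print the first "count" value found,
# assumed to be the one inside the response body.
extract_count() {
  grep -o '"count":[0-9]*' | head -n 1 | cut -d: -f2
}

# Print the granule count for the collection, optionally filtered by a
# status (queued, running, completed, failed). '-?' is quoted so the shell
# cannot treat it as a glob pattern. Both stdout and stderr are captured
# because which stream the DEBUG output goes to is assumed, not confirmed.
granule_count() {
  local status="$1"
  if [ -n "$status" ]; then
    DEBUG=1 cumulus granules list '-?' "collectionId=$COLLECTION_ID" --limit=0 '-?' "status=$status" 2>&1 | extract_count
  else
    DEBUG=1 cumulus granules list '-?' "collectionId=$COLLECTION_ID" --limit=0 2>&1 | extract_count
  fi
}

# Print the total followed by one line per status.
report_granule_counts() {
  local s
  printf '%-10s %s\n' total "$(granule_count)"
  for s in queued running completed failed; do
    printf '%-10s %s\n' "$s" "$(granule_count "$s")"
  done
}
```

After sourcing this in the container, report_granule_counts prints one line per status, so the total and the per-status numbers can be compared at a glance. Each count is still a separate API call, exactly like running the commands by hand.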

jsrikish commented 8 months ago

Total number of granules in Earthdata Search for WV01_Pan_L1B: 5,039,681

chuckwondo commented 8 months ago

@krisstanton and @jsrikish, ingestion finished with superb results.

Here's what I suggest you do as a matter of practice to wrap up each ingestion, including this one (similar to what I've done for prior migration ingestions):

  1. Use the Athena query to get the error counts by error type for the last execution. Most of them will likely be MissingCmrFile errors. Report the error counts in a new comment on this issue.
  2. Run the commands noted earlier in this issue for getting the status counts, and record the count for each status, along with the total count, in a new comment on this issue.
  3. To clean up as many as possible of the granules still in running or queued status, run the following command, which may take a while to complete, depending on how many dead letter archive messages there are:
    cumulus dead-letter-archive recover-cumulus-messages

    Even after that command finishes, there may be a number of messages in one of the queues (I can't recall which, but its name is very similar to the background job queue's name -- I think "background" may also be in its name), so make sure that queue is empty before performing the next step.

  4. Once the queue mentioned in the previous step is empty, run all of the status count commands again to get the updated counts, and add them to a new comment on this issue. Ideally, there will be 0 running and 0 queued, but that's likely not going to be the case.
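Step 3 asks that the queue be empty before recounting. One way to wait for that is to poll the queue with the AWS CLI's sqs get-queue-attributes command. This is a minimal bash sketch: the queue URL is a placeholder you must supply (the comment above does not name the queue), wait_for_empty_queue is a hypothetical function name, and the 60-second default interval is an arbitrary choice. Because SQS reports these numbers as approximate and a message being processed is not counted as visible, the sketch waits for both the visible and the in-flight counts to reach 0.

```bash
#!/usr/bin/env bash
# Hypothetical helper: poll an SQS queue until it reports no visible and no
# in-flight messages. The queue URL must be supplied by the caller; the
# queue's name is not given in this issue.
#
# Usage: wait_for_empty_queue QUEUE_URL [POLL_SECONDS]
wait_for_empty_queue() {
  local queue_url="$1"
  local poll_seconds="${2:-60}"
  local counts visible in_flight
  while true; do
    # Both attributes are approximate according to the SQS documentation.
    counts="$(aws sqs get-queue-attributes \
      --queue-url "$queue_url" \
      --attribute-names ApproximateNumberOfMessages ApproximateNumberOfMessagesNotVisible \
      --query 'Attributes.[ApproximateNumberOfMessages,ApproximateNumberOfMessagesNotVisible]' \
      --output text)" || return 1
    visible="$(printf '%s\n' "$counts" | awk '{print $1}')"
    in_flight="$(printf '%s\n' "$counts" | awk '{print $2}')"
    if [ -z "$visible" ] || [ -z "$in_flight" ]; then
      echo "unexpected output: ${counts}" >&2
      return 1
    fi
    echo "visible=${visible} in_flight=${in_flight}"
    if [ "$visible" = "0" ] && [ "$in_flight" = "0" ]; then
      return 0
    fi
    sleep "$poll_seconds"
  done
}
```

Once the function returns 0, the status count commands from step 2 can be rerun for step 4.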

krisstanton commented 8 months ago

Post Ingest Report

Discover and Queue State Machine run:
    Execution: e4cdffdc-3b4b-4d8a-bf7a-4ea0afdfb2ac
    Status: Succeeded
    Started: Jan 30, 2024, 14:22:43 (UTC-06:00)
    Ended: Feb 2, 2024, 04:15:16 (UTC-06:00)

Athena Query: Checking for Errors and Error Counts

Query:
    ID: d64dbe28-ccfc-4a1c-9fb2-69fff6b8ee6a
    Description: Failure counts by error type for most recent ingestion

// Output Table
    #   error_type        count
    1   MissingCmrFile    102
    2   TypeError         56
    3   NoSuchKey         4
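As a quick consistency check, the error counts can be totaled and compared with the number of failed granules. Here 102 + 56 + 4 = 162, the same as the failed count in the before/after counts below, which is what you would expect if each failed granule has exactly one error row (an assumption, not something the query guarantees). A minimal bash sketch of the summing step, assuming the result table has been saved as whitespace-separated text with the count in the last column; sum_error_counts is a hypothetical name.

```bash
#!/usr/bin/env bash
# Hypothetical helper: total the count column of the Athena result table.
# Assumes whitespace-separated rows of the form "<row#> <error_type> <count>";
# any row whose last field is not a whole number (e.g. the header) is skipped.
sum_error_counts() {
  awk '$NF ~ /^[0-9]+$/ { total += $NF } END { print total + 0 }'
}
```

For example, sum_error_counts < error_counts.txt (a hypothetical file holding the output table) prints the total, which can then be compared with the failed count.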

Attempting to clean up granule statuses by running cumulus dead-letter-archive recover-cumulus-messages:

    OUTPUT:
        (csdap-cumulus-prod-5047:prod):/work $ cumulus dead-letter-archive recover-cumulus-messages
        (node:493) NOTE: We are formalizing our plans to enter AWS SDK for JavaScript (v2) into maintenance mode in 2023.

        Please migrate your code to use AWS SDK for JavaScript (v3).
        For more information, check the migration guide at https://a.co/7PzMCcy
        (Use `node --trace-warnings ...` to show where the warning was created)
        {
          "id": "1cff3ce4-1138-4400-b75d-66e11952fc61",
          "description": "Dead-Letter Processor ECS Run",
          "operationType": "Dead-Letter Processing",
          "status": "RUNNING",
          "taskArn": "arn:aws:ecs:us-west-2:410469285047:task/cumulus-prod-CumulusECSCluster/85f6da25e3854c639ca075a38fe71b28",
          "createdAt": 1707162861149,
          "updatedAt": 1707162861149
        }
        (csdap-cumulus-prod-5047:prod):/work $ 

// Counts, Before and After running cumulus dead-letter-archive recover-cumulus-messages
// Each count is from DEBUG=1 cumulus granules list -? collectionId=WV01_Pan_L1B___1 --limit=0, plus the status filter shown

    Filter                 Before     After
    (none: total)          5049803    5049803
    -? status=completed    5049197    5049641
    -? status=queued       1          0
    -? status=running      443        0
    -? status=failed       162        162
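As a sanity check on the before and after counts above, the per-status counts should add up to the total. Before: 1 + 443 + 5,049,197 + 162 = 5,049,803. After: 0 + 0 + 5,049,641 + 162 = 5,049,803. Completed rose by 444, which equals the 443 running plus 1 queued that went to 0, so those granules appear to have all moved to completed. A minimal bash sketch of this check follows. check_status_counts is a hypothetical name, and it only does arithmetic on counts you pass in.

```bash
#!/usr/bin/env bash
# Hypothetical helper: check that per-status granule counts add up to the
# total, and flag any granules still queued or running.
#
# Usage: check_status_counts TOTAL QUEUED RUNNING COMPLETED FAILED
check_status_counts() {
  local total="$1" queued="$2" running="$3" completed="$4" failed="$5"
  local sum=$(( queued + running + completed + failed ))
  if [ "$sum" -ne "$total" ]; then
    echo "MISMATCH: statuses sum to ${sum}, total is ${total}"
    return 1
  fi
  if [ "$queued" -ne 0 ] || [ "$running" -ne 0 ]; then
    echo "OK (sum=${sum}), but ${queued} queued and ${running} running remain"
    return 0
  fi
  echo "OK (sum=${sum}), nothing queued or running"
}
```

With the numbers from this report, check_status_counts 5049803 1 443 5049197 162 reports that 1 queued and 443 running remain, and check_status_counts 5049803 0 0 5049641 162 reports nothing queued or running.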