NASA-IMPACT / csdap-cumulus

SmallSat Cumulus Deployment

Finish metadata link updates in Prod for PSScene3Band 2020 #180

Closed: chuckwondo closed this issue 1 year ago

chuckwondo commented 1 year ago

During the metadata link updates for rule PSScene3Band___1_2020, processing failed upon reaching 1/18/2020 because that day had an unusually large number of discovered granules: over 50K in a single day, where the norm is typically under 5K.

It is highly probable that this worked previously because, during the original ingestion of PSScene3Band, the DiscoverGranules and QueueGranules steps of the DiscoverAndQueueGranules workflow ran as ECS tasks rather than as Lambda functions (as they do currently).

The same cause almost certainly explains the failures in #171, #172, and #174. Therefore, the solution is to run those steps as ECS tasks once again.
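To see why a tenfold jump in daily volume matters, here is a rough back-of-envelope sketch in Python. It assumes, hypothetically, that a single Lambda invocation has to handle an entire day's granules and that Lambda's 15-minute maximum timeout is the limit being hit; the issue does not say which limit actually failed, and the 50 ms per-granule cost below is made up purely for illustration. ECS tasks are not subject to that 15-minute cap.

```python
# Back-of-envelope sketch: assumes (hypothetically) that one Lambda invocation
# must process every granule discovered for a single day.
LAMBDA_MAX_TIMEOUT_S = 15 * 60  # AWS Lambda's maximum configurable timeout (900 s)


def per_granule_budget_ms(granules_per_day: int) -> float:
    """Average time available per granule if a whole day must finish in one invocation."""
    return LAMBDA_MAX_TIMEOUT_S * 1000 / granules_per_day


def fits_in_lambda(granules_per_day: int, ms_per_granule: float) -> bool:
    """True if the estimated total work for the day stays within the Lambda timeout."""
    return granules_per_day * ms_per_granule <= LAMBDA_MAX_TIMEOUT_S * 1000


typical_day, jan_18 = 5_000, 50_000  # daily granule counts quoted in this issue
print(per_granule_budget_ms(typical_day))  # 180.0
print(per_granule_budget_ms(jan_18))       # 18.0

# 50 ms per granule is an illustrative guess, not a measured value.
print(fits_in_lambda(typical_day, 50))  # True  (250 s of work)
print(fits_in_lambda(jan_18, 50))       # False (2,500 s of work)
```

Under these assumptions, a per-granule cost that is comfortable on a typical day leaves no headroom on 1/18/2020, whereas an ECS task can simply keep running.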

Instructions

To finish the link updates:

krisstanton commented 1 year ago

Updated the instructions for this ticket.

krisstanton commented 1 year ago

Updated the instructions with a few more edits. Here are the Earthdata URLs for verification:

First granule URLs:

https://search.earthdata.nasa.gov/search/granules?p=C2112982481-CSDA&pg[0][v]=f&pg[0][qt]=2020-01-18%2C2020-01-19&pg[0][gsk]=-start_date&q=PSScene&ac=true&tl=1684171650!3!!

https://search.earthdata.nasa.gov/search/granules/granule-details?p=C2112982481-CSDA&pg[0][v]=f&pg[0][qt]=2020-01-18%2C2020-01-19&pg[0][gsk]=-start_date&g=G2338182751-CSDA&q=PSScene&ac=true&tl=1684171650!3!!

Last granule URLs:

https://search.earthdata.nasa.gov/search/granules?p=C2112982481-CSDA&pg[0][v]=f&pg[0][qt]=2020-11-24%2C2020-12-31&pg[0][gsk]=-start_date&q=PSScene&ac=true&tl=1684171650!3!!

https://search.earthdata.nasa.gov/search/granules/granule-details?p=C2112982481-CSDA&pg[0][v]=f&pg[0][qt]=2020-11-24%2C2020-12-31&pg[0][gsk]=-start_date&g=G2338239061-CSDA&q=PSScene&ac=true&tl=1684171650!3!!
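When verifying a different date range, it can be handier to regenerate these links than to edit them by hand. The Python sketch below assumes, based only on the URLs above, that the pg[0][qt] query parameter holds the temporal range as "start,end"; that reading is inferred from the URLs, not taken from Earthdata Search documentation, and the helper name with_temporal_range is hypothetical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_temporal_range(url: str, start: str, end: str) -> str:
    """Return a copy of an Earthdata Search URL with a new pg[0][qt] range.

    Assumes pg[0][qt] holds "start,end" dates, as in the URLs in this issue.
    """
    parts = urlsplit(url)
    params = [
        (key, f"{start},{end}" if key == "pg[0][qt]" else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # Leave brackets and "!" unescaped but encode "," as %2C, matching the originals.
    query = urlencode(params, safe="[]!")
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


first = (
    "https://search.earthdata.nasa.gov/search/granules?p=C2112982481-CSDA"
    "&pg[0][v]=f&pg[0][qt]=2020-01-18%2C2020-01-19&pg[0][gsk]=-start_date"
    "&q=PSScene&ac=true&tl=1684171650!3!!"
)
print(with_temporal_range(first, "2020-11-24", "2020-12-31"))
```

Applied to the first granule search URL with the range 2020-11-24 to 2020-12-31, it reproduces the last granule search URL exactly, since the two links differ only in that parameter.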

krisstanton commented 1 year ago

CMR Link Fix 'ingest' is in progress.

krisstanton commented 1 year ago

Update: waiting for a concurrent use of PROD to finish before kicking this one off.

krisstanton commented 1 year ago

Here is a copy of the rule file.

{
  "name": "PSScene3Band___1_2020_finish_link_updates",
  "state": "DISABLED",
  "provider": "planet",
  "collection": {
    "name": "PSScene3Band",
    "version": "1"
  },
  "workflow": "DiscoverAndQueueGranules",
  "rule": {
    "type": "onetime"
  },
  "meta": {
    "providerPathFormat": "'storage-ss-ingest-prod-ingesteddata-uswest2/planet/PSScene3Band-'yyyyMMdd",
    "ingestedPathFormat": "'planet/PSScene3Band/'yyyyMMdd",
    "startDate": "2020-01-18T00:00:00Z",
    "endDate": "2021-01-01T00:00:00Z",
    "step": "P1D",
    "rule": {
      "state": "DISABLED"
    }
  }
}
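The rule's meta block drives discovery one day at a time. As a rough illustration, and not the deployment's actual code, this Python sketch expands startDate, endDate, and step into per-day provider and ingested paths. It assumes the path formats consist only of single-quoted literals plus yyyy/MM/dd tokens, that endDate is exclusive (consistent with a 2020 rule ending at 2021-01-01), and that step is always P1D; the helper names to_strftime and expand_paths are hypothetical.

```python
import re
from datetime import datetime, timedelta


def to_strftime(pattern: str) -> str:
    """Translate a 'literal'yyyyMMdd style pattern into a strftime format.

    Only quoted literals and the yyyy/MM/dd tokens are handled (an assumption).
    """
    out = []
    for literal, tokens in re.findall(r"'([^']*)'|([^']+)", pattern):
        if tokens:  # unquoted date tokens
            out.append(tokens.replace("yyyy", "%Y").replace("MM", "%m").replace("dd", "%d"))
        else:  # quoted literal text; escape any % for strftime
            out.append(literal.replace("%", "%%"))
    return "".join(out)


def parse_utc(timestamp: str) -> datetime:
    """Parse an ISO 8601 timestamp ending in Z (works before Python 3.11 too)."""
    return datetime.fromisoformat(timestamp.replace("Z", "+00:00"))


def expand_paths(meta: dict) -> list:
    """List (provider path, ingested path) pairs, one per day in [startDate, endDate)."""
    if meta["step"] != "P1D":
        raise ValueError("this sketch only handles a one-day step")
    day, end = parse_utc(meta["startDate"]), parse_utc(meta["endDate"])
    provider_fmt = to_strftime(meta["providerPathFormat"])
    ingested_fmt = to_strftime(meta["ingestedPathFormat"])
    pairs = []
    while day < end:  # endDate treated as exclusive (assumption)
        pairs.append((day.strftime(provider_fmt), day.strftime(ingested_fmt)))
        day += timedelta(days=1)
    return pairs


meta = {  # values copied from the rule above
    "providerPathFormat": "'storage-ss-ingest-prod-ingesteddata-uswest2/planet/PSScene3Band-'yyyyMMdd",
    "ingestedPathFormat": "'planet/PSScene3Band/'yyyyMMdd",
    "startDate": "2020-01-18T00:00:00Z",
    "endDate": "2021-01-01T00:00:00Z",
    "step": "P1D",
}
pairs = expand_paths(meta)
print(len(pairs))  # 349
print(pairs[0])
print(pairs[-1])
```

Under these assumptions, the rule covers 349 days, from PSScene3Band-20200118 through PSScene3Band-20201231.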

krisstanton commented 1 year ago

Currently blocked by https://github.com/NASA-IMPACT/csdap-cumulus/issues/222

Ticket #222 changes the prefix granule IDs step from a Lambda function to an ECS task. That ticket is in turn blocked by a bug in which the ECS task writes so many log lines during an unzip that the whole process fails. The bug has already been addressed through a pull request, but the fix has not been released yet: Cumulus needs to publish a new version of the image we currently pin,

ecs_task_image              = "cumuluss/cumulus-ecs-task:1.8.0"

that includes the fix.

krisstanton commented 1 year ago

Fixed the bug blocking this task. Now the only blocker is finding the right time to use the ingest machinery; I will schedule this run between other ingests.

krisstanton commented 1 year ago

Starting the PSScene CMR updates now.

krisstanton commented 1 year ago

All done.