Closed chuckwondo closed 1 year ago
Updated the instructions for this ticket.
Updated the instructions with a few more edits. The Earthdata URLs used for verification are:
First Granule URLs https://search.earthdata.nasa.gov/search/granules?p=C2112982481-CSDA&pg[0][v]=f&pg[0][qt]=2020-01-18%2C2020-01-19&pg[0][gsk]=-start_date&q=PSScene&ac=true&tl=1684171650!3!!
Last Granule URLs https://search.earthdata.nasa.gov/search/granules?p=C2112982481-CSDA&pg[0][v]=f&pg[0][qt]=2020-11-24%2C2020-12-31&pg[0][gsk]=-start_date&q=PSScene&ac=true&tl=1684171650!3!!
CMR Link Fix 'ingest' is in progress.
Update: Waiting until concurrent usage of PROD is complete before kicking this one off.
Here is the copy of the rule file.
{
  "name": "PSScene3Band___1_2020_finish_link_updates",
  "state": "DISABLED",
  "provider": "planet",
  "collection": {
    "name": "PSScene3Band",
    "version": "1"
  },
  "workflow": "DiscoverAndQueueGranules",
  "rule": {
    "type": "onetime"
  },
  "meta": {
    "providerPathFormat": "'storage-ss-ingest-prod-ingesteddata-uswest2/planet/PSScene3Band-'yyyyMMdd",
    "ingestedPathFormat": "'planet/PSScene3Band/'yyyyMMdd",
    "startDate": "2020-01-18T00:00:00Z",
    "endDate": "2021-01-01T00:00:00Z",
    "step": "P1D",
    "rule": {
      "state": "DISABLED"
    }
  }
}
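As a rough illustration of what the meta fields in this rule describe, the sketch below walks from startDate to endDate in P1D steps and builds one ingested path per day. It assumes the path formats are a quoted literal followed by a yyyyMMdd date pattern, and that endDate is exclusive; neither is confirmed in this ticket. `daily_paths` is a hypothetical helper, not part of Cumulus.

```python
from datetime import datetime, timedelta, timezone


def daily_paths(literal_prefix, start, end, step=timedelta(days=1)):
    """Yield one path per step in [start, end). Assumes end is exclusive."""
    current = start
    while current < end:
        # 'yyyyMMdd' in the rule's pattern corresponds to %Y%m%d here.
        yield f"{literal_prefix}{current:%Y%m%d}"
        current += step


# Values copied from the rule file above.
start = datetime(2020, 1, 18, tzinfo=timezone.utc)
end = datetime(2021, 1, 1, tzinfo=timezone.utc)

paths = list(daily_paths("planet/PSScene3Band/", start, end))
print(paths[0], paths[-1], len(paths))
```

Under those assumptions the rule covers 349 daily paths, which gives a sense of how many discovery passes a single run of this rule implies.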
Currently blocked by https://github.com/NASA-IMPACT/csdap-cumulus/issues/222
Ticket #222 is a fix that moves the prefix granule IDs step from a Lambda function to an ECS task. That ticket is blocked by a bug where the ECS task outputs too many log lines during an unzip, which causes the whole process to fail. The bug has already been addressed, but the fix has not been pushed. Cumulus needs to push a new version of
ecs_task_image = "cumuluss/cumulus-ecs-task:1.8.0"
which includes the fix previously pushed through a pull request.
Fixed the bug blocking this task. Now the only blocker is finding the right time to use the Ingest machinery. Will be timing this to happen between other ingests.
Starting PSScene CMR Updates at this time.
All Done
During metadata link updates for rule PSScene3Band___1_2020, processing failed once it reached 2020-01-18 because that day had an unusually large number of discovered granules: over 50K in a single day! (The norm is typically under 5K.)
It is highly probable that this worked previously because during the original ingestion of PSScene3Band, the DiscoverGranules and QueueGranules steps of the DiscoverAndQueueGranules workflow were run as ECS tasks, not as Lambda functions (as they are currently).
Therefore, the solution to this issue (which is almost certainly also the cause of the failures in #171, #172, and #174) is to go back to running those steps as ECS tasks.
Instructions
To finish the link updates:
[x] Create a new rule named PSScene3Band___1_2020_finish_link_updates as a copy of PSScene3Band___1_2020, but with the following startDate: 2020-01-18T00:00:00Z
[x] Add the rule to Prod using the Cumulus CLI
[x] Before the Run, replace the planet provider with the modified one, which sources the files from our protected bucket
[x] Enable the rule
[x] Run the rule
[x] Confirm in Earthdata Search that at least 1 granule from 2020-01-18 and 1 from 2020-11-24 (last day of data) had their download links updated to include the friendly hostname data.csda.earthdata.nasa.gov. For reference, the old hostname is: dy8riyaot0kde.cloudfront.net
Earthdata URLs are:
[x] After the Run, restore the planet provider, which sources from the delivery bucket
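The hostname check in the confirmation step could be scripted along these lines. This is only a sketch: `link_status` is a hypothetical helper, the example URLs use placeholder paths rather than real granule links, and the download links would still need to be copied out of Earthdata Search (or CMR) by hand.

```python
from urllib.parse import urlparse

# Hostnames taken from the instructions above.
FRIENDLY_HOST = "data.csda.earthdata.nasa.gov"
OLD_HOST = "dy8riyaot0kde.cloudfront.net"


def link_status(url):
    """Classify a download link by its hostname (hypothetical helper)."""
    host = (urlparse(url).hostname or "").lower()
    if host == FRIENDLY_HOST:
        return "updated"
    if host == OLD_HOST:
        return "stale"
    return "other"


# Placeholder URLs for illustration only; the paths are not real granule links.
examples = [
    "https://data.csda.earthdata.nasa.gov/placeholder/granule.tif",
    "https://dy8riyaot0kde.cloudfront.net/placeholder/granule.tif",
]
statuses = [link_status(u) for u in examples]
print(statuses)
```

Any link reported as "stale" would indicate a granule whose metadata still points at the old CloudFront hostname.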