NASA-IMPACT / csdap-cumulus

SmallSat Cumulus Deployment

PrefixGranuleIds lambda function runs out of memory #222

Closed chuckwondo closed 1 year ago

chuckwondo commented 1 year ago

For a very large list of discovered granules, the PrefixGranuleIds lambda function runs out of memory. Unfortunately, the error message doesn't indicate the problem:

{
  "errorType": "Runtime.ExitError",
  "errorMessage": "RequestId: 84c258ed-e668-4eff-bbba-6d9926f8fb40 Error: Runtime exited with error: signal: killed"
}

However, the CloudWatch logs show that the function's used memory equals its allocated memory. We don't want to raise the lambda memory setting any higher, so we need to configure the function to run as an ECS task instead.

Configuring it as an ECS task requires the following changes to app/stacks/cumulus/main.tf:

  1. Duplicate the block for resource "aws_sfn_activity" "queue_granules" and for the duplicated block, change "queue_granules" to "prefix_granule_ids", and "QueueGranules" to "PrefixGranuleIds".
  2. Duplicate the block for module "queue_granules_service" and for the duplicated block, make the same name changes as in the previous step.
    Edit (adding more detail): change "queue_granules_service" to "prefix_granule_ids_service", and "QueueGranules" to "PrefixGranuleIds".
  3. Within the block for module "discover_granules_workflow", change the value of prefix_granule_ids_task_arn from aws_lambda_function.prefix_granule_ids.arn to aws_sfn_activity.prefix_granule_ids.id
  4. Deploy to a sandbox account and run a smoke test. To confirm that the new ECS task is running and its associated CloudWatch logs are being written, run the following command within Docker, after kicking off the smoke test: aws logs tail --follow ${CUMULUS_PREFIX}-PrefixGranuleIdsEcsLogs
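
As a rough sketch, steps 1 through 3 might end up looking something like the fragment below. This is not copied from the repository: the `${var.prefix}` naming pattern and the `command` list (including the `--lambdaArn` value) are assumptions about what the existing queue_granules blocks contain, and every argument not shown should be copied unchanged from the block being duplicated.

```hcl
# Sketch only. Arguments are ASSUMED to mirror the existing queue_granules
# blocks in app/stacks/cumulus/main.tf; copy the real ones, not these.

# Step 1: duplicate of resource "aws_sfn_activity" "queue_granules"
resource "aws_sfn_activity" "prefix_granule_ids" {
  name = "${var.prefix}-PrefixGranuleIds" # assumed naming pattern
}

# Step 2: duplicate of module "queue_granules_service"
module "prefix_granule_ids_service" {
  # source, cluster, image, cpu/memory settings, etc.:
  # copy unchanged from module "queue_granules_service"

  name = "PrefixGranuleIds"

  # Assumed shape of the command; verify against the original block.
  command = [
    "cumulus-ecs-task",
    "--activityArn",
    aws_sfn_activity.prefix_granule_ids.id,
    "--lambdaArn",
    aws_lambda_function.prefix_granule_ids.arn,
  ]
}

# Step 3: inside the existing module "discover_granules_workflow" block,
# only this value changes (other arguments omitted here).
#   before: aws_lambda_function.prefix_granule_ids.arn
#   after:
#     prefix_granule_ids_task_arn = aws_sfn_activity.prefix_granule_ids.id
```

The service name matters for step 4: the log group tailed there ends in PrefixGranuleIdsEcsLogs, so a different name would make that command find nothing.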

For reference, here's the relevant section in the Cumulus docs: https://nasa.github.io/cumulus/docs/data-cookbooks/run-tasks-in-lambda-or-docker/#step-function-activities-and-cumulus-ecs-task

krisstanton commented 1 year ago

Related Ticket: https://github.com/NASA-IMPACT/csdap-cumulus/issues/180

krisstanton commented 1 year ago

This ticket is blocked by a bug where the ECS task outputs too many log lines during an unzip, which causes the whole process to fail. The bug has already been fixed, but the fix has not been released. Cumulus needs to publish a new version of

ecs_task_image              = "cumuluss/cumulus-ecs-task:1.8.0"

that includes the fix, which was previously merged through a pull request.

This ticket is also a blocker for #180.

chuckwondo commented 1 year ago

This is the Cumulus bug report of the problem blocking us: https://bugs.earthdata.nasa.gov/projects/CUMULUS/issues/CUMULUS-3094

It was fixed back in April, but the fix was never released. I have asked for a release so that we can use it and avoid spending time on a workaround, which would be tedious and possibly not even feasible at this point.

chuckwondo commented 1 year ago

Wow! What a fast response to my request for a release: https://github.com/nasa/cumulus-ecs-task/releases/tag/v1.9.1

@krisstanton, we can now change this:

ecs_task_image              = "cumuluss/cumulus-ecs-task:1.8.0"

to this:

ecs_task_image              = "cumuluss/cumulus-ecs-task:1.9.1"

krisstanton commented 1 year ago

Very Nice! I'll unblock and try again shortly!

krisstanton commented 1 year ago

Passed UAT Smoke Test