NASA-IMPACT / csdap-cumulus

SmallSat Cumulus Deployment

QueueGranules times out with sufficiently large list of granules to queue #185

Closed chuckwondo closed 1 year ago

chuckwondo commented 1 year ago

When the DiscoverGranules function discovers a sufficiently large number of granules, the QueueGranules function times out while attempting to queue them all. This is due to a highly inefficient implementation of QueueGranules. It is a known problem, and at the time of this writing, there is an open PR in the Cumulus repo to address it.

However, we don't have to wait for that fix to be published, which would also require a Cumulus upgrade. Instead, we can simply convert the QueueGranules Lambda function to an ECS task, which is not subject to the Lambda execution time limit.

For reference, see Example: Replacing AWS Lambda with a Docker container run on ECS. The docs at that link were the original basis for the instructions below, which give the specific steps for our project.

Since we previously had QueueGranules configured as an ECS task, we can restore that configuration from the git log by adding the following to app/stack/cumulus/main.tf:

resource "aws_sfn_activity" "queue_granules" {
  name = "${var.prefix}-QueueGranules"
}

module "queue_granules_service" {
  source = "https://github.com/nasa/cumulus/releases/download/<%= cumulus_version %>/terraform-aws-cumulus-ecs-service.zip"

  prefix = var.prefix
  name   = "QueueGranules"

  cluster_arn   = module.cumulus.ecs_cluster_arn
  desired_count = 1
  image         = local.ecs_task_image

  cpu                = local.ecs_task_cpu
  memory_reservation = local.ecs_task_memory_reservation

  environment = {
    AWS_DEFAULT_REGION = data.aws_region.current.name
  }
  command = [
    "cumulus-ecs-task",
    "--activityArn",
    aws_sfn_activity.queue_granules.id,
    "--lambdaArn",
    module.cumulus.queue_granules_task.task_arn
  ]
  alarms = {
    MemoryUtilizationHigh = {
      comparison_operator = "GreaterThanThreshold"
      evaluation_periods  = 1
      metric_name         = "MemoryUtilization"
      # "Average" measures utilization (%); "SampleCount" would count datapoints instead
      statistic           = "Average"
      threshold           = 75
    }
  }
}

Further, within the same file, change this line:

    queue_granules_task_arn : module.cumulus.queue_granules_task.task_arn,

to this:

    queue_granules_task_arn : aws_sfn_activity.queue_granules.id,
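
To illustrate why swapping that one value is enough, here is a minimal sketch of how the ARN ends up in the workflow. A Step Functions Task state accepts either a Lambda ARN or an activity ARN as its Resource, so pointing it at the activity hands the step to the ECS worker (cumulus-ecs-task) polling that activity. The local names workflow_template_vars and queue_granules_state are hypothetical and only for illustration; the actual workflow definition in this repo may be structured differently.

```hcl
# Hypothetical sketch: only queue_granules_task_arn,
# module.cumulus.queue_granules_task.task_arn, and
# aws_sfn_activity.queue_granules come from this issue.
locals {
  # Values substituted into the workflow's state machine definition.
  workflow_template_vars = {
    # Before: the Lambda ARN, so Step Functions invoked the Lambda directly.
    # queue_granules_task_arn = module.cumulus.queue_granules_task.task_arn

    # After: the activity ARN, so the step is picked up by the ECS service
    # that polls this activity and runs the QueueGranules code in a container.
    queue_granules_task_arn = aws_sfn_activity.queue_granules.id
  }

  # Illustrative Task state consuming the value. "Resource" may be either
  # a Lambda ARN or an activity ARN.
  queue_granules_state = {
    Type     = "Task"
    Resource = local.workflow_template_vars.queue_granules_task_arn
    End      = true
  }
}
```

Because the workflow only ever sees the ARN, no change to the state machine definition itself should be needed beyond this substitution.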

Acceptance Criteria

krisstanton commented 1 year ago

Verified that the log group cumulus-uat-QueueGranulesEcsLogs exists. Also verified that the smoketest worked.

krisstanton commented 1 year ago

This task and https://github.com/NASA-IMPACT/csdap-cumulus/issues/174 are successfully completed.