aws / containers-roadmap

This is the public roadmap for AWS container services (ECS, ECR, Fargate, and EKS).
https://aws.amazon.com/about-aws/whats-new/containers/

[ECS] [bug]: daemon task should be last to be stopped when instance is draining #1700

Closed stlava closed 1 year ago

stlava commented 2 years ago


Tell us about your request
It would seem that there might be a regression: according to https://github.com/aws/containers-roadmap/issues/128#issuecomment-720262219, daemon tasks should be the last to be drained.

I currently have a daemon task which other services depend on. Following guidance from AWS support, I've added a placement constraint to the dependent service (memberOf(task:group == service:my-daemon)), because on occasion the scheduler would place the dependent task before placing the daemon and then not have room on the host to run the daemon.
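For reference, a constraint like that can be attached when creating (or updating) the dependent service. A minimal sketch with the AWS CLI, where the cluster, service, and task definition names are placeholders and the expression is the one from the support guidance above:

# Hypothetical example: only place the dependent service's tasks on instances
# that are already running a task in the daemon service's task group.
aws ecs create-service \
  --cluster my-cluster \
  --service-name my-dependent-service \
  --task-definition my-dependent-task \
  --desired-count 1 \
  --placement-constraints '[{"type":"memberOf","expression":"task:group == service:my-daemon"}]'

Note that placement constraints only influence where tasks are started; as this issue describes, they are not consulted when tasks are stopped during draining.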

The issue I'm running into now is that while the instance is draining, the daemon is sometimes stopped before the dependent task. The placement constraint that ensures the daemon is started before the other tasks isn't respected during draining. I think there might be a regression here, because an earlier comment described this as previously addressed.

Which service(s) is this request for?
ECS

Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
The daemon task proxies connections for the dependent task, so if the daemon goes away the dependent task can no longer talk to the data store. From the application's perspective, it has no way to tell that this is not just an intermittent failure to reach the daemon.

Are you currently working around this issue?
No workaround. I've considered increasing the daemon's internal shutdown time to be a little longer, but even then it wouldn't be long enough to cover the other containers being signaled and stopped.
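For context, the stop window itself can also be stretched on the ECS side, though as noted that still would not span the full drain of the other tasks. A sketch of the two knobs involved, assuming the EC2 launch type and the standard agent config file (the values are illustrative only):

# Illustrative only: lengthen how long the agent waits between SIGTERM and SIGKILL
# for containers on this instance (the default is 30s).
echo 'ECS_CONTAINER_STOP_TIMEOUT=2m' >> /etc/ecs/ecs.config

# Or per container, via the "stopTimeout" field (in seconds) of the container definition,
# e.g. { "name": "my-daemon", "image": "...", "stopTimeout": 120, ... }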

nathanpeck commented 2 years ago

Hi Slava. Are you using service-launched tasks or RunTask-launched tasks?

Unfortunately, the linked comment only applies to service-launched tasks: all service-launched tasks will be stopped first, then daemon tasks.

However, there is no ordering guarantee for standalone tasks launched via the RunTask API. This may explain the behavior you are seeing.
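One way to check which kind a given task is (a sketch; the cluster name and task ID are placeholders): service-launched tasks report a group of the form service:<name> and a startedBy set by the service scheduler, whereas RunTask-launched tasks carry whatever group and startedBy the caller passed, if any.

# Inspect how a task was launched by looking at its group and startedBy fields.
aws ecs describe-tasks --cluster my-cluster --tasks <task-id> \
  | jq '.tasks[] | {taskArn, group, startedBy, launchType}'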

stlava commented 2 years ago

The dependent tasks described are service launched tasks. I have a parallel AWS support case open where I referenced this issue and provided some logs from one of our ECS instances. I heard back this morning that the issue is being investigated as a bug by the service team.

abicky commented 1 year ago

I'm facing a similar problem. I can reproduce it as follows:

  1. Execute terraform apply using the configuration file below
  2. Drain the newly created ECS instance (an example drain command is shown after the configuration below)
provider "aws" {
  region = "ap-northeast-1"
}

locals {
  namespace = "moby-issue-44511"
}

data "aws_vpc" "default" {
  default = true
}

#--------------#
# ECS instance #
#--------------#

resource "aws_ecs_cluster" "main" {
  name = local.namespace
}

resource "aws_iam_instance_profile" "AmazonEC2ContainerServiceforEC2Role" {
  name = "AmazonEC2ContainerServiceforEC2Role"
  role = aws_iam_role.AmazonEC2ContainerServiceforEC2Role.name
}

resource "aws_iam_role" "AmazonEC2ContainerServiceforEC2Role" {
  name = "AmazonEC2ContainerServiceforEC2Role"

  assume_role_policy = <<-EOF
  {
    "Version":"2012-10-17",
    "Statement":[
      {
        "Effect":"Allow",
        "Principal": {
          "Service": [
            "ec2.amazonaws.com"
          ]
        },
        "Action":"sts:AssumeRole"
      }
    ]
  }
  EOF
}

resource "aws_iam_role_policy_attachment" "AmazonEC2ContainerServiceforEC2Role" {
  role       = aws_iam_role.AmazonEC2ContainerServiceforEC2Role.id
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
}

resource "aws_launch_template" "main" {
  name = local.namespace

  # AMI in ap-northeast-1 with the ECS agent preinstalled (presumably an ECS-optimized Amazon Linux AMI)
  image_id = "ami-0fcfb0266308303f6"

  instance_type = "t3.micro"

  iam_instance_profile {
    name = aws_iam_instance_profile.AmazonEC2ContainerServiceforEC2Role.name
  }

  network_interfaces {
    associate_public_ip_address = true
  }

  user_data = base64encode(<<-DATA
  #!/bin/bash

  cat <<'EOF' >> /etc/ecs/ecs.config
  ECS_CLUSTER=${aws_ecs_cluster.main.name}
  ECS_AVAILABLE_LOGGING_DRIVERS=["json-file","syslog","awslogs","fluentd"]
  EOF

  echo '{"debug": true}' >/etc/docker/daemon.json
  systemctl restart docker --no-block
  DATA
  )
}

resource "aws_instance" "main" {
  launch_template {
    id      = aws_launch_template.main.id
    version = "$Latest"
  }

  # Set iam_instance_profile explicitly to avoid the bug that will be fixed
  # by https://github.com/hashicorp/terraform-provider-aws/pull/27972.
  iam_instance_profile = aws_iam_instance_profile.AmazonEC2ContainerServiceforEC2Role.name

  tags = {
    Name = local.namespace
  }

  lifecycle {
    ignore_changes = [
      # Although the user data of the launch template is base64-encoded,
      # terraform-provider-aws cannot know that.
      user_data,
      # The provider records the concrete version current at instance creation,
      # not the literal "$Latest".
      launch_template[0].version,
    ]
  }
}

#---------#
# fluentd #
#---------#

resource "aws_ecs_task_definition" "fluentd" {
  family                = "${local.namespace}-fluentd"
  container_definitions = <<-JSON
  [
    {
      "name": "fluentd",
      "image": "fluent/fluentd:v1.15.3-1.0",
      "essential": true,
      "memoryReservation": 50,
      "portMappings": [
        { "containerPort": 24224, "hostPort": 24224 }
      ]
    }
  ]
  JSON
}

resource "aws_ecs_service" "fluentd" {
  name                = aws_ecs_task_definition.fluentd.family
  cluster             = aws_ecs_cluster.main.name
  task_definition     = aws_ecs_task_definition.fluentd.arn
  scheduling_strategy = "DAEMON"
}

#-------------#
# application #
#-------------#

resource "aws_ecs_task_definition" "hello" {
  family                = "${local.namespace}-application"
  container_definitions = <<-JSON
  [
    {
      "name": "hello",
      "image": "ubuntu",
      "essential": true,
      "command": ["sh", "-c", "while :; do echo \"$(date +%FT%T): hello\"; sleep 1; done"],
      "stopTimeout": 30,
      "logConfiguration": {
        "logDriver": "fluentd",
        "options": {
          "fluentd-address": "${aws_instance.main.private_ip}"
        }
      },
      "memoryReservation": 50
    }
  ]
  JSON
}

resource "aws_ecs_service" "hello" {
  name            = aws_ecs_task_definition.hello.family
  cluster         = aws_ecs_cluster.main.name
  task_definition = aws_ecs_task_definition.hello.arn
  desired_count   = 1

  deployment_minimum_healthy_percent = 0
}
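For step 2, the instance can be put into DRAINING with something like the following (the cluster name matches the configuration above; the container instance ARN is a placeholder taken from the first command's output):

# Find the container instance registered to the cluster, then drain it.
aws ecs list-container-instances --cluster moby-issue-44511
aws ecs update-container-instances-state --cluster moby-issue-44511 \
  --container-instances <container-instance-arn> \
  --status DRAINING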

Here is the result of aws ecs describe-tasks:

% TZ=UTC aws ecs describe-tasks --cluster moby-issue-44511 \
  --tasks ecb4495fd45343e88e5649a9d2f4eac6 f64e3d5e2f654fbeaac060acf07f7871 \
  | jq '[.tasks[] | { group, containerId: .containers[0].runtimeId, desiredStatus, lastStatus, createdAt, stoppingAt, stoppedAt, stoppedReason, containerStoppedReason: .containers[0].reason }]'
[
  {
    "group": "service:moby-issue-44511-application",
    "containerId": "d3030f9448430fc0be31d74cb4651b0f723043b6645ebc53ae0f84345adfaa15",
    "desiredStatus": "STOPPED",
    "lastStatus": "STOPPED",
    "createdAt": "2022-11-23T11:01:43.145000+00:00",
    "stoppingAt": "2022-11-23T11:02:51.646000+00:00",
    "stoppedAt": "2022-11-23T11:15:28.866000+00:00",
    "stoppedReason": "Service moby-issue-44511-application: container instance is in DRAINING state",
    "containerStoppedReason": "DockerTimeoutError: Could not transition to stopped; timed out after waiting 2m30s"
  },
  {
    "group": "service:moby-issue-44511-fluentd",
    "containerId": "e6cec48166c665fe4bffb9dceac44595953afc7fcd5bb727588ba62ff451bbbb",
    "desiredStatus": "STOPPED",
    "lastStatus": "STOPPED",
    "createdAt": "2022-11-23T11:00:49.831000+00:00",
    "stoppingAt": "2022-11-23T11:02:58.827000+00:00",
    "stoppedAt": "2022-11-23T11:03:01.819000+00:00",
    "stoppedReason": "Service moby-issue-44511-fluentd: container instance is in DRAINING state",
    "containerStoppedReason": null
  }
]

As you can see, the task of the daemon service "moby-issue-44511-fluentd" stopped before the task of the service "moby-issue-44511-application."

Also, here is the docker log:

$ journalctl -u docker -o json | jq -r .MESSAGE
-- snip --
time="2022-11-23T11:02:52.029418846Z" level=debug msg="Calling POST /v1.21/containers/d3030f9448430fc0be31d74cb4651b0f723043b6645ebc53ae0f84345adfaa15/stop?t=30"
time="2022-11-23T11:02:52.029494540Z" level=debug msg="Sending kill signal 15 to container d3030f9448430fc0be31d74cb4651b0f723043b6645ebc53ae0f84345adfaa15"
time="2022-11-23T11:02:59.372092034Z" level=debug msg="Running health check for container be5a2c16b818ecd5893d810710cf94c6cc24e15b14ac8966f9d8eb401d12d1fd ..."
time="2022-11-23T11:02:59.372551803Z" level=debug msg="starting exec command 888c6ff102d55ba1b8ce00464569eeb670d418c9d14a27fad966fb381ed78de1 in container be5a2c16b818ecd5893d810710cf94c6cc24e15b14ac8966f9d8eb401d12d1fd"
time="2022-11-23T11:02:59.374900092Z" level=debug msg="attach: stdout: begin"
time="2022-11-23T11:02:59.374939402Z" level=debug msg="attach: stderr: begin"
time="2022-11-23T11:02:59.378410573Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/exec-added
time="2022-11-23T11:02:59.729173160Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/exec-started
time="2022-11-23T11:02:59.758665471Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/exit
time="2022-11-23T11:02:59.758851692Z" level=debug msg="attach: stdout: end"
time="2022-11-23T11:02:59.758874503Z" level=debug msg="attach: stderr: end"
time="2022-11-23T11:02:59.758902425Z" level=debug msg="attach done"
time="2022-11-23T11:02:59.758927682Z" level=debug msg="Health check for container be5a2c16b818ecd5893d810710cf94c6cc24e15b14ac8966f9d8eb401d12d1fd done (exitCode=0)"
time="2022-11-23T11:03:00.189947282Z" level=debug msg="Calling POST /v1.21/containers/e6cec48166c665fe4bffb9dceac44595953afc7fcd5bb727588ba62ff451bbbb/stop?t=30"
time="2022-11-23T11:03:00.190024918Z" level=debug msg="Sending kill signal 15 to container e6cec48166c665fe4bffb9dceac44595953afc7fcd5bb727588ba62ff451bbbb"
time="2022-11-23T11:03:01.541963842Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/exit
time="2022-11-23T11:03:01.562363324Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/delete
time="2022-11-23T11:03:01.562396747Z" level=info msg="ignoring event" container=e6cec48166c665fe4bffb9dceac44595953afc7fcd5bb727588ba62ff451bbbb module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
time="2022-11-23T11:03:01.574352549Z" level=debug msg="Revoking external connectivity on endpoint ecs-moby-issue-44511-fluentd-5-fluentd-8490d7a8bce5fac7dd01 (d03613098372f77002e9c8f4c972358e227f03fa4e0875288e2a7fd2c6344e5a)"
time="2022-11-23T11:03:01.574995334Z" level=debug msg="/usr/sbin/iptables, [--wait -t nat -C DOCKER -p tcp -d 0/0 --dport 24224 -j DNAT --to-destination 172.17.0.2:24224 ! -i docker0]"
time="2022-11-23T11:03:01.575847232Z" level=debug msg="Calling GET /v1.21/containers/e6cec48166c665fe4bffb9dceac44595953afc7fcd5bb727588ba62ff451bbbb/json"
time="2022-11-23T11:03:01.577553743Z" level=debug msg="/usr/sbin/iptables, [--wait -t nat -D DOCKER -p tcp -d 0/0 --dport 24224 -j DNAT --to-destination 172.17.0.2:24224 ! -i docker0]"
time="2022-11-23T11:03:01.581373824Z" level=debug msg="/usr/sbin/iptables, [--wait -t filter -C DOCKER ! -i docker0 -o docker0 -p tcp -d 172.17.0.2 --dport 24224 -j ACCEPT]"
time="2022-11-23T11:03:01.588964441Z" level=debug msg="/usr/sbin/iptables, [--wait -t filter -D DOCKER ! -i docker0 -o docker0 -p tcp -d 172.17.0.2 --dport 24224 -j ACCEPT]"
time="2022-11-23T11:03:01.598819481Z" level=debug msg="/usr/sbin/iptables, [--wait -t nat -C POSTROUTING -p tcp -s 172.17.0.2 -d 172.17.0.2 --dport 24224 -j MASQUERADE]"
time="2022-11-23T11:03:01.602651989Z" level=debug msg="/usr/sbin/iptables, [--wait -t nat -D POSTROUTING -p tcp -s 172.17.0.2 -d 172.17.0.2 --dport 24224 -j MASQUERADE]"
time="2022-11-23T11:03:01.606894842Z" level=debug msg="DeleteConntrackEntries purged ipv4:1, ipv6:0"
time="2022-11-23T11:03:01.689072479Z" level=debug msg="Releasing addresses for endpoint ecs-moby-issue-44511-fluentd-5-fluentd-8490d7a8bce5fac7dd01's interface on network bridge"
time="2022-11-23T11:03:01.689111084Z" level=debug msg="ReleaseAddress(LocalDefault/172.17.0.0/16, 172.17.0.2)"
time="2022-11-23T11:03:01.689137776Z" level=debug msg="Released address PoolID:LocalDefault/172.17.0.0/16, Address:172.17.0.2 Sequence:App: ipam/default/data, ID: LocalDefault/172.17.0.0/16, DBIndex: 0x0, Bits: 65536, Unselected: 65531, Sequence: (0xe8000000, 1)->(0x0, 2046)->(0x1, 1)->end Curr:5"
time="2022-11-23T11:03:01.796276657Z" level=debug msg="Calling GET /v1.21/containers/e6cec48166c665fe4bffb9dceac44595953afc7fcd5bb727588ba62ff451bbbb/json"
time="2022-11-23T11:03:01.797123648Z" level=debug msg="Calling GET /v1.21/containers/e6cec48166c665fe4bffb9dceac44595953afc7fcd5bb727588ba62ff451bbbb/json"
time="2022-11-23T11:03:22.052640413Z" level=info msg="Container failed to exit within 30s of signal 15 - using the force" container=d3030f9448430fc0be31d74cb4651b0f723043b6645ebc53ae0f84345adfaa15
time="2022-11-23T11:03:22.052683086Z" level=debug msg="Sending kill signal 9 to container d3030f9448430fc0be31d74cb4651b0f723043b6645ebc53ae0f84345adfaa15"
time="2022-11-23T11:03:22.125742131Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/exit
time="2022-11-23T11:03:22.160118590Z" level=debug msg=event module=libcontainerd namespace=moby topic=/tasks/delete
time="2022-11-23T11:03:22.160152542Z" level=info msg="ignoring event" container=d3030f9448430fc0be31d74cb4651b0f723043b6645ebc53ae0f84345adfaa15 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
-- snip --

The log tells us the following facts:

  1. 2022-11-23T11:02:52.029494540Z: docker sent SIGTERM (signal 15) to the container "hello"
  2. 2022-11-23T11:03:00.190024918Z: docker sent SIGTERM (signal 15) to the container "fluentd"
  3. 2022-11-23T11:03:22.052683086Z: docker sent SIGKILL (signal 9) to the container "hello", which means docker sent SIGTERM to the container "fluentd" before the task of the service "moby-issue-44511-application" stopped.

I guess the ECS agent no longer considers a task to belong to its service once the task's state becomes "STOPPING." I wonder if that behavior is expected.
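For reference, the signal-related lines above can be extracted from the full debug log with a filter like this (same journalctl invocation as above; the grep pattern is just illustrative):

journalctl -u docker -o json | jq -r .MESSAGE \
  | grep -E 'kill signal|using the force'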

ericdahl commented 1 year ago

We're running into this same issue (reverse issue of #428)

When an ECS container instance is in DRAINING mode, we've observed that the DAEMON tasks are among the first to shut down, while the various REPLICA tasks stick around for longer because they are waiting for replacement tasks to spin up, or because of the ALB deregistration delay.

The result is that our key DAEMON tasks for logging, metrics, etc. are shut down before the application tasks, so we lose visibility into the final minutes of the REPLICA tasks: the container-level docker logs from that window are lost (we use the json-file driver with a fluentd daemon). For example, we were recently trying to analyze the shutdown behavior of some REPLICAs during the draining phase and could not, because the logging DAEMON had shut down before the REPLICA.

AbhishekNautiyal commented 1 year ago

Happy to share that Amazon ECS has rolled out an improvement to Daemon scheduling ensuring that Daemon tasks are the last to stop, after all replica tasks have shut down on a DRAINING instance (as documented here). We are also actively working on delivering an ECS-managed instance draining solution, as highlighted in https://github.com/aws/containers-roadmap/issues/256. Let us know if you face any other issues.