aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.68k stars 3.93k forks source link

‼️ NOTICE: (ecs-patterns) NetworkLoadBalancedFargateService and NetworkLoadBalancedEc2Service cause target group replacement #24642

Closed otaviomacedo closed 1 year ago

otaviomacedo commented 1 year ago

Please add your +1 👍 to let us know you have encountered this

Status: WORKAROUND ONLY

Overview

Versions 1.141.0 and 2.9.0 (released Jan 27, 2022) introduced a change in NetworkLoadBalancedFargateService and NetworkLoadBalancedEc2Service that will may the replacement of a Load Balancing TargetGroup. If you are currently running on a version of <1.141.0 or <2.9.0 and you are upgrading to a newer version of ecs-patterns, this change may cause interruption to your service for a couple of minutes. See below whether this affects you.

Since the change was released, many users have come to depend on the new behavior. There is no way for us to revert to the old behavior without additionally breaking users who are running stacks that have been deployed using a version released after January, 2022.

Therefore, we recommend that if you will be affected by this change, that you put an override in place before performing the upgrade of ecs-patterns from a version before 1.141.0/2.9.0 to a version after.

We are sorry for the disturbance this change has caused. Please know that we take breaking changes seriously and are constantly evolving our checks for them.

Am I affected?

TargetGroups used to be created with default port as 80, irrespective of the containerPort you have configured. A change was introduced that made the TargetGroup's port equal to the containerPort you configure. This leads to the resource replacement.

You are affected if the value for containerPort in your code is different from 80. For example:

const service = new NetworkLoadBalancedFargateService(stack, 'MyFargateService', {
  taskImageOptions: {
    containerPort: 8443,   // <-- you are affected if this value is different from 80
    image: ContainerImage.fromRegistry('amazon/amazon-ecs-sample'),
  }
});

In this example, the value 8443 used to be ignored (the TargetGroup would be created with port 80, which is fine because ECS will register the correct port for each container as it's registering the tasks with the load balancer), but in newer versions of ecs-patterns the value 8443 will be copied onto the TargetGroup's port.

If this property is not defined or is set to 80, your application won't be affected.

Workaround

Use an escape hatch to force the port number of the underlying CfnTargetGroup to 80, regardless of the value of containerPort:

const service = new NetworkLoadBalancedFargateService(stack, 'MyFargateService', {
  taskImageOptions: {
    containerPort: 8443,
    image: ContainerImage.fromRegistry('amazon/amazon-ecs-sample'),
  }
});

const targetGroup = service.targetGroup.node.defaultChild as CfnTargetGroup;
targetGroup.port = 80;

After you have made this change, you can safely upgrade.

jasdrive1 commented 1 year ago

I had some questions about this that I got answered and through I should share with others looking at this issue:

Summary:

It boils down to:


More Context:

How can I be sure I did this correctly?

Running a cdk diff after you upgrade your cdk version and checking that targetgroup.port isn’t modified

How will this issue cause downtime?

When the TG is replaced, the original resource is deleted and then the new one is created. Your service would be unable to receive traffic in that interval because the TG has information on internal IPs of tasks, their ports and protocols, etc. So it’s a necessary blocking construct in between the ALB listener and the service tasks.

you can mentally model it as “all traffic has to flow through the TG to get load balanced correctly”

Why does the issue reference port 80 to be used?

There’s no particular reason that the TG port has to be 80, it appears that it was just used as a placeholder since in CDK, when targets register themselves with a TG they provide the port themselves. In the CFN Docs:

Port
The port on which the targets receive traffic. This port is used unless you specify a port override when registering the target.

CDK self-registering targets always specify a port.Unfortunately, for some reason, Port is a replacement field on a target group CFN resource. So that’s why you need to force it to be 80, to avoid replacement on a new deployment.You can check whether your CFN template will be affected by running cdk diff after upgrading CDK version and checking to be sure that the TargetGroup.Port field is unchanged.

mrgrain commented 1 year ago

Closed as not planned as per the reasons in the issue. We have a CLI notice in place to inform users about this issue, and it can still be found via search.

Unfortunately comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a discussion/issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.