aws / aws-cdk

The AWS Cloud Development Kit is a framework for defining cloud infrastructure in code
https://aws.amazon.com/cdk
Apache License 2.0
11.71k stars 3.94k forks source link

(aws-ecs): Deleting a stack with a Cluster and an ASG capacity provider fails #19275

Open gshpychka opened 2 years ago

gshpychka commented 2 years ago

What is the problem?

If a stack contains a ECS cluster with an ASG capacity provider, deleting the stack will fail due to an apparent circular dependency.

Reproduction Steps

Minimal stack:

class ClusterWithCapacity(cdk.Stack):
    def __init__(
        self,
        scope: Construct,
        id: str,
        **kwargs,
    ) -> None:
        super().__init__(scope, id, **kwargs)
        vpc = ec2.Vpc(self, "vpc")
        ecs_cluster = ecs.Cluster(self, "cluster", vpc=vpc)

        asg = autoscaling.AutoScalingGroup(
            self,
            "asg",
            vpc=vpc,
            instance_type=ec2.InstanceType.of("t3-small"),
            machine_image=ecs.EcsOptimizedImage.amazon_linux2(),
        )

        capacity_provider = ecs.AsgCapacityProvider(
            self,
            "capacity_provider",
            auto_scaling_group=asg,
        )

        ecs_cluster.add_asg_capacity_provider(provider=capacity_provider)

Deploy the stack and try deleting it.

What did you expect to happen?

The stack is deleted.

What actually happened?

The capacity provider fails to delete with the following error: Resource handler returned message: "Invalid request provided: Cannot delete a capacity provider while it is associated with a cluster."

CDK CLI Version

2.15.0

Framework Version

No response

Node.js Version

17.6.0

OS

MacOS

Language

Python

Language Version

No response

Other information

This hints that the cluster needs to be deleted first, but the capacity provider actually depends on the cluster.

Creating an explicit dependency ecs_cluster.node.add_dependency(capacity_provider) results in a circular dependency.

gshpychka commented 2 years ago

Okay, it seems that the issue is that the AWS::ECS::ClusterCapacityProviderAssociations resource is not deleted first.

gshpychka commented 2 years ago

After unsuccessfully trying to access the CfnClusterCapacityProviderAssociations resource with an escape hatch, I see that it's added with an aspect here: https://github.com/aws/aws-cdk/blob/5de7b86d916be6ab892e75e18c54a327fe1f65ff/packages/%40aws-cdk/aws-ecs/lib/cluster.ts#L1152:L1176

How can I access it to declare an explicit dependency before the bug is fixed?

The bug seems to be that this resource is removed AFTER the capacity provider, i.e. the required dependency is not being created

gshpychka commented 2 years ago

Adding a dependency from CfnClusterCapacityProviderAssociations is not enough, as it still fails to delete before all the services are deleted (and the capacity provider is not in use).

We need to make every service depend on the association as well.

This is not apparent in the stack above, since there are no services in it.

Actually, the example above might not be enough, might have to add a service that would use the capacity provider for the issue to appear.

I managed to work around this by using Aspects and adding the required dependencies there.

gshpychka commented 2 years ago

My fix. Adding it to the parent stack, works with multiple clusters.

@jsii.implements(cdk.IAspect)
class HotfixCapacityProviderDependencies:
    # Add a dependency from capacity provider association to the cluster
    # and from each service to the capacity provider association
    def visit(self, node: IConstruct) -> None:
        if type(node) is ecs.Ec2Service:
            children = node.cluster.node.find_all()
            for child in children:
                if type(child) is ecs.CfnClusterCapacityProviderAssociations:
                    child.node.add_dependency(node.cluster)
                    node.node.add_dependency(child)
lucasgherculano commented 2 years ago

Very handy solution, can someone provide the same for typescript?

peterwoodworth commented 2 years ago

@lucasgherculano check out our docs page on escape hatches to see what's going on in that solution, with some TypeScript examples

peterwoodworth commented 2 years ago

Check out this comment for a relevant example in TS 🙂

gshpychka commented 2 years ago

@peterwoodworth what would be the fix for this? Add the above aspect to the root stack?

peterwoodworth commented 2 years ago

I'm too unfamiliar with this issue to comment; I won't have time to get to this soon. Maybe @madeline-k would know though

jamesmcglinn commented 2 years ago

@lucasgherculano the Typescript version of @gshpychka's solution:

/**
 * Add a dependency from capacity provider association to the cluster
 * and from each service to the capacity provider association.
 */
class CapacityProviderDependencyAspect implements IAspect {
  public visit(node: IConstruct): void {
    if (node instanceof ecs.Ec2Service) {
      const children = node.cluster.node.findAll();
      for (const child of children) {
        if (child instanceof ecs.CfnClusterCapacityProviderAssociations) {
          child.node.addDependency(node.cluster);
          node.node.addDependency(child);
        }
      }
    }
  }
}

Then add it in your root stack's constructor with:

Aspects.of(this).add(new CapacityProviderDependencyAspect());

matthias-pichler-shopstory commented 1 year ago

@lucasgherculano the Typescript version of @gshpychka's solution:

/**
 * Add a dependency from capacity provider association to the cluster
 * and from each service to the capacity provider association.
 */
class CapacityProviderDependencyAspect implements IAspect {
  public visit(node: IConstruct): void {
    if (node instanceof ecs.Ec2Service) {
      const children = node.cluster.node.findAll();
      for (const child of children) {
        if (child instanceof ecs.CfnClusterCapacityProviderAssociations) {
          child.node.addDependency(node.cluster);
          node.node.addDependency(child);
        }
      }
    }
  }
}

Then add it in your root stack's constructor with:

Aspects.of(this).add(new CapacityProviderDependencyAspect());

I tried this but since the MaybeCreateCapacityProviderAssociations are created using an Aspect themselves when my aspect ran the child wasn't created yet. Maybe this is fixable when declaring your aspect after creating the aws_ecs.Cluster. A potentially simpler solution might be to simply retain the associations:

export class CapacityProviderDependency implements IAspect {
  public visit(node: IConstruct): void {
    if (!(node instanceof aws_ecs.CfnClusterCapacityProviderAssociations)) {
      return;
    }

    node.applyRemovalPolicy(RemovalPolicy.RETAIN);
  }
}