aws-quickstart / cdk-eks-blueprints

AWS Quick Start Team
Apache License 2.0

eks-blueprints: When destroying, network resources (VPC, Subnet, Route table, NACL, SG) are left behind #1068

Open jesperalmstrom opened 2 weeks ago

jesperalmstrom commented 2 weeks ago

Describe the bug

When calling cdk destroy, all or most of the networking resources were left behind.

Expected Behavior

All the resources are destroyed.

Current Behavior

Resources such as the VPC, subnets, route tables, NACLs, IGW, network interfaces, and security groups were left behind. I had to find a Gist to identify them and then destroy them manually.

Reproduction Steps

Added a list of add-ons:

        // AddOns for the cluster.
        const addOns: Array<blueprints.ClusterAddOn> = [
            new blueprints.addons.FluxCDAddOn(),
            new blueprints.addons.SSMAgentAddOn(),
            new blueprints.addons.ClusterAutoScalerAddOn(),
            new blueprints.addons.AwsLoadBalancerControllerAddOn(),
            //new blueprints.addons.VpcCniAddOn(),
            new blueprints.addons.CertManagerAddOn(),
            new blueprints.addons.ExternalDnsAddOn({
                hostedZoneResources: [blueprints.GlobalResources.HostedZone]
            }),
            new blueprints.addons.EfsCsiDriverAddOn({kmsKeys: [kmsKey]}),
            new blueprints.addons.EbsCsiDriverAddOn(),
            new blueprints.addons.IngressNginxAddOn()
        ];

Then created the cluster:

        const stack = blueprints.EksBlueprint.builder()
            .version('auto')
            .account(account)
            .region(region)
            .clusterProvider(clusterProvider)
            .resourceProvider(blueprints.GlobalResources.Vpc, new blueprints.VpcProvider(undefined, { primaryCidr: envContext.vpcCidr }))
            .resourceProvider(blueprints.GlobalResources.HostedZone, new blueprints.ImportHostedZoneProvider(r53HostedZone.hostedZoneId, hostedZoneName))
            .resourceProvider(blueprints.GlobalResources.KmsKey, new blueprints.CreateKmsKeyProvider())
            .resourceProvider("s3-bucket", new blueprints.CreateS3BucketProvider({
                name: envContext.s3BucketName+'.'+account+'.'+region,
                id: envContext.s3BucketName,
                s3BucketProps: { removalPolicy: RemovalPolicy.DESTROY },
            }))
            .addOns(...addOns)
            .build(this, 'my-eks-blueprint');

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.147.3 (build 32f0fdb)

EKS Blueprints Version

1.15.1

Node.js Version

v22.2.0

Environment details (OS name and version, etc.)

sw_vers ProductName: macOS ProductVersion: 14.5 BuildVersion: 23F79

Other information

No response

shapirov103 commented 2 weeks ago

@jesperalmstrom what is the status of the stack in CloudFormation after you destroy it? Sometimes the destroy command doesn't finish and leaves resources behind. That happens, for example, if any of the resources are modified outside of the stack. In that case CFN detects drift and stops.
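To make the status question concrete, here is a minimal sketch of how the stack's event history could be narrowed down to the resources that failed to delete. The `StackEvent` shape only loosely mirrors the objects returned by CloudFormation's DescribeStackEvents API, `findFailedDeletions` is a hypothetical helper, and the sample events are made up for illustration; none of this is part of EKS Blueprints.

```typescript
// Only the fields used here are declared; the real API returns more.
interface StackEvent {
  LogicalResourceId: string;
  ResourceType: string;
  ResourceStatus: string;
  ResourceStatusReason?: string;
}

interface FailedDeletion {
  logicalId: string;
  type: string;
  reason: string;
}

// Hypothetical helper: returns one entry per resource whose deletion failed.
// Only the first event seen for each logical ID is kept.
function findFailedDeletions(events: StackEvent[]): FailedDeletion[] {
  const seen = new Set<string>();
  const failures: FailedDeletion[] = [];
  for (const event of events) {
    if (event.ResourceStatus !== "DELETE_FAILED") continue;
    if (seen.has(event.LogicalResourceId)) continue;
    seen.add(event.LogicalResourceId);
    failures.push({
      logicalId: event.LogicalResourceId,
      type: event.ResourceType,
      reason: event.ResourceStatusReason ?? "(no reason given)",
    });
  }
  return failures;
}

// Made-up sample data, not taken from the reporter's stack.
const sampleEvents: StackEvent[] = [
  {
    LogicalResourceId: "ExampleVpc",
    ResourceType: "AWS::EC2::VPC",
    ResourceStatus: "DELETE_FAILED",
    ResourceStatusReason: "The vpc has dependencies and cannot be deleted.",
  },
  {
    LogicalResourceId: "ExampleBucket",
    ResourceType: "AWS::S3::Bucket",
    ResourceStatus: "DELETE_COMPLETE",
  },
];

for (const failure of findFailedDeletions(sampleEvents)) {
  console.log(`${failure.type} ${failure.logicalId}: ${failure.reason}`);
}
```

The status reason on each failed event is usually the quickest pointer to what is still holding a resource in place.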

jesperalmstrom commented 2 weeks ago

I did a faulty deploy (wrong region), so I destroyed it almost immediately. There should not have been any drift.

jesperalmstrom commented 2 weeks ago

The CFN stack got several of these:

This resource failed to delete. It was skipped and retained using the Force Delete Stack mode.

When I tried a force delete, it would not succeed because of dependencies. It took me hours to find and understand all the dependencies. I found a script that I modified to be able to delete them (a slightly modified version of this: https://gist.github.com/alberto-morales/b6d7719763f483185db27289d51f8ec5).
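The ordering problem behind a manual cleanup like this can be sketched as a topological sort. The sketch below is not the linked Gist and does not call AWS; `vpcBlockedBy` is a simplified, assumed dependency map (for example, a subnet cannot be deleted while network interfaces remain in it, and a VPC cannot be deleted while it still contains subnets or has an internet gateway attached), and `deletionOrder` is a hypothetical helper.

```typescript
// Assumed, simplified map: each key lists the resource kinds that must be
// removed (or detached) before the key itself can be deleted.
const vpcBlockedBy: Record<string, string[]> = {
  NetworkInterface: [],
  Subnet: ["NetworkInterface"],
  SecurityGroup: ["NetworkInterface"],
  RouteTable: ["Subnet"],
  NetworkAcl: ["Subnet"],
  InternetGateway: [],
  Vpc: ["Subnet", "SecurityGroup", "RouteTable", "NetworkAcl", "InternetGateway"],
};

// Hypothetical helper: depth-first topological sort that emits every blocker
// before the resource it blocks, and throws if the map contains a cycle.
function deletionOrder(blockedBy: Record<string, string[]>): string[] {
  const order: string[] = [];
  const state = new Map<string, "visiting" | "done">();

  const visit = (kind: string): void => {
    const current = state.get(kind);
    if (current === "done") return;
    if (current === "visiting") {
      throw new Error(`dependency cycle at ${kind}`);
    }
    state.set(kind, "visiting");
    for (const blocker of blockedBy[kind] ?? []) {
      visit(blocker);
    }
    state.set(kind, "done");
    order.push(kind);
  };

  for (const kind of Object.keys(blockedBy)) {
    visit(kind);
  }
  return order;
}

console.log(deletionOrder(vpcBlockedBy).join(" -> "));
```

With this map the VPC always comes last and network interfaces come before the subnets and security groups that reference them, which matches the order a manual cleanup has to follow.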

jesperalmstrom commented 1 week ago

@shapirov103 do you have any ideas or tricks?

shapirov103 commented 1 week ago

This is not the expected behavior. However, before qualifying it as a defect, please share which dependencies you discovered. For example, I see that you used Flux. Flux can in turn provision apps that will fail to be removed if the Flux controller is destroyed first. This is especially true for any CRDs from Flux, which then fail to be cleaned up because no controller is available.

CFN is expected to remove all resources that were provisioned, provided there was no change to the resources.

jesperalmstrom commented 2 days ago

Thanks for the response. I will try to remove Flux and see whether the delete goes more smoothly.