aws-quickstart / cdk-eks-blueprints

AWS Quick Start Team
Apache License 2.0

keda: cdk destroy fails when no custom irsaRoles are specified. #1082

Closed PeterKoegel closed 1 month ago

PeterKoegel commented 2 months ago

Describe the bug

When the keda addon is included in my cluster, cdk destroy times out after one hour because the keda-namespace-struct resource cannot be deleted, and I get the error messages shown under Current Behavior.

I can work around this issue by configuring irsaRoles for the addon; it does not matter which role I choose, e.g.:

new blueprints.addons.KedaAddOn({irsaRoles:["AmazonSQSFullAccess"]})

In this case cdk destroy runs successfully.

I think this is because, when the irsaRoles list is non-empty, a different branch is executed in cdk-eks-blueprints/lib/addons/keda/index.ts (line 90), which creates the keda namespace from CDK instead of Helm.
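To make the suspected branching concrete, here is a minimal, self-contained sketch of the hypothesis above. It is not the addon's actual source: `KedaLikeProps`, `NamespaceOwner`, and `namespaceOwner` are hypothetical names introduced only for illustration, and the real logic in lib/addons/keda/index.ts may differ.

```typescript
// Hypothetical model of the branching described above; NOT the library's code.
// It only encodes the assumption "non-empty irsaRoles => namespace created from CDK,
// otherwise the namespace is left to Helm".
type NamespaceOwner = "cdk" | "helm";

interface KedaLikeProps {
    // Mirrors the irsaRoles option used in the workaround; optional, as in the report.
    irsaRoles?: string[];
}

function namespaceOwner(props: KedaLikeProps): NamespaceOwner {
    // Treat a missing list the same as an empty one.
    const roles = props.irsaRoles ?? [];
    return roles.length > 0 ? "cdk" : "helm";
}

// Default configuration (the failing case in this report).
console.log(namespaceOwner({}));
// Workaround configuration (destroy succeeds in this report).
console.log(namespaceOwner({ irsaRoles: ["AmazonSQSFullAccess"] }));
```

If this model matches the real code, the two configurations take different paths for namespace creation, which would explain why only one of them tears down cleanly.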

What I want is to successfully run cdk destroy without specifying any irsaRoles.

Expected Behavior

cdk destroy should destroy everything.

Current Behavior

When no irsaRoles list is specified, cdk destroy hangs at a certain point and aborts with a timeout after an hour.

49 Currently in progress: kedanamespacestruct22C0E683, amazoncloudwatchnamespacestruct448594E6, externalsecretsnamespacestructCB434E5B
kubernetes-executor | 49 | 12:49:55 | DELETE_FAILED | Custom::AWSCDK-EKS-KubernetesResource | keda-namespace-struct/Resource/Default (kedanamespacestruct22C0E683) CloudFormation did not receive a response from your Custom Resource. Please check your logs for requestId [8fd952a4-9aed-4cdd-af82-15abefbb8c70]. If you are using the Python cfn-response module, you may need to update your Lambda function code so that CloudFormation can attach the updated version.
49 Currently in progress: amazoncloudwatchnamespacestruct448594E6, externalsecretsnamespacestructCB434E5B
kubernetes-executor | 49 | 12:50:41 | DELETE_FAILED | Custom::AWSCDK-EKS-KubernetesResource | amazon-cloudwatch-namespace-struct/Resource/Default (amazoncloudwatchnamespacestruct448594E6) CloudFormation did not receive a response from your Custom Resource. Please check your logs for requestId [7d82fb55-b588-4191-be49-1389fba7e3ed]. If you are using the Python cfn-response module, you may need to update your Lambda function code so that CloudFormation can attach the updated version.
kubernetes-executor | 49 | 12:50:51 | DELETE_FAILED | Custom::AWSCDK-EKS-KubernetesResource | external-secrets-namespace-struct/Resource/Default (externalsecretsnamespacestructCB434E5B) CloudFormation did not receive a response from your Custom Resource. Please check your logs for requestId [72ffbd04-5b56-4e3d-a028-2752463aaed5]. If you are using the Python cfn-response module, you may need to update your Lambda function code so that CloudFormation can attach the updated version.
kubernetes-executor | 49 | 12:50:52 | DELETE_FAILED | AWS::CloudFormation::Stack | kubernetes-executor The following resource(s) failed to delete: [amazoncloudwatchnamespacestruct448594E6, kedanamespacestruct22C0E683, externalsecretsnamespacestructCB434E5B].

Failed resources:
kubernetes-executor | 12:49:55 | DELETE_FAILED | Custom::AWSCDK-EKS-KubernetesResource | keda-namespace-struct/Resource/Default (kedanamespacestruct22C0E683) CloudFormation did not receive a response from your Custom Resource. Please check your logs for requestId [8fd952a4-9aed-4cdd-af82-15abefbb8c70]. If you are using the Python cfn-response module, you may need to update your Lambda function code so that CloudFormation can attach the updated version.
kubernetes-executor | 12:50:41 | DELETE_FAILED | Custom::AWSCDK-EKS-KubernetesResource | amazon-cloudwatch-namespace-struct/Resource/Default (amazoncloudwatchnamespacestruct448594E6) CloudFormation did not receive a response from your Custom Resource. Please check your logs for requestId [7d82fb55-b588-4191-be49-1389fba7e3ed]. If you are using the Python cfn-response module, you may need to update your Lambda function code so that CloudFormation can attach the updated version.
kubernetes-executor | 12:50:51 | DELETE_FAILED | Custom::AWSCDK-EKS-KubernetesResource | external-secrets-namespace-struct/Resource/Default (externalsecretsnamespacestructCB434E5B) CloudFormation did not receive a response from your Custom Resource. Please check your logs for requestId [72ffbd04-5b56-4e3d-a028-2752463aaed5]. If you are using the Python cfn-response module, you may need to update your Lambda function code so that CloudFormation can attach the updated version.

Reproduction Steps

Add keda addon to the configuration without specifying irsaRoles in the addon config.

Possible Solution

No response

Additional Information/Context

No response

CDK CLI Version

2.144.0 (build 5fb15bc)

EKS Blueprints Version

1.15.1

Node.js Version

v20.11.1

Environment details (OS name and version, etc.)

Cluster with Linux + Windows nodegroup

Other information

No response

PeterKoegel commented 2 months ago

This issue behaves the same way as described in https://github.com/aws-quickstart/cdk-eks-blueprints/issues/980

shapirov103 commented 2 months ago

@PeterKoegel acknowledged, this appears to be a defect. We do include Keda in the end-to-end test, but indeed with IRSA configured. Is your test simply adding the addon with no IRSA policies to an arbitrary blueprint, then destroying it?

PeterKoegel commented 2 months ago

No, I would need to make some adaptations to the blueprints (predefined VPC subnets, ...) to get them running in my environment, so I have only analyzed this in a custom configuration.

PeterKoegel commented 1 month ago

While trying to reproduce this issue in a simple example, I noticed that the described behaviour only occurs when other resources created in the same stack have a dependency on the cluster:

import 'source-map-support/register';
import * as cdk from 'aws-cdk-lib';
import * as blueprints from '@aws-quickstart/eks-blueprints';
import * as s3 from 'aws-cdk-lib/aws-s3';
import { RemovalPolicy } from 'aws-cdk-lib';

const app = new cdk.App();

// AddOns for the cluster.
const addOns: Array<blueprints.ClusterAddOn> = [
    new blueprints.addons.CoreDnsAddOn(),
    new blueprints.addons.KubeProxyAddOn(),
    new blueprints.addons.VpcCniAddOn(),
    new blueprints.addons.KedaAddOn()
];

const account = "12345";
const region = "eu-central-1";
const vpcId = "12345";

const stack = blueprints.EksBlueprint.builder()
    .account(account)
    .region(region)
    .version(cdk.aws_eks.KubernetesVersion.V1_30)
    .resourceProvider(blueprints.GlobalResources.Vpc, new blueprints.VpcProvider(vpcId))
    .addOns(...addOns)
    .build(app, 'eks-blueprint-ipv4');

// Some resource that depends on the EKS cluster.
// (The S3 bucket is only a placeholder; it could equally be, e.g., a Lambda function that depends on the cluster.)
const cluster = stack.getClusterInfo().cluster;
const exampleResource = new s3.Bucket(stack, 'TestBucket', {
    removalPolicy: RemovalPolicy.DESTROY,
    autoDeleteObjects: true,
});

// Explicit dependency on the cluster; the destroy failure was only observed with this in place.
exampleResource.node.addDependency(cluster);

cdk destroy results in the following error:

eks-blueprint-ipv4 |  14 | 14:21:27 | DELETE_FAILED        | Custom::AWSCDK-EKS-KubernetesResource | keda-namespace-struct/Resource/Default (kedanamespacestruct22C0E683) CloudFormation did not receive a response from your Custom Resource. Please check your logs for requestId [24bd4710-c722-432e-b78a-db9a49b8fbda]. If you are using the Python cfn-response module, you may need to update your Lambda function code so that CloudFormation can attach the updated version.
eks-blueprint-ipv4 |  14 | 14:21:27 | DELETE_FAILED        | AWS::CloudFormation::Stack            | eks-blueprint-ipv4 The following resource(s) failed to delete: [kedanamespacestruct22C0E683].