aws-quickstart / cdk-eks-blueprints

AWS Quick Start Team
Apache License 2.0

EfsCsiDriverAddOn: When using a namespace it fails with "Error from server (NotFound): error when creating "/tmp/manifest.yaml": namespaces "XXXX" not found\n' #1077

Closed jesperalmstrom closed 2 weeks ago

jesperalmstrom commented 1 month ago

Describe the bug

When trying to add a namespace to EfsCsiDriverAddOn({namespace: nameSpace}) it fails with the following error:

from custom resource. Message returned: Error: b'Error from server (NotFound):
error when creating "/tmp/manifest.yaml": namespaces "XXXX" not found\n'

To try to fix this, I added a team with the namespace "XXXX" (const teamUepe = new blueprints.PlatformTeam({ name: nameSpace });) and then registered the team before the add-ons:

            .teams(teamUepe)
            .addOns(...addOns)

Still getting the same error.

Expected Behavior

EfsCsiDriverAddOn should be created after the namespace is created, or it should support createNamespace: true.

Current Behavior

Fails with error message as seen above

Reproduction Steps

const teamUepe = new blueprints.PlatformTeam({ name: nameSpace });
...
const addOns: Array<blueprints.ClusterAddOn> = [
    new blueprints.addons.SSMAgentAddOn(),
    new blueprints.addons.ClusterAutoScalerAddOn(),
    new blueprints.addons.EfsCsiDriverAddOn({ namespace: nameSpace, kmsKeys: [kmsEfsKey] }),
    new blueprints.addons.AwsLoadBalancerControllerAddOn({
        namespace: nameSpace,
    }),
    new blueprints.addons.ExternalDnsAddOn({
        namespace: nameSpace,
        hostedZoneResources: [blueprints.GlobalResources.HostedZone]
    }),
    new blueprints.addons.CertManagerAddOn({ installCRDs: true }),
    new blueprints.addons.ExternalsSecretsAddOn(),
    new blueprints.addons.SecretsStoreAddOn(),
    new blueprints.addons.IngressNginxAddOn(ingressNginxProps),
];
const stack = blueprints.EksBlueprint.builder()
    .version(KubernetesVersion.V1_30)
    .account(this.account)
    .region(this.region)
    .clusterProvider(clusterProvider)
    .resourceProvider(blueprints.GlobalResources.Vpc, new blueprints.VpcProvider(undefined, { primaryCidr: envContext.vpcCidr }))
    .resourceProvider(blueprints.GlobalResources.HostedZone, new blueprints.ImportHostedZoneProvider(hostedZone.hostedZoneId, hostedZoneName))
    .resourceProvider(blueprints.GlobalResources.Certificate, new blueprints.CreateCertificateProvider('secure-ingress-cert', `*.${hostedZone.zoneName}`, blueprints.GlobalResources.HostedZone))
    .resourceProvider(kmsEfsKeyName, new blueprints.CreateKmsKeyProvider(kmsEfsKeyName))
    .resourceProvider('uepe-efs', new blueprints.CreateEfsFileSystemProvider({
        name: envContext.efsName,
        kmsKeyResourceName: kmsEfsKeyName,
        efsProps: {
            encrypted: true,
            lifecyclePolicy: efs.LifecyclePolicy.AFTER_7_DAYS,
            removalPolicy: RemovalPolicy.DESTROY,
            throughputMode: efs.ThroughputMode.BURSTING,
            fileSystemPolicy: eksFileSystemPolicy,
        },
    }))
    .teams(teamUepe)
    .addOns(...addOns)
    .build(this, 'eks-blueprints');

Possible Solution

EfsCsiDriverAddOn should be created after the namespace is created, or it should support createNamespace: true.

Additional Information/Context

No response

CDK CLI Version

2.156.0 ("aws-cdk-lib": "2.147.3")

EKS Blueprints Version

1.15.1

Node.js Version

v22.8.0

Environment details (OS name and version, etc.)

macOS Sequoia 15.0

Other information

No response

jesperalmstrom commented 1 month ago

While digging through the Lambda logs, I found this Create request, which is related to the error:


{
    "RequestType": "Create",
    "ServiceToken": "arn:aws:lambda:us-east-1:xxx:function:EksUepeStackuepeeks1FE415-ProviderframeworkonEvent-0BukR7bjwpgv",
    "ResponseURL": "...",
    "StackId": "arn:aws:cloudformation:us-east-1:xxx:stack/EksUepeStackuepeeks1FE4154B/185aea00-764f-11ef-b109-0e04891abfad",
    "RequestId": "bd2952a9-043a-4155-8161-48e55b06ffb3",
    "LogicalResourceId": "uepeeksefscsicontrollersamanifestefscsicontrollersaServiceAccountResourceF3B501C0",
    "ResourceType": "Custom::AWSCDK-EKS-KubernetesResource",
    "ResourceProperties": {
        "ServiceToken": "arn:aws:lambda:us-east-1:xxx:function:EksUepeStackuepeeks1FE415-ProviderframeworkonEvent-0BukR7bjwpgv",
        "PruneLabel": "aws.cdk.eks/prune-c8f7f1aa13d9beee65d83d39aea548228bf9faadef",
        "ClusterName": "uepe-eks",
        "Manifest": "[{\"apiVersion\":\"v1\",\"kind\":\"ServiceAccount\",\"metadata\":{\"name\":\"efs-csi-controller-sa\",\"namespace\":\"XXX\",\"labels\":{\"aws.cdk.eks/prune-c8f7f1aa13d9beee65d83d39aea548228bf9faadef\":\"\",\"app.kubernetes.io/name\":\"efs-csi-controller-sa\"},\"annotations\":{\"eks.amazonaws.com/role-arn\":\"arn:aws:iam::xxx:role/EksUepeStackuepeeks1FE415-uepeeksefscsicontrollersa-1Lp0t66HvgLq\"}}}]",
        "RoleArn": "arn:aws:iam::xxx:role/EksUepeStackuepeeks1FE415-uepeeksCreationRole088278-GOyKG17AEzI0"
    }
}
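
When reading an event like the one above, it can help to decode the Manifest property, which is a JSON-encoded array of Kubernetes objects, and list the namespace each object targets. The following is a minimal sketch in plain TypeScript; the interface names, summarizeManifest, and sampleEvent are hypothetical helpers for illustration and are not part of CDK or EKS Blueprints. The sample is a trimmed copy of the event shown above.

```typescript
// Assumed subset of the custom-resource event; only the fields read below.
interface KubernetesResourceEvent {
  RequestType: string;
  LogicalResourceId: string;
  ResourceProperties: { ClusterName: string; Manifest: string };
}

interface ManifestSummary {
  kind: string;
  name: string;
  namespace?: string; // undefined when metadata.namespace is not set
}

// Decode the JSON-encoded Manifest and summarize each Kubernetes object in it.
function summarizeManifest(event: KubernetesResourceEvent): ManifestSummary[] {
  const objects = JSON.parse(event.ResourceProperties.Manifest) as Array<{
    kind: string;
    metadata?: { name?: string; namespace?: string };
  }>;
  return objects.map((o) => ({
    kind: o.kind,
    name: o.metadata?.name ?? "",
    namespace: o.metadata?.namespace,
  }));
}

// Trimmed copy of the logged event (labels and annotations omitted).
const sampleEvent: KubernetesResourceEvent = {
  RequestType: "Create",
  LogicalResourceId:
    "uepeeksefscsicontrollersamanifestefscsicontrollersaServiceAccountResourceF3B501C0",
  ResourceProperties: {
    ClusterName: "uepe-eks",
    Manifest:
      '[{"apiVersion":"v1","kind":"ServiceAccount","metadata":{"name":"efs-csi-controller-sa","namespace":"XXX"}}]',
  },
};

console.log(summarizeManifest(sampleEvent));
```

For this event the summary contains a single ServiceAccount named efs-csi-controller-sa in namespace XXX, which confirms that the failing resource is the add-on's service account and not the Helm chart itself.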

jesperalmstrom commented 1 month ago

I think this is a timing issue. I get this error, seemingly at random, when I use a namespace other than the default one for several add-ons. I have also hit it with AwsLoadBalancerControllerAddOn({namespace: nameSpace}).

jesperalmstrom commented 1 month ago

This seems to be a timing issue. When I run cdk deploy again, it usually works. What can be done to wait for, or guarantee, that the namespace is created before an add-on uses it?

shapirov103 commented 1 month ago

@jesperalmstrom You used a team construct to create the namespace, and then used an add-on with that namespace. In the current architecture, the dependency graph runs in one direction, from teams to add-ons, not the other way around, so an add-on cannot rely on a namespace that a team creates. To create a separate namespace for the EFS add-on, you will need to create a separate, tiny add-on that only creates the namespace. That add-on needs to be annotated with @Reflect.metadata("ordered", true) and registered before the EFS CSI driver add-on.

Pseudo code:

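import "reflect-metadata";
import { Construct } from "constructs";
import * as blueprints from "@aws-quickstart/eks-blueprints";

// Required so that this add-on is deployed before add-ons registered after it (see above).
@Reflect.metadata("ordered", true)
@blueprints.utils.supportsX86
export class MyNamespaceAddOn implements blueprints.ClusterAddOn {

  constructor(private readonly myNamespaceName: string) {}

  deploy(clusterInfo: blueprints.ClusterInfo): Promise<Construct> {
    const cluster = clusterInfo.cluster;
    // createNamespace returns the manifest construct that creates the namespace.
    return Promise.resolve(blueprints.utils.createNamespace(this.myNamespaceName, cluster));
  }
}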

I would still keep this issue open, as the EFS driver add-on should support a createNamespace flag like we do elsewhere.
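
The ordering requirement above comes down to a dependency edge: CloudFormation may create resources in any order unless one is declared to depend on another. The following is a toy model in plain TypeScript, not the EKS Blueprints or CDK implementation; Resource, deployOrder, and the resource names are hypothetical. It only illustrates that once the service account depends on the namespace, every valid creation order puts the namespace first, regardless of the order in which the resources were listed.

```typescript
// Toy model: a resource and the names of the resources it must wait for.
interface Resource {
  name: string;
  dependsOn: string[];
}

// Returns one valid creation order using a depth-first topological sort.
// Throws on unknown dependencies or dependency cycles.
function deployOrder(resources: Resource[]): string[] {
  const byName = new Map<string, Resource>();
  for (const r of resources) {
    byName.set(r.name, r);
  }

  const done = new Set<string>();
  const inProgress = new Set<string>();
  const result: string[] = [];

  const visit = (name: string): void => {
    if (done.has(name)) return;
    if (inProgress.has(name)) throw new Error(`dependency cycle at ${name}`);
    const resource = byName.get(name);
    if (!resource) throw new Error(`unknown resource ${name}`);
    inProgress.add(name);
    for (const dep of resource.dependsOn) {
      visit(dep); // dependencies are created first
    }
    inProgress.delete(name);
    done.add(name);
    result.push(name);
  };

  for (const r of resources) {
    visit(r.name);
  }
  return result;
}

// The service account is listed first, but the explicit edge still
// places the namespace ahead of it.
const creationOrder = deployOrder([
  { name: "efs-csi-controller-sa", dependsOn: ["namespace"] },
  { name: "namespace", dependsOn: [] },
]);
console.log(creationOrder); // namespace first, then efs-csi-controller-sa
```

Without the dependsOn entry, the model would return the resources in listing order, which mirrors the intermittent failures reported in this thread: nothing forces the namespace to exist before the service account manifest is applied.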

jesperalmstrom commented 1 month ago

I will try this. I have my doubts, though, since the reason I added teams() in the first place was that I interpreted the error as meaning that namespace creation was not supported.

jesperalmstrom commented 1 month ago

I have now tested without .teams(nameSpace) and still get the same issue. This is the request logged just before the error in the CloudWatch log:

...
"ResourceProperties": {
        "ServiceToken": "arn:aws:lambda:us-east-1:211125531019:function:testEksStack-ProviderframeworkonEvent-uYCAE3ipZRup",
        "PruneLabel": "aws.cdk.eks/prune-c8deef32ebxxxx",
        "ClusterName": "test",
        "Manifest": "[{\"apiVersion\":\"v1\",\"kind\":\"ServiceAccount\",\"metadata\":{\"name\":\"efs-csi-controller-sa\",\"namespace\":\"XXXX\",\"labels\":{\"aws.cdk.eks/prune-xxxx\":\"\",\"app.kubernetes.io/name\":\"efs-csi-controller-sa\"},\"annotations\":{\"eks.amazonaws.com/role-arn\":\"arn:aws:iam::211125531019:role/testEksStack-B9y\"}}}]",
...

Error message:

[ERROR] Exception: b'Error from server (NotFound): error when creating "/tmp/manifest.yaml": namespaces "XXX" not found\n'
Traceback (most recent call last):
  File "/var/task/index.py", line 14, in handler
    return apply_handler(event, context)
  File "/var/task/apply/__init__.py", line 64, in apply_handler
    kubectl('create', manifest_file, *kubectl_opts)
  File "/var/task/apply/__init__.py", line 91, in kubectl
    raise Exception(output)
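
When scanning many log lines like the one above, the missing namespace can be pulled out of the kubectl message with a small pattern match. This is a sketch only; MISSING_NAMESPACE and missingNamespace are hypothetical names, and the pattern assumes the message keeps the exact wording shown in this log.

```typescript
// Matches the kubectl message: namespaces "<name>" not found
const MISSING_NAMESPACE = /namespaces "([^"]+)" not found/;

// Returns the namespace named in the error, or undefined if the line does not match.
function missingNamespace(message: string): string | undefined {
  const match = MISSING_NAMESPACE.exec(message);
  return match ? match[1] : undefined;
}

// The error line from the log above, copied as logged.
const logLine =
  `[ERROR] Exception: b'Error from server (NotFound): error when creating "/tmp/manifest.yaml": namespaces "XXX" not found\\n'`;

console.log(missingNamespace(logLine));
```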

jesperalmstrom commented 1 month ago

After adding the NamespaceAddOn() you suggested, @shapirov103 (thanks again), I managed to deploy the stack. My conclusion is that the EFS driver add-on does not have working logic for createNamespace.

shapirov103 commented 1 month ago

@jesperalmstrom You are correct: the createNamespace logic is missing from the EFS add-on because, by default, the add-on is installed in kube-system. Let's keep this issue open to address it.

jesperalmstrom commented 1 month ago

@shapirov103 I suspect that AwsLoadBalancerControllerAddOn() could have the same issue.