aws-quickstart / cdk-eks-blueprints

AWS Quick Start Team
Apache License 2.0

ingress-nginx addon: addon not waiting for nginx to be fully deployed #1050

Open idchrisamzn opened 1 month ago

idchrisamzn commented 1 month ago

Describe the bug

We’re using the ingress-nginx addon to set up ingress. Some of our other addons then create ingress objects, which rely on the ingress-nginx controller already being available. When deploying, we get errors similar to:

failed: Error: The stack named chris failed to deploy: CREATE_FAILED (The following resource(s) failed to create: [ciliumaddon]. ): Received response status [FAILED] from custom resource. Message returned: Error: b'Release "cilium" does not exist. Installing it now.\nError: 1 error occurred:\n\t* Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": failed to call webhook: Post "https://nginx-ingress-controller-ingress-nginx-controller-admission.kube-system.svc:443/networking/v1/ingresses?timeout=30s": no endpoints available for service "nginx-ingress-controller-ingress-nginx-controller-admission"\n\n\n'

I’ve tried a few things:

- Ordering the addons so that the ingress-nginx addon is defined before the addons with ingress
- Annotating the deployment inside those ingress-using addons with:

@blueprints.utils.dependable(blueprints.IngressNginxAddOn.name)

It looks like blueprints just isn’t waiting for the nginx pod to be running before moving on to the subsequent addons. With our own custom addons we use the “wait” parameter from helm-addon: https://github.com/aws-quickstart/cdk-eks-blueprints/blob/b8c51d230fa548d39696121cd68a7c0228f90665/lib/addons/helm-addon/index.ts#L84

However, this doesn’t seem to be available in IngressNginxAddOnProps. Do you have any suggestions for how to resolve this?
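To illustrate why ordering alone does not help, here is a toy model of the failure, not blueprints code. `FakeCluster`, `Release`, and the tick counts are all hypothetical; the only thing taken from the error above is that creating an ingress fails while the admission webhook has no endpoints.

```typescript
// Toy model (hypothetical): a release becomes "ready" some ticks after install.
interface Release {
  name: string;
  readyAfterTicks: number; // ticks until its pods have endpoints (made-up value)
}

class FakeCluster {
  private clock = 0;
  private readyAt = new Map<string, number>();

  // Install a release. With wait=true the call "blocks" (advances the clock)
  // until the release is ready; otherwise it returns after a single tick.
  install(release: Release, wait: boolean): void {
    this.readyAt.set(release.name, this.clock + release.readyAfterTicks);
    this.clock += wait ? release.readyAfterTicks : 1;
  }

  isReady(name: string): boolean {
    const t = this.readyAt.get(name);
    return t !== undefined && this.clock >= t;
  }

  // Creating an ingress goes through the controller's admission webhook,
  // which only answers once the controller is ready.
  createIngress(controller: string): string {
    return this.isReady(controller) ? "created" : "no endpoints available";
  }
}

// ingress-nginx is installed first in both cases; only `wait` differs.
function deployInOrder(wait: boolean): string {
  const cluster = new FakeCluster();
  cluster.install({ name: "ingress-nginx", readyAfterTicks: 5 }, wait);
  return cluster.createIngress("ingress-nginx");
}

console.log(deployInOrder(false));
console.log(deployInOrder(true));
```

In this model the dependent ingress is always created after ingress-nginx is installed, yet it only succeeds when the install step itself waits for readiness.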

Expected Behavior

Subsequent addons with ingress objects deploy without error

Current Behavior

failed: Error: The stack named chris failed to deploy: CREATE_FAILED (The following resource(s) failed to create: [ciliumaddon]. ): Received response status [FAILED] from custom resource. Message returned: Error: b'Release "cilium" does not exist. Installing it now.\nError: 1 error occurred:\n\t* Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": failed to call webhook: Post "https://nginx-ingress-controller-ingress-nginx-controller-admission.kube-system.svc:443/networking/v1/ingresses?timeout=30s": no endpoints available for service "nginx-ingress-controller-ingress-nginx-controller-admission"\n\n\n'

Reproduction Steps

Deploy the ingress-nginx addon and another addon that creates an ingress at the same time

Possible Solution

Add the "wait" condition to the ingress-nginx addon
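As a rough sketch of what this could mean at the Helm level: `helm upgrade --install` accepts `--wait` and `--timeout`, so a wait option on the addon props would ultimately need to translate into something like the arguments below. `WaitableHelmProps` and `helmUpgradeArgs` are hypothetical names for illustration; they are not part of IngressNginxAddOnProps or of blueprints, and the chart reference is an assumption.

```typescript
// Hypothetical props shape; `wait` and `timeoutSeconds` are assumptions,
// not fields of the real IngressNginxAddOnProps.
interface WaitableHelmProps {
  release: string;
  chart: string;
  namespace: string;
  wait?: boolean;
  timeoutSeconds?: number;
}

// Build the argument list for `helm upgrade --install`, adding --wait
// (and optionally --timeout) only when waiting is requested.
function helmUpgradeArgs(props: WaitableHelmProps): string[] {
  const args = [
    "upgrade", "--install", props.release, props.chart,
    "--namespace", props.namespace,
  ];
  if (props.wait) {
    args.push("--wait");
    if (props.timeoutSeconds !== undefined) {
      args.push("--timeout", `${props.timeoutSeconds}s`);
    }
  }
  return args;
}

const nginxArgs = helmUpgradeArgs({
  release: "nginx-ingress-controller",
  chart: "ingress-nginx/ingress-nginx", // assumed chart reference
  namespace: "kube-system",
  wait: true,
  timeoutSeconds: 300,
});
console.log(["helm", ...nginxArgs].join(" "));
```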

Additional Information/Context

No response

CDK CLI Version

2.132.0 (build 9a51c89)

EKS Blueprints Version

1.15.1

Node.js Version

v20.15.1

Environment details (OS name and version, etc.)

Ubuntu 22.04.4 LTS

Other information

No response

shapirov103 commented 1 month ago

Refactor all Helm addons to support the wait option, since a one-off solution for ingress-nginx won't solve similar cases for the rest of the addons.
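One possible shape for such a refactoring, as a sketch only: a shared options type that every Helm-based addon's props extend, resolved by a single helper. All names here (`WaitOptions`, `IngressNginxProps`, `CiliumProps`, `resolveWait`) and the default values are hypothetical stand-ins, not the actual blueprints types.

```typescript
// Hypothetical shared base that every Helm-based addon's props would extend.
interface WaitOptions {
  wait?: boolean;
  timeoutSeconds?: number;
}

// Stand-ins for per-addon props; not the real blueprints interfaces.
interface IngressNginxProps extends WaitOptions {
  internetFacing?: boolean;
}

interface CiliumProps extends WaitOptions {
  hubbleEnabled?: boolean;
}

// One helper resolves the wait settings for any addon, so the behaviour
// is defined in a single place rather than per addon.
function resolveWait(
  props: WaitOptions,
  defaults: Required<WaitOptions> = { wait: false, timeoutSeconds: 300 },
): Required<WaitOptions> {
  return {
    wait: props.wait ?? defaults.wait,
    timeoutSeconds: props.timeoutSeconds ?? defaults.timeoutSeconds,
  };
}

const nginxProps: IngressNginxProps = { internetFacing: true, wait: true };
const ciliumProps: CiliumProps = { hubbleEnabled: true };

console.log(resolveWait(nginxProps));
console.log(resolveWait(ciliumProps));
```

With this layout an addon gains the option by extending the base type, and addons that do not set it keep the default behaviour.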