aws / aws-application-networking-k8s

A Kubernetes controller for Amazon VPC Lattice
https://www.gateway-api-controller.eks.aws.dev/
Apache License 2.0

Controller should fail fast(er). #660

Open · eviln1 opened this issue 1 month ago

eviln1 commented 1 month ago

Our team got hit by https://github.com/aws/aws-application-networking-k8s/issues/658 today. The proposal in https://github.com/aws/aws-application-networking-k8s/issues/659 would help a lot.

Additionally, I think that the controller should fail fast(er).

We install the controller with Helm using the atomic: true option, so that if the pods can't become ready, Helm rolls back to the previous release.

Currently, the controller becomes ready, then fails after a couple of minutes and goes into CrashLoopBackOff. Because the pods were briefly Ready, Helm treats the release as successful and does not roll back.

Having the controller check its prerequisites before it reports ready would prevent this behavior.
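A minimal sketch of the readiness gate being proposed, assuming the prerequisites are a set of CRDs. The CRD names in REQUIRED_CRDS, and the functions missing_prerequisites and readiness_status, are hypothetical and for illustration only; a real controller would list installed CRDs from the API server rather than receive them as an argument. The point is that the readiness endpoint returns 503 until every prerequisite is present, so the pod never reports Ready and Helm's atomic rollback can kick in.

```python
from http import HTTPStatus

# Hypothetical prerequisite list; the controller's real requirements may differ.
REQUIRED_CRDS = (
    "gateways.gateway.networking.k8s.io",
    "httproutes.gateway.networking.k8s.io",
)


def missing_prerequisites(installed_crds, required=REQUIRED_CRDS):
    """Return the required CRD names that are not installed, in declared order."""
    installed = set(installed_crds)
    return [name for name in required if name not in installed]


def readiness_status(installed_crds, required=REQUIRED_CRDS):
    """Map prerequisite state to the (status, body) a readiness probe would see.

    Returns 200 OK only when every prerequisite is present; otherwise 503,
    so the pod stays NotReady instead of becoming Ready and crashing later.
    """
    missing = missing_prerequisites(installed_crds, required)
    if missing:
        return HTTPStatus.SERVICE_UNAVAILABLE, "missing CRDs: " + ", ".join(missing)
    return HTTPStatus.OK, "ok"
```

For example, readiness_status(["gateways.gateway.networking.k8s.io"]) yields a 503 whose body names httproutes.gateway.networking.k8s.io as missing.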

zijun726911 commented 4 weeks ago

Thank you for raising this issue. It makes sense for the controller to fail fast during its initialization phase if some required resources are not installed in the cluster, so that the Helm chart's atomic: true behavior works as intended.

We will look into implementing this fast-fail logic; it should not be hard to add.
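One possible shape for the fast-fail initialization described above, sketched under the assumption that the required resources are CRDs. REQUIRED_CRDS, PrerequisiteError, verify_prerequisites and main are hypothetical names, and the list of installed CRDs is passed in rather than queried from the API server. The controller verifies prerequisites before starting any reconcilers and exits non-zero if anything is missing, so the failure surfaces immediately instead of minutes later.

```python
import sys

# Hypothetical prerequisite list, for illustration only.
REQUIRED_CRDS = (
    "gateways.gateway.networking.k8s.io",
    "httproutes.gateway.networking.k8s.io",
)


class PrerequisiteError(RuntimeError):
    """Raised when required cluster resources are absent at startup."""


def verify_prerequisites(installed_crds, required=REQUIRED_CRDS):
    """Raise PrerequisiteError listing every required CRD that is not installed."""
    installed = set(installed_crds)
    missing = [name for name in required if name not in installed]
    if missing:
        raise PrerequisiteError("required CRDs not installed: " + ", ".join(missing))


def main(installed_crds):
    """Hypothetical entry point: check prerequisites before starting reconcilers.

    Returns the process exit code: 1 on missing prerequisites, 0 otherwise.
    """
    try:
        verify_prerequisites(installed_crds)
    except PrerequisiteError as err:
        print(f"startup aborted: {err}", file=sys.stderr)
        return 1
    # Starting the manager and reconcile loops would happen here (omitted).
    return 0
```

A real entry point would call sys.exit(main(...)) with the CRDs discovered from the cluster; exiting before the readiness endpoint ever succeeds is what lets Helm's atomic option detect the failed rollout.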

Thank you again for your input, and we appreciate your patience as we work on improving the controller.