aws-samples / eks-blueprints-for-proton


AWS Load Balancer controller pods fail to come up (after upgrade) #34

Closed mreferre closed 2 years ago

mreferre commented 2 years ago

Upon cluster upgrade (from 1.21 to 1.22) the AWS LB controller pods fail to start:

[cloudshell-user@ip-10-0-167-172 ~]$ ./kubectl get pods -A 
NAMESPACE            NAME                                           READY   STATUS             RESTARTS         AGE
aws-for-fluent-bit   aws-for-fluent-bit-4kwkw                       1/1     Running            0                43m
aws-for-fluent-bit   aws-for-fluent-bit-78cfp                       1/1     Running            0                43m
aws-for-fluent-bit   aws-for-fluent-bit-z2cc7                       1/1     Running            0                43m
cert-manager         cert-manager-5dbb9d7955-4cq22                  1/1     Running            0                42m
cert-manager         cert-manager-cainjector-7d55bf8f78-zlq59       1/1     Running            0                39m
cert-manager         cert-manager-webhook-577f77586f-qb58k          1/1     Running            0                39m
kube-system          aws-load-balancer-controller-fc78dbc4d-2mvr8   0/1     CrashLoopBackOff   12 (2m33s ago)   39m
kube-system          aws-load-balancer-controller-fc78dbc4d-pt7wn   0/1     CrashLoopBackOff   13 (54s ago)     42m
kube-system          aws-node-478q7                                 1/1     Running            0                43m
kube-system          aws-node-q7jl4                                 1/1     Running            0                43m
kube-system          aws-node-zxxrc                                 1/1     Running            0                43m
kube-system          coredns-657694c6f4-7f7wn                       1/1     Running            0                30m
kube-system          coredns-657694c6f4-szhlb                       1/1     Running            0                30m
kube-system          kube-proxy-pshk7                               1/1     Running            0                30m
kube-system          kube-proxy-svh7g                               1/1     Running            0                30m
kube-system          kube-proxy-zpdvx                               1/1     Running            0                30m
kube-system          metrics-server-694d47d564-28znk                1/1     Running            0                42m
vpa                  vpa-recommender-554f56647b-wcxxc               1/1     Running            0                39m
vpa                  vpa-updater-67d6c5c7cf-n77jl                   1/1     Running            0                39m

The log says:

[cloudshell-user@ip-10-0-167-172 ~]$ ./kubectl logs aws-load-balancer-controller-fc78dbc4d-pt7wn -n kube-system
{"level":"info","ts":1651831011.4426408,"msg":"version","GitVersion":"v2.3.0","GitCommit":"83a8c40061304b67fa19d344e159f43c0b3b7e64","BuildDate":"2021-10-20T21:09:04+0000"}
{"level":"info","ts":1651831011.4769156,"logger":"controller-runtime.metrics","msg":"metrics server is starting to listen","addr":":8080"}
{"level":"error","ts":1651831011.4800823,"logger":"setup","msg":"unable to create controller","controller":"Ingress","error":"the server could not find the requested resource"}
[cloudshell-user@ip-10-0-167-172 ~]$ 

This appears to be due to a mismatch between the Kubernetes version and the AWS Load Balancer Controller version: Kubernetes 1.22 removed several beta APIs (including `networking.k8s.io/v1beta1` Ingress), which older controller releases may still try to watch.
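One way to confirm the mismatch is to compare what the upgraded API server still serves against the controller image actually running (standard `kubectl` commands; deployment name and namespace match the pod listing above):

```shell
# List the networking API versions the upgraded cluster still serves.
# On Kubernetes 1.22+ only networking.k8s.io/v1 remains; the v1beta1
# Ingress API was removed in this release.
kubectl api-versions | grep networking.k8s.io

# Print the controller image tag, to compare against the version
# compatibility matrix in the aws-load-balancer-controller docs.
kubectl get deployment aws-load-balancer-controller -n kube-system \
  -o jsonpath='{.spec.template.spec.containers[0].image}'
```

If `api-versions` shows only `networking.k8s.io/v1` while the controller is on an older release, the "server could not find the requested resource" error in the log above is consistent with the controller requesting a removed API.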

This only happens AFTER the upgrade from 1.21 to 1.22; on the initial 1.21 deployment the controller works just fine.
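Assuming the controller was installed with the eks-charts Helm chart (release name, namespace, and install method are assumptions here, not confirmed by this issue), a sketch of the remediation is to upgrade to a chart whose bundled controller supports Kubernetes 1.22:

```shell
# Assumed: controller installed from the eks-charts Helm repo under
# the release name aws-load-balancer-controller in kube-system.
helm repo add eks https://aws.github.io/eks-charts
helm repo update

# Upgrade to a newer chart while keeping the existing values; pick a
# chart version whose controller release supports Kubernetes 1.22
# (check the project's compatibility matrix first).
helm upgrade aws-load-balancer-controller eks/aws-load-balancer-controller \
  --namespace kube-system \
  --reuse-values

# Verify the pods recover after the upgrade.
kubectl rollout status deployment/aws-load-balancer-controller -n kube-system
```

If the controller was installed some other way (e.g. raw manifests baked into the Proton template), the equivalent fix is to bump the controller image and RBAC/CRD manifests to a 1.22-compatible release.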

kcoleman731 commented 2 years ago

v1.22 fun. We still need to validate all the add-ons in the core repo.