Closed: AbeOwlu closed this issue 5 months ago
@AbeOwlu It looks like we're running into API server access issues on these nodes. Can you try accessing the API server from the problematic nodes? We should see an issue with the CNI pods as well. Also, please check the status of the kube-proxy pods on these nodes.
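For anyone hitting something similar, here is a minimal sketch of the checks suggested above. The API server endpoint and node name are placeholders you'd fill in for your cluster:

```
# From the problematic node: confirm basic reachability of the API server
# (replace the endpoint with your cluster's API server URL).
curl -k https://<api-server-endpoint>/healthz

# From a machine with cluster access: check kube-proxy pods on the node
# (replace NODE_NAME with the affected node's name).
kubectl get pods -n kube-system -l k8s-app=kube-proxy \
  --field-selector spec.nodeName=NODE_NAME -o wide
kubectl logs -n kube-system -l k8s-app=kube-proxy --tail=50
```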
Thanks @achevuru. We can go ahead and close this. The network manager on the AL23-based AMI built some broken network configuration: network forwarding was switched off, among other things.
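For reference, a quick way to confirm and fix that kind of misconfiguration on a node (standard sysctl names; your AMI's network manager may manage these differently):

```
# Check whether IP forwarding is enabled (should be 1 on a worker node)
sysctl net.ipv4.ip_forward
sysctl net.ipv6.conf.all.forwarding

# Re-enable at runtime if it was switched off
sudo sysctl -w net.ipv4.ip_forward=1

# Persist across reboots, since a network manager may rewrite runtime state
echo 'net.ipv4.ip_forward = 1' | sudo tee /etc/sysctl.d/99-ip-forward.conf
sudo sysctl --system
```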
Ok, thanks for the update.
What happened: The node agent fails to start on a new node, and this shows up as a widespread failure on that node, with all pods failing to get a network built for their pod sandboxes.
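Roughly how the failure presents (pod labels and namespace are from a standard EKS install; the stuck-pod name is a placeholder):

```
# The aws-node pod on the affected node is not Ready
kubectl get pods -n kube-system -l k8s-app=aws-node -o wide

# Workload pods on the node sit in ContainerCreating; their events show
# sandbox network setup failures, e.g.
#   Warning  FailedCreatePodSandBox ... failed to set up pod network ...
kubectl describe pod <stuck-pod>
```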
Attach logs
What you expected to happen: Normal startup... or, failing that, a failed node-agent container should not take down the whole aws-node pod and cause CNI issues.
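A way to confirm it is the node-agent container, and not the CNI container itself, failing inside the aws-node pod. The container name below matches recent VPC CNI manifests but may differ by version; the pod name is a placeholder:

```
# List per-container readiness inside the aws-node pod on the affected node
kubectl get pod -n kube-system <aws-node-pod> \
  -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.ready}{"\n"}{end}'

# Pull logs from the network policy agent container specifically
kubectl logs -n kube-system <aws-node-pod> -c aws-eks-nodeagent
```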
How to reproduce it (as minimally and precisely as possible): Unable to reproduce; it occurred only on 3 nodes in the impacted cluster.
Anything else we need to know?: Should the node agent be opt-in for clusters not using NetworkPolicy resources? (A sketch of the existing toggles follows below.)
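On the opt-in question: network policy enforcement is already configurable in recent VPC CNI releases. A sketch of turning it off, using the configuration key documented for the EKS managed addon and the Helm chart value; verify both against your CNI version:

```
# Managed addon: disable network policy enforcement
aws eks update-addon \
  --cluster-name <cluster> \
  --addon-name vpc-cni \
  --configuration-values '{"enableNetworkPolicy": "false"}'

# Helm chart: skip deploying the node agent entirely
helm upgrade aws-vpc-cni eks/aws-vpc-cni -n kube-system \
  --set nodeAgent.enabled=false
```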
Environment: as provided, with custom networking enabled
- Kubernetes version (use `kubectl version`): 1.29
- OS (e.g: `cat /etc/os-release`): RHEL (Enterprise Linux 8)
- Kernel (e.g. `uname -a`): 4.18.0-513.18.1.el8_9.x86_64