songhohoon closed this issue 7 months ago
@songhohoon From the warning you shared:
Warning BranchENIAnnotationFailed 5m28s (x21 over 49m) vpc-resource-controller failed to annotate pod with branch ENI details: Pod "watch-api-79574c44db-klk7z" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`,`spec.initContainers[*].image`,`spec.activeDeadlineSeconds`,`spec.tolerations` (only additions to existing tolerations),`spec.terminationGracePeriodSeconds` (allow it to be set to 1 if it was previously negative)
It looks like the VPC Resource Controller (https://github.com/aws/amazon-vpc-resource-controller-k8s/blob/master/pkg/provider/branch/provider.go#L385) failed to annotate the pod with a branch ENI.
Based on the error message from the k8s API call, it sounds like this patch operation was blocked. Are you installing any pod validation or admission webhooks in your cluster? Are you running any tools that are modifying the ClusterRole objects installed by EKS? Have you ever had this Security Groups for Pods solution working?
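To see what might be intercepting the patch, you can list the admission webhooks registered in the cluster (generic commands; the configuration names will vary by installation):

```sh
# List all registered admission webhooks; any of these can intercept
# the vpc-resource-controller's pod patch before it is persisted
kubectl get validatingwebhookconfigurations
kubectl get mutatingwebhookconfigurations
```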
Hi @jdn5126, thanks for the reply.
Are you installing any pod validation or admission webhooks in your cluster? -> Yes. I installed Kyverno and I am using it for pod validation and to mutate some configuration, such as adding a preStop hook (a sketch of that kind of policy is below).
Are you running any tools that are modifying the ClusterRole objects installed by EKS? -> No, I am not.
Have you ever had this Security Groups for Pods solution working? -> Yes. I am using SGP (Security Groups for Pods) for most of my pods.
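For illustration, a Kyverno mutate policy of the kind I am describing might look like this; the policy name and the hook command here are hypothetical, not my actual configuration:

```sh
# Hypothetical sketch of a Kyverno ClusterPolicy that injects a preStop
# hook into all pod containers; the name and command are illustrative only
kubectl apply -f - <<'EOF'
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: add-prestop-hook
spec:
  rules:
    - name: inject-prestop
      match:
        any:
          - resources:
              kinds:
                - Pod
      mutate:
        patchStrategicMerge:
          spec:
            containers:
              # (name): "*" is a Kyverno anchor matching every container
              - (name): "*"
                lifecycle:
                  preStop:
                    exec:
                      command: ["sleep", "5"]
EOF
```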
Additional info: when this situation occurs, it is usually resolved by deleting and recreating the pod. However, if I leave the pod as is, the situation persists.
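When it happens, the affected pods can be spotted through their events (a sketch, using the event reason from the warning above):

```sh
# Surface pods whose branch ENI annotation failed
kubectl get events --all-namespaces \
  --field-selector reason=BranchENIAnnotationFailed
```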
@songhohoon Judging from the error message, it seems very likely that the patch operation is being blocked by a pod validation webhook. It is possible that Kyverno is playing that role, but since this is all happening in the control plane and not in the AWS VPC CNI, I think the best path forward is for you to create an AWS support case. Then we can investigate the control plane logs and figure out what is blocking this patching operation from time to time.
@jdn5126 Thank you for the reply.
I dug deeper into the problem and figured out that admission controller ordering was involved.
In my case, some of the admission controllers failed to inject their configuration, and once an admission controller has failed for a pod, the pod manifest is no longer annotatable, so the CNI controller cannot annotate the pod with the allocated IP address.
The failure itself was an AWS API rate limit being exceeded: this is a development environment where workloads are scheduled every morning, so a lot of pods are created at the same time.
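One way to check how each webhook behaves when it fails (a generic sketch; a webhook with `failurePolicy: Fail` rejects requests when it errors or times out, which matches the behavior I saw):

```sh
# Show the failure policy and timeout of every mutating webhook
kubectl get mutatingwebhookconfigurations -o \
  custom-columns='NAME:.metadata.name,FAILURE_POLICY:.webhooks[*].failurePolicy,TIMEOUT:.webhooks[*].timeoutSeconds'
```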
I tried `kubectl annotate pods ${pod_name} test=test`, and it failed on the stuck pod but succeeded on a regular pod.
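To see exactly what rejects the update, raising kubectl's verbosity prints the raw API request and response, including any admission denial message (a general debugging technique, not specific to this controller):

```sh
# -v=8 dumps the HTTP exchange; the rejection reason appears in the
# API server's response body when the PATCH is denied
kubectl annotate pods ${pod_name} test=test -v=8
```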
This issue is now closed. Comments on closed issues are hard for our team to see. If you need more assistance, please either tag a team member or open a new issue that references this one.
@songhohoon ah I see, thank you for explaining, and glad you figured it out!
@jdn5126 In README.md, https://github.com/aws/amazon-vpc-cni-k8s/blame/87115cf204dafd148c765ea3c8d184ba73c3a09a/README.md#L498 still mentions:
Setting `ENABLE_POD_ENI` to `true` will allow IPAMD to add the `vpc.amazonaws.com/has-trunk-attached` label to the node if the instance has the capacity to attach an additional ENI.
Is this expected?
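For context, `ENABLE_POD_ENI` is set on the aws-node DaemonSet as part of the standard Security Groups for Pods setup, and the node label the README mentions can be checked directly (commands shown for reference):

```sh
# Enable trunk ENI support on the VPC CNI DaemonSet
kubectl set env daemonset aws-node -n kube-system ENABLE_POD_ENI=true

# Check which nodes carry the label mentioned in the README
kubectl get nodes -L vpc.amazonaws.com/has-trunk-attached
```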
What happened:
Pods get stuck in Init or ContainerCreating status.
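A quick way to list pods stuck in these states (a sketch):

```sh
kubectl get pods --all-namespaces | grep -E 'Init:|ContainerCreating'
```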
Attach logs
Sent the log file to k8s-awscni-triage@amazon.com from thdghgns@gmail.com.
What you expected to happen: I expected the pod to be created normally.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?:
Environment:
- Kubernetes version (use `kubectl version`): v1.27.9-eks-5e0fdde
- OS (e.g. `cat /etc/os-release`): Amazon Linux 2
- Kernel (e.g. `uname -a`): Linux ip-10-8-58-221.ap-northeast-2.compute.internal 5.10.199-190.747.amzn2.x86_64 #1 SMP Sat Nov 4 16:55:14 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux