Closed - GoGoPenguin closed this issue 2 years ago.
Hi, I went through the logs but I couldn't locate the sandbox container you mentioned in the issue. However, I did find similar scenarios for a number of sandbox containers.
The error pattern in ipamd.log seems to take the following form:
Initially, there is an AddNetworkRequest received by IPAMD:
{"level":"info","ts":"2021-12-10xxx","caller":"rpc/rpc.pb.go:486","msg":"Received AddNetwork for NS /proc/xxx/ns/net, Sandbox 7c7xxx, ifname xxx"}
{"level":"debug","ts":"2021-12-10xxx","caller":"rpc/rpc.pb.go:486","msg":"AddNetworkRequest: ClientVersion:\"v1.7.5\" K8S_POD_NAME:\"xxx" K8S_POD_NAMESPACE:\"xxx" K8S_POD_INFRA_CONTAINER_ID:\"xxx" ContainerID:\"7c7xxx" IfName:\"xxx\" NetworkName:\"aws-cni\" Netns:\"/proc/xxx/ns/net\" "}
The ENIs don't have available addresses to assign to the sandbox container:
{"level":"debug","ts":"2021-12-10xxx","caller":"ipamd/rpc_handler.go:142","msg":"AssignIPv4Address: IP address pool stats: total: 15, assigned 14"}
{"level":"debug","ts":"2021-12-10xxx","caller":"ipamd/rpc_handler.go:142","msg":"AssignPodIPv4Address: ENI eni-027xxx does not have available addresses"}
{"level":"debug","ts":"2021-12-10xxx","caller":"ipamd/rpc_handler.go:142","msg":"AssignPodIPv4Address: ENI eni-090xxx does not have available addresses"}
{"level":"debug","ts":"2021-12-10xxx","caller":"ipamd/rpc_handler.go:142","msg":"AssignPodIPv4Address: ENI eni-0d5xxx does not have available addresses"}
A "no available IP addresses" error is then sent in the AddNetworkReply, as shown below:
{"level":"error","ts":"2021-12-10xxx","caller":"ipamd/rpc_handler.go:142","msg":"DataStore has no available IP addresses"}
{"level":"debug","ts":"2021-12-10xxx","caller":"rpc/rpc.pb.go:486","msg":"VPC CIDR 192.xx.0.0/xx"}
{"level":"info","ts":"2021-12-10xxx","caller":"rpc/rpc.pb.go:486","msg":"Send AddNetworkReply: IPv4Addr , DeviceNumber: -1, err: assignPodIPv4AddressUnsafe: no available IP addresses"}
Next, a DelNetworkRequest is received by IPAMD with reason "PodDeleted". However, the pod was never assigned an IP address because of the error above, so the DelNetworkRequest fails:
{"level":"info","ts":"2021-12-10xxx","caller":"rpc/rpc.pb.go:504","msg":"Received DelNetwork for Sandbox 7c7xxx"}
{"level":"debug","ts":"2021-12-10xxx","caller":"rpc/rpc.pb.go:504","msg":"DelNetworkRequest: ClientVersion:\"v1.7.5\" K8S_POD_NAME:\"xxx\" K8S_POD_NAMESPACE:\"xxx" K8S_POD_INFRA_CONTAINER_ID:\"7c7xxx\" Reason:\"PodDeleted\" ContainerID:\"7c7xxx" IfName:\"eth0\" NetworkName:\"aws-cni\" "}
{"level":"debug","ts":"2021-12-10xxx","caller":"ipamd/rpc_handler.go:221","msg":"UnassignPodIPv4Address: IP address pool stats: total:15, assigned 14, sandbox aws-cni/7c7xxx/eth0"}
{"level":"debug","ts":"2021-12-10xxx","caller":"ipamd/rpc_handler.go:221","msg":"UnassignPodIPv4Address: Failed to find IPAM entry under full key, trying CRI-migrated version"}
{"level":"warn","ts":"2021-12-10xxx","caller":"ipamd/rpc_handler.go:221","msg":"UnassignPodIPv4Address: Failed to find sandbox _migrated-from-cri/7c7xxx/unknown"}
{"level":"info","ts":"2021-12-10xxx","caller":"rpc/rpc.pb.go:504","msg":"Send DelNetworkReply: IPv4Addr , DeviceNumber: 0, err: datastore: unknown pod"}
The above DelNetworkRequest is received more than once because of retries.
I'm looking into why the ENIs do not have available addresses.
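A quick way to spot this pattern on a node is to grep ipamd.log for the failing replies (a minimal sketch, assuming the default log path /var/log/aws-routed-eni/ipamd.log):
# Failed AddNetwork replies (pod never got an IP) and the DelNetwork retries that then fail with "unknown pod"
$ grep -E 'Send AddNetworkReply.*no available IP addresses|Send DelNetworkReply.*unknown pod' /var/log/aws-routed-eni/ipamd.log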
@Shreya027 Thanks for the quick response.
@GoGoPenguin - The instance type is t3.medium, which supports 3 ENIs and 5 secondary IPs per ENI, so at most 15 IPs are available. Based on the state file, 192.168.68.129 is available, hence the log line -
{"level":"debug","ts":"2021-12-11T10:11:49.740Z","caller":"ipamd/ipamd.go:2057",
"msg":"IP pool stats: total = 15, used = 14,
IPs in Cooldown = 0, c.maxIPsPerENI = 5"}
Additional ENIs cannot be added since the maximum of 3 ENIs has been reached. 192.168.68.129 was last freed at 2021-12-10T15:50:55.701Z -
{"level":"info","ts":"2021-12-10T15:50:55.701Z","caller":"ipamd/rpc_handler.go:220","msg":"UnassignPodIPAddress: sandbox aws-cni/a70537939fce4d23a2ce65259f6a689438f75439045a879295fa93046d978458/eth0's ipAddr 192.168.68.129, DeviceNumber 2"}
{"level":"info","ts":"2021-12-10T15:50:55.701Z","caller":"rpc/rpc.pb.go:731","msg":"Send DelNetworkReply: IPv4Addr 192.168.68.129, DeviceNumber: 2, err: <nil>"}
{"level":"debug","ts":"2021-12-10T15:51:00.331Z","caller":"ipamd/ipamd.go:2057","msg":"IP pool stats: total = 15, used = 13, IPs in Cooldown = 2, c.maxIPsPerENI = 5"}
At this time, 2 IPs are in cooldown (Cooldown = 2).
Of those 2 IPs, one (192.168.85.235) came out of cooldown around 2021-12-10T15:51:16.676Z and got assigned to a pod.
Plugin logs -
{"level":"info","ts":"2021-12-10T15:51:16.673Z","caller":"routed-eni-cni-plugin/cni.go:111","msg":"Received CNI add request: ContainerID(1a9bda7adf0eb0ddadca3cb0e40e806b558689a5f6efd972d6aee6ff4100f7ab) Netns(/proc/18799/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=cityfarm-development-7d8c4b764b-t6zcw;K8S_POD_INFRA_CONTAINER_ID=1a9bda7adf0eb0ddadca3cb0e40e806b558689a5f6efd972d6aee6ff4100f7ab) Path(/opt/cni/bin) argsStdinData({\"cniVersion\":\"0.3.1\",\"mtu\":\"9001\",\"name\":\"aws-cni\",\"pluginLogFile\":\"/var/log/aws-routed-eni/plugin.log\",\"pluginLogLevel\":\"DEBUG\",\"type\":\"aws-cni\",\"vethPrefix\":\"eni\"})"}
{"level":"debug","ts":"2021-12-10T15:51:16.674Z","caller":"routed-eni-cni-plugin/cni.go:111","msg":"MTU value set is 9001:"}
{"level":"info","ts":"2021-12-10T15:51:16.679Z","caller":"routed-eni-cni-plugin/cni.go:111","msg":"Received add network response for container 1a9bda7adf0eb0ddadca3cb0e40e806b558689a5f6efd972d6aee6ff4100f7ab interface eth0: Success:true IPv4Addr:\"192.168.85.235\" DeviceNumber:2 VPCv4CIDRs:\"192.168.0.0/16\""}
{"level":"debug","ts":"2021-12-10T15:51:16.679Z","caller":"routed-eni-cni-plugin/cni.go:205","msg":"SetupNS: hostVethName=eni6a0b1b8b13c, contVethName=eth0, netnsPath=/proc/18799/ns/net, deviceNumber=2, mtu=9001"}
{"level":"debug","ts":"2021-12-10T15:51:16.679Z","caller":"driver/driver.go:280","msg":"v4addr: 192.168.85.235/32; v6Addr: <nil>\n"}
IPAMD logs -
{"level":"debug","ts":"2021-12-10T15:51:16.676Z","caller":"datastore/data_store.go:757","msg":"Returning Free IP 192.168.85.235"}
{"level":"debug","ts":"2021-12-10T15:51:16.676Z","caller":"datastore/data_store.go:680","msg":"New IP from CIDR pool- 192.168.85.235"}
{"level":"info","ts":"2021-12-10T15:51:16.676Z","caller":"datastore/data_store.go:784","msg":"AssignPodIPv4Address: Assign IP 192.168.85.235 to sandbox aws-cni/1a9bda7adf0eb0ddadca3cb0e40e806b558689a5f6efd972d6aee6ff4100f7ab/eth0"}
{"level":"debug","ts":"2021-12-10T15:51:16.677Z","caller":"rpc/rpc.pb.go:713","msg":"VPC CIDR 192.168.0.0/16"}
{"level":"info","ts":"2021-12-10T15:51:16.677Z","caller":"rpc/rpc.pb.go:713","msg":"Send AddNetworkReply: IPv4Addr 192.168.85.235, IPv6Addr: , DeviceNumber: 2, err: <nil>"}
{"level":"debug","ts":"2021-12-10T15:51:20.338Z","caller":"ipamd/ipamd.go:2057","msg":"IP pool stats: total = 15, used = 14, IPs in Cooldown = 1, c.maxIPsPerENI = 5"}
{"level":"debug","ts":"2021-12-10T15:51:25.343Z","caller":"ipamd/ipamd.go:2057","msg":"IP pool stats: total = 15, used = 14, IPs in Cooldown = 1, c.maxIPsPerENI = 5"}
At around 2021-12-10T15:51:30.348Z, even 192.168.68.129 is out of cooldown.
{"level":"debug","ts":"2021-12-10T15:51:30.348Z","caller":"ipamd/ipamd.go:2057","msg":"IP pool stats: total = 15, used = 14, IPs in Cooldown = 0, c.maxIPsPerENI = 5"}
But I don't see any more ADD requests after 2021-12-10T15:51:16.676Z, hence 192.168.68.129 never got assigned to any pod. Can you retry scheduling the pod?
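As a side note, the per-instance limits that cap the pool at 15 can be read straight from the EC2 API; a minimal sketch, assuming the AWS CLI is configured for the cluster's account:
# t3.medium reports 3 ENIs and 6 IPv4 addresses per ENI; one address per ENI is the primary, leaving 3 x 5 = 15 secondary IPs for pods
$ aws ec2 describe-instance-types --instance-types t3.medium --query 'InstanceTypes[0].NetworkInfo.{MaxENIs:MaximumNetworkInterfaces,IPv4PerENI:Ipv4AddressesPerInterface}'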
@jayanthvn
Okay, I tried to redeploy my pod. I have sent the log file to aws-security@amazon.com
kubectl rollout restart deployment cityfarm-development
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14s default-scheduler Successfully assigned default/cityfarm-development-7997fb8976-gtgxp to ip-192-168-81-85.ap-northeast-2.compute.internal
Warning FailedCreatePodSandBox 12s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "ebb530230479f5cbbb6c2cbebadd117a09f954112064bb2ea7e7db3a7ae0a6ce" network for pod "cityfarm-development-7997fb8976-gtgxp": networkPlugin cni failed to set up pod "cityfarm-development-7997fb8976-gtgxp_default" network: add cmd: failed to assign an IP address to container
Warning FailedCreatePodSandBox 11s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "7bf9e05ac04f8234a7dcc6abc973bb46f7bf88e44fb601b00b979dc7ce038ee7" network for pod "cityfarm-development-7997fb8976-gtgxp": networkPlugin cni failed to set up pod "cityfarm-development-7997fb8976-gtgxp_default" network: add cmd: failed to assign an IP address to container
Warning FailedCreatePodSandBox 10s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "7c09e808295a4554e39bf133d8b8a992ad5becd7a29e1c58a1f0151694b74b40" network for pod "cityfarm-development-7997fb8976-gtgxp": networkPlugin cni failed to set up pod "cityfarm-development-7997fb8976-gtgxp_default" network: add cmd: failed to assign an IP address to container
Warning FailedCreatePodSandBox 9s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "eb6dfe5fc8aed5d494c47d6af38af39e1e0ffff0cc623d4284562a56b627cd07" network for pod "cityfarm-development-7997fb8976-gtgxp": networkPlugin cni failed to set up pod "cityfarm-development-7997fb8976-gtgxp_default" network: add cmd: failed to assign an IP address to container
Warning FailedCreatePodSandBox 8s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "8921569cf713a0a9486bd7ad16404379f399f2a90cd0c9ffc27975b5b4888fbc" network for pod "cityfarm-development-7997fb8976-gtgxp": networkPlugin cni failed to set up pod "cityfarm-development-7997fb8976-gtgxp_default" network: add cmd: failed to assign an IP address to container
Warning FailedCreatePodSandBox 7s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "77a1cd4815ecfa65187649a97c4113de8a88403bc69bba2a8b29c10f2f22d6d7" network for pod "cityfarm-development-7997fb8976-gtgxp": networkPlugin cni failed to set up pod "cityfarm-development-7997fb8976-gtgxp_default" network: add cmd: failed to assign an IP address to container
Warning FailedCreatePodSandBox 6s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "e4a7119500377a0339425e7e3c9fd48d33f03fa5d09bb6614d57d51cec8213d7" network for pod "cityfarm-development-7997fb8976-gtgxp": networkPlugin cni failed to set up pod "cityfarm-development-7997fb8976-gtgxp_default" network: add cmd: failed to assign an IP address to container
Warning FailedCreatePodSandBox 5s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "115ce7ea2070f0efa4a0823b07c9736eeb03a83ca013c6153705f98918c0e89c" network for pod "cityfarm-development-7997fb8976-gtgxp": networkPlugin cni failed to set up pod "cityfarm-development-7997fb8976-gtgxp_default" network: add cmd: failed to assign an IP address to container
Warning FailedCreatePodSandBox 4s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "aa296b49488058e6a9857cf991557fbf9b9555fc8bf28aa933a64237f1c425f6" network for pod "cityfarm-development-7997fb8976-gtgxp": networkPlugin cni failed to set up pod "cityfarm-development-7997fb8976-gtgxp_default" network: add cmd: failed to assign an IP address to container
Warning FailedCreatePodSandBox 3s kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "56a700c514d4b769fa64f7af3daa671dbd4ff06dfbc5bd76e76505e86f50d623" network for pod "cityfarm-development-7997fb8976-gtgxp": networkPlugin cni failed to set up pod "cityfarm-development-7997fb8976-gtgxp_default" network: add cmd: failed to assign an IP address to container
Normal SandboxChanged 2s (x10 over 11s) kubelet Pod sandbox changed, it will be killed and re-created.
Normal Pulling 2s kubelet Pulling image "ID.dkr.ecr.REGION.amazonaws.com/cityfarm-backend:main-cecff2b3-1636616305"
Hi @GoGoPenguin, can you resend logs to k8s-awscni-triage@amazon.com instead? Thanks!
@Shreya027 Done. Thank you.
Thanks, Looking into it, will get back soon.
So, in the new logs I see a similar pattern to the one I described above, but with the following error messages: "Unable to get IP address from CIDR: no free IP available in the prefix" and "assignPodIPv4AddressUnsafe: no available IP/Prefix addresses".
Hi @GoGoPenguin, I see the pod is assigned an IP successfully later, after the "failed to assign an IP address" events. After IP address assignment fails for container 56a700c514d4b769fa64f7af3daaxxxxx of pod cityfarm-development-7997fb8976-gtgxp, the same pod gets an IP successfully with container ID 79d32e3aae71f0xxxxx, as seen below:
{"level":"info","ts":"2021-12-13T02:48:48.916Z","caller":"routed-eni-cni-plugin/cni.go:111","msg":"Received CNI add request: ContainerID(79d32e3aae71f0xxxxx) Netns(/proc/6756/ns/net) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=default;K8S_POD_NAME=cityfarm-development-7997fb8976-gtgxp;K8S_POD_INFRA_CONTAINER_ID=79d32e3aae71f0xxxxx) Path(/opt/cni/bin) argsStdinData({\"cniVersion\":\"0.3.1\",\"mtu\":\"9001\",\"name\":\"aws-cni\",\"pluginLogFile\":\"/var/log/aws-routed-eni/plugin.log\",\"pluginLogLevel\":\"DEBUG\",\"type\":\"aws-cni\",\"vethPrefix\":\"eni\"})"}
{"level":"debug","ts":"2021-12-13T02:48:48.916Z","caller":"routed-eni-cni-plugin/cni.go:111","msg":"MTU value set is 9001:"}
{"level":"info","ts":"2021-12-13T02:48:48.925Z","caller":"routed-eni-cni-plugin/cni.go:111","msg":"Received add network response for container 79d32e3aae71f0xxxxx interface eth0: Success:true IPv4Addr:\"192.168.74.87\" DeviceNumber:1 VPCv4CIDRs:\"192.168.0.0/16\""}
Were you able to see the pod deployment running successfully later?
If you were wondering why the initial errors were seen before the successful deployment, I have stated the reason below:
The previous failures were because 13 of the 15 total IPs were in use and the remaining 2 IPs were in cooldown throughout the "failed to assign an IP address" period. When one of the IPs in cooldown is returned to the warm pool, it gets assigned to the pod mentioned above, as seen below in the ipamd logs. Since the instance type is t3.medium, as @jayanthvn mentioned, it supports 3 ENIs and 5 secondary IPs per ENI, so only 15 IPs are available at any time.
{"level":"debug","ts":"2021-12-13T02:48:48.920Z","caller":"datastore/data_store.go:757","msg":"Returning Free IP 192.168.74.87"}
{"level":"debug","ts":"2021-12-13T02:48:48.920Z","caller":"datastore/data_store.go:680","msg":"New IP from CIDR pool- 192.168.74.87"}
{"level":"info","ts":"2021-12-13T02:48:48.920Z","caller":"datastore/data_store.go:784","msg":"AssignPodIPv4Address: Assign IP 192.168.74.87 to sandbox aws-cni/79d32e3aae71f0xxxxx"}
Note: I have replaced container IDs with xxxxx at the end. You can use the timestamps to find the mappings in your logs.
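If you want to see this live on a node, ipamd also exposes a local introspection endpoint and Prometheus metrics that report the same pool/cooldown state as these log lines; a minimal sketch, assuming the default ports 61679/61678 haven't been disabled and noting that metric names may differ slightly between CNI versions:
# ipamd's view of each ENI and its assigned IPs
$ curl -s http://localhost:61679/v1/enis
# total vs. assigned IP gauges
$ curl -s http://localhost:61678/metrics | grep -E 'awscni_(total|assigned)_ip_addresses'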
@Shreya027 Sometimes the pod gets stuck in the ContainerCreating state and cannot get an IP from the CNI.
Hi @GoGoPenguin, could you please send me logs for the case you are referring to? That will help me debug further. The logs you sent earlier had the IP assigned to the pod eventually, so I didn't find any issues there.
I am seeing similar behaviors for my EKS cluster.
I looked through the above outputs and the error messages are nearly identical.
I'm wondering if upgrading the version from 3.19.1 -> 3.21.x would help with the network latency and prevent pods from getting stuck in the failure state due to this networking issue.
Hi @jwitrick, would it be possible for you to send your error logs to k8s-awscni-triage@amazon.com? The logs sent earlier in the issue had the pod IP assigned eventually.
We are experiencing a similar issue and getting these same ipamd messages, with hosts/pods losing networking for periods of time, even though the subnet has plenty of free IP addresses. I've sent an email to k8s-awscni-triage@amazon.com with additional details, case IDs, and logs from an affected EKS node.
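Worth noting that subnet capacity and per-node capacity are separate limits: the subnet can have plenty of free addresses while the node has already hit its ENI/secondary-IP maximum. A minimal sketch for checking both sides (the subnet ID and node name are placeholders):
# free addresses left in the subnet
$ aws ec2 describe-subnets --subnet-ids subnet-xxxxxxxx --query 'Subnets[0].AvailableIpAddressCount'
# rough count of pods scheduled on the node, to compare against the instance-type IP limit
$ kubectl get pods --all-namespaces --field-selector spec.nodeName=<node-name> -o wide | wc -l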
The problem is that my pods can't find the RDS endpoints because they can't resolve the endpoint... I can see the same error about the sandbox as follows:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "d423cb5bb261338d384bf2266fbadc05bc074b432319df49b6011c7f954364f3" network for pod "x-y-service-aws-sae1-prdt-ppd-dev-789b656b462tbt4": networkPlugin cni failed to set up pod "x-y-service-aws-sae1-prdt-ppd-dev-789b656b462tbt4_x-aws-sae1-prdt-ppd-dev" network: add cmd: failed to assign an IP address to container
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 56m default-scheduler Successfully assigned x-aws-sae1-prdt-ppd-dev/x-y-service-aws-sae1-prdt-ppd-dev-789b656b462tbt4 to ip-172-16-3-127.sa-east-1.compute.internal
Normal SecurityGroupRequested 56m vpc-resource-controller Pod will get the following Security Groups [sg-05c167fb067217b5e]
Warning FailedCreatePodSandBox 56m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "d423cb5bb261338d384bf2266fbadc05bc074b432319df49b6011c7f954364f3" network for pod "x-y-service-aws-sae1-prdt-ppd-dev-789b656b462tbt4": networkPlugin cni failed to set up pod "x-y-service-aws-sae1-prdt-ppd-dev-789b656b462tbt4_x-aws-sae1-prdt-ppd-dev" network: add cmd: failed to assign an IP address to container
Normal ResourceAllocated 56m vpc-resource-controller Allocated [{"eniId":"eni-061b36f5ef1acac44","ifAddress":"0a:d1:a0:62:57:18","privateIp":"172.16.3.254","vlanId":1,"subnetCidr":"172.16.3.0/24"}] to the pod
Normal SandboxChanged 56m kubelet Pod sandbox changed, it will be killed and re-created.
Normal Pulled 56m kubelet Successfully pulled image "registry.gitlab.com/x/y/x-service:354452df-develop" in 5.934951686s
Normal Pulled 55m kubelet Successfully pulled image "registry.gitlab.com/x/y/x-service:354452df-develop" in 1.260281578s
Normal Pulled 55m kubelet Successfully pulled image "registry.gitlab.com/x/y/x-service:354452df-develop" in 1.169348684s
Normal Pulled 54m kubelet Successfully pulled image "registry.gitlab.com/x/y/x-service:354452df-develop" in 1.138185066s
Normal Created 54m (x4 over 56m) kubelet Created container x-service
Normal Started 54m (x4 over 56m) kubelet Started container x-service
Normal Pulling 26m (x11 over 56m) kubelet Pulling image "registry.gitlab.com/x/y/x-service:354452df-develop"
Warning BackOff 69s (x231 over 55m) kubelet Back-off restarting failed container
The vpc-cni add-on appears stuck in the "CREATING" status:
$ aws eks describe-addon \
--cluster-name eks-ppd-prdt-x-y \
--addon-name vpc-cni
{
"addon": {
"addonName": "vpc-cni",
"clusterName": "eks-ppd-prdt-x-y",
"status": "CREATING",
"addonVersion": "v1.10.1-eksbuild.1",
"health": {
"issues": []
},
"addonArn": "arn:aws:eks:sa-east-1:xxx:addon/eks-ppd-prdt-x-y/vpc-cni/f8bf5219-13e8-3d54-76da-0ef9415aad0e",
"createdAt": "2022-01-28T20:57:37.678000-08:00",
"modifiedAt": "2022-01-28T20:57:37.698000-08:00",
"serviceAccountRoleArn": "arn:aws:iam::xxx:role/AmazonEKSCNIRole",
"tags": {}
}
}
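With managed add-ons it can help to confirm the add-on actually reaches ACTIVE before digging into the pods; a minimal sketch, assuming a recent AWS CLI v2 that includes the EKS waiters:
$ aws eks wait addon-active --cluster-name eks-ppd-prdt-x-y --addon-name vpc-cni
$ aws eks describe-addon --cluster-name eks-ppd-prdt-x-y --addon-name vpc-cni --query 'addon.{status:status,issues:health.issues}'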
Checking the AmazonEKSCNIRole service account role:
$ aws iam list-roles | jq -r '.Roles[] | select(.RoleName == "AmazonEKSCNIRole")'
$ kubectl get pods -n kube-system -l k8s-app=aws-node
NAME READY STATUS RESTARTS AGE
aws-node-6gdnx 0/1 CrashLoopBackOff 319 22h
aws-node-6rj5d 0/1 Running 320 22h
aws-node-c9cxv 0/1 Running 320 22h
aws-node-cst8j 0/1 Running 321 22h
aws-node-j7gbn 0/1 CrashLoopBackOff 320 22h
aws-node-jtjmm 0/1 CrashLoopBackOff 320 22h
aws-node-k6bvl 0/1 Running 321 22h
The AmazonEKS_CNI_Policy was missing from the role, so I manually attached it:
$ aws iam attach-role-policy --policy-arn arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy --role-name eks-ppd-prdt-x-y20220128224446671500000001
$ aws iam list-attached-role-policies --role-name eks-ppd-prdt-x-y20220128224446671500000001
{
"AttachedPolicies": [
{
"PolicyName": "eks-ppd-prdt-x-y-elb-sl-role-creation20220128224446672600000002",
"PolicyArn": "arn:aws:iam::xxxxyyyzzz:policy/eks-ppd-prdt-x-yelb-sl-role-creation20220128224446672600000002"
},
{
"PolicyName": "AmazonEKSClusterPolicy",
"PolicyArn": "arn:aws:iam::aws:policy/AmazonEKSClusterPolicy"
},
{
"PolicyName": "AmazonEKSServicePolicy",
"PolicyArn": "arn:aws:iam::aws:policy/AmazonEKSServicePolicy"
},
{
"PolicyName": "AmazonEKS_CNI_Policy",
"PolicyArn": "arn:aws:iam::aws:policy/AmazonEKS_CNI_Policy"
},
{
"PolicyName": "AmazonEKSVPCResourceController",
"PolicyArn": "arn:aws:iam::aws:policy/AmazonEKSVPCResourceController"
}
]
}
InsufficientNumberOfReplicas | The add-on is unhealthy because it doesn't have the desired number of replicas.
Service account role: Inherited from node
$ kubectl get pods -n kube-system -l k8s-app=aws-node
NAME READY STATUS RESTARTS AGE
aws-node-769nf 1/1 Running 0 30s
aws-node-77r4w 1/1 Running 0 31s
aws-node-jp8tg 1/1 Running 0 27s
aws-node-pvsx9 1/1 Running 0 26s
aws-node-s24xp 1/1 Running 0 27s
aws-node-s54jd 1/1 Running 0 35s
aws-node-xnxvx 1/1 Running 0 29s
aws eks create-addon \
--cluster-name ${EKS_CLUSTER_NAME} \
--addon-name vpc-cni \
--addon-version ${CNI_COMPATIBLE_VERSION} \
--service-account-role-arn ${ROLE_ARN} \
--resolve-conflicts OVERWRITE
$ kubectl get pods -n kube-system -l k8s-app=aws-node
NAME READY STATUS RESTARTS AGE
aws-node-4ntn7 0/1 Running 0 88s
aws-node-4rtwk 0/1 Running 0 89s
aws-node-7fcsw 0/1 Running 0 86s
aws-node-7p8cv 0/1 Running 0 86s
aws-node-mb6bf 0/1 Running 0 81s
aws-node-nndk8 0/1 Running 0 84s
aws-node-strhl 0/1 Running 0 85s
...
...
$ kubectl get pods -n kube-system -l k8s-app=aws-node
NAME READY STATUS RESTARTS AGE
aws-node-4ntn7 0/1 Running 2 3m58s
aws-node-4rtwk 0/1 Running 2 3m59s
aws-node-7fcsw 0/1 Running 2 3m56s
aws-node-7p8cv 0/1 Running 2 3m56s
aws-node-mb6bf 0/1 Running 2 3m51s
aws-node-nndk8 0/1 Running 2 3m54s
aws-node-strhl 0/1 Running 2 3m55s
$ kubectl logs -n kube-system aws-node-strhl
{"level":"info","ts":"2022-01-30T04:36:56.122Z","caller":"entrypoint.sh","msg":"Validating env variables ..."}
{"level":"info","ts":"2022-01-30T04:36:56.124Z","caller":"entrypoint.sh","msg":"Install CNI binaries.."}
{"level":"info","ts":"2022-01-30T04:36:56.141Z","caller":"entrypoint.sh","msg":"Starting IPAM daemon in the background ... "}
{"level":"info","ts":"2022-01-30T04:36:56.145Z","caller":"entrypoint.sh","msg":"Checking for IPAM connectivity ... "}
I0130 04:36:57.229531 12 request.go:621] Throttling request took 1.040271681s, request: GET:https://10.100.0.1:443/apis/argoproj.io/v1alpha1?timeout=32s
{"level":"info","ts":"2022-01-30T04:36:58.154Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-01-30T04:37:00.161Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-01-30T04:37:02.168Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-01-30T04:37:04.174Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
{"level":"info","ts":"2022-01-30T04:37:06.181Z","caller":"entrypoint.sh","msg":"Retrying waiting for IPAM-D"}
aws eks create-addon \
--cluster-name ${EKS_CLUSTER_NAME} \
--addon-name vpc-cni \
--addon-version ${CNI_COMPATIBLE_VERSION} \
--resolve-conflicts OVERWRITE
#--service-account-role-arn ${ROLE_ARN} \
$ kubectl get pods -n kube-system -l k8s-app=aws-node
NAME READY STATUS RESTARTS AGE
aws-node-27p49 1/1 Running 0 23s
aws-node-k72xr 1/1 Running 0 16s
aws-node-m28lc 1/1 Running 0 15s
aws-node-smg5p 1/1 Running 0 22s
aws-node-whhtb 1/1 Running 0 11s
aws-node-xsc5f 1/1 Running 0 19s
aws-node-z9qz7 1/1 Running 0 17s
With the AmazonEKS_CNI_Policy in place on the node role, it worked.
$ kubectl describe pod -n zzz-aws-x-y-z-dev green-pod-5db68f6449-n6m8x
Name: green-pod-5db68f6449-n6m8x
Namespace: zzz-aws-x-y-z-dev
Priority: 0
Node: ip-172-16-1-214.sa-east-1.compute.internal/172.16.1.214
Start Time: Sun, 30 Jan 2022 01:17:32 -0800
Labels: app=green-pod
pod-template-hash=5db68f6449
Annotations: kubernetes.io/psp: eks.privileged
vpc.amazonaws.com/pod-eni:
[{"eniId":"eni-0243826591076371e","ifAddress":"02:4a:df:8f:23:84","privateIp":"172.16.1.224","vlanId":2,"subnetCidr":"172.16.1.0/24"}]
Status: Running
IP: 172.16.1.89
IPs:
IP: 172.16.1.89
Controlled By: ReplicaSet/green-pod-5db68f6449
Containers:
green-pod:
Container ID: docker://bf1dd9d5fbad507bd0d503baa2cc2f65d2d96652b4c69da7a4d48bc2b8ff84c7
Image: fmedery/app:latest
Image ID: docker-pullable://fmedery/app@sha256:64fadcdfe9f826b842a8c576ae4b9dbc4e18a9865226e556baad71bfea239292
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 30 Jan 2022 01:17:54 -0800
Finished: Sun, 30 Jan 2022 01:17:54 -0800
Last State: Terminated
Reason: Completed
Exit Code: 0
Started: Sun, 30 Jan 2022 01:17:37 -0800
Finished: Sun, 30 Jan 2022 01:17:37 -0800
Ready: False
Restart Count: 2
Limits:
cpu: 512m
memory: 512Mi
vpc.amazonaws.com/pod-eni: 1
Requests:
cpu: 500m
memory: 256Mi
vpc.amazonaws.com/pod-eni: 1
Environment:
HOST: <set to the key 'host' in secret 'rds-postgres'> Optional: false
DBNAME: dbnameeee
USER: <set to the key 'username' in secret 'rds-postgres'> Optional: false
PASSWORD: <set to the key 'password' in secret 'rds-postgres'> Optional: false
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-mp4k8 (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
kube-api-access-mp4k8:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
vpc.amazonaws.com/pod-eni:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 34s default-scheduler Successfully assigned zzz-zzz-sae1-prdt-ppd-dev/green-pod-5db68f6449-n6m8x to ip-172-16-1-214.sa-east-1.compute.internal
Normal SecurityGroupRequested 34s vpc-resource-controller Pod will get the following Security Groups [sg-05c167fb067217b5e]
Normal ResourceAllocated 33s vpc-resource-controller Allocated [{"eniId":"eni-0243826591076371e","ifAddress":"02:4a:df:8f:23:84","privateIp":"172.16.1.224","vlanId":2,"subnetCidr":"172.16.1.0/24"}] to the pod
Normal Pulled 31s kubelet Successfully pulled image "fmedery/app:latest" in 1.485599101s
Normal Pulled 29s kubelet Successfully pulled image "fmedery/app:latest" in 1.462975983s
Normal Pulling 13s (x3 over 33s) kubelet Pulling image "fmedery/app:latest"
Normal Created 12s (x3 over 31s) kubelet Created container green-pod
Normal Started 12s (x3 over 31s) kubelet Started container green-pod
Warning BackOff 12s (x3 over 29s) kubelet Back-off restarting failed container
Normal Pulled 12s kubelet Successfully pulled image "fmedery/app:latest" in 1.439726978s
Details of the allocated branch ENI eni-0243826591076371e:
$ aws ec2 describe-network-interfaces | jq -r '.NetworkInterfaces[] | select(.NetworkInterfaceId == "eni-0243826591076371e")'
{
"AvailabilityZone": "sa-east-1a",
"Description": "aws-k8s-branch-eni",
"Groups": [
{
"GroupName": "conn-4-pod-rds-group",
"GroupId": "sg-05c167fb067217b5e"
}
],
"InterfaceType": "branch",
"Ipv6Addresses": [],
"MacAddress": "02:4a:df:8f:23:84",
"NetworkInterfaceId": "eni-0243826591076371e",
"OwnerId": "806101772216",
"PrivateDnsName": "ip-172-16-1-224.sa-east-1.compute.internal",
"PrivateIpAddress": "172.16.1.224",
"PrivateIpAddresses": [
{
"Primary": true,
"PrivateDnsName": "ip-172-16-1-224.sa-east-1.compute.internal",
"PrivateIpAddress": "172.16.1.224"
}
],
"RequesterId": "285275063451",
"RequesterManaged": false,
"SourceDestCheck": true,
"Status": "in-use",
"SubnetId": "subnet-0a130d65efd4f0071",
"TagSet": [
{
"Key": "eks:eni:owner",
"Value": "eks-vpc-resource-controller"
},
{
"Key": "vpcresources.k8s.aws/trunk-eni-id",
"Value": "eni-017f74c86c392f663"
},
{
"Key": "kubernetes.io/cluster/eks-ppd-prdt-super-cash",
"Value": "owned"
},
{
"Key": "vpcresources.k8s.aws/vlan-id",
"Value": "2"
}
],
"VpcId": "vpc-04858bd8c565075ae"
}
@GoGoPenguin Were you finally able to get a fix?
@justin-obn Are you facing a similar issue? What version of vpc-cni are you using? Could you share your logs at k8s-awscni-triage@amazon.com?
@cgchinmay Thanks for your quick reply. I'm not facing this issue anymore.
@GoGoPenguin - This is expected behavior on your cluster: the maximum number of IPs was reached, and once IPs are freed we see new pods getting IPs. If the issue persists, please feel free to open an issue.
Comments on closed issues are hard for our team to see. If you need more assistance, please open a new issue that references this one. If you wish to keep having a conversation with other community members under this issue feel free to do so.
Having a similar issue. I deployed an EKS cluster with Kubernetes 1.29 and had to update the kube-proxy DaemonSet to the latest version to make it work.
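If kube-proxy is installed as a managed EKS add-on, the version bump can also be done through the add-on API instead of editing the DaemonSet directly; a minimal sketch (the cluster name is a placeholder, and the version should be one returned by the first command):
$ aws eks describe-addon-versions --addon-name kube-proxy --kubernetes-version 1.29 --query 'addons[0].addonVersions[].addonVersion'
$ aws eks update-addon --cluster-name <cluster> --addon-name kube-proxy --addon-version <version-from-above> --resolve-conflicts OVERWRITE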
What happened:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "efafab567a68ffb237e67050c4d70d2d1084bf1aca3631b74e5a0802146d150a" network for pod "xxx-9d8b7c98d-j8ldn": networkPlugin cni failed to set up pod "xxx-9d8b7c98d-j8ldn_default" network: add cmd: failed to assign an IP address to container
eks_i-089d9482970086cc5_2021-12-11_1011-UTC_0.6.2.tar.gz
Environment:
Kubernetes version (use kubectl version):
CNI Version:
OS (e.g: cat /etc/os-release):
Kernel (e.g. uname -a):