aws / amazon-vpc-cni-k8s

Networking plugin repository for pod networking in Kubernetes using Elastic Network Interfaces on AWS
Apache License 2.0
2.28k stars 743 forks source link

Non fatal but persistent warning: "Failed to create pod sandbox ... failed to assign an IP address to container." #3066

Open taer opened 1 month ago

taer commented 1 month ago

Every pod that we startup gets this one Warning level event one time.

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "containerString": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container

Here are all the events for that pod.

6m2s                       Normal    Scheduled                Pod/engine-content-cars-etl-cron-short-28809865-2dhw7   Successfully assigned content-api/engine-content-cars-etl-cron-short-28809865-2dhw7 to ip-172-21-171-235.ec2.internal
6m2s                       Normal    SecurityGroupRequested   Pod/engine-content-cars-etl-cron-short-28809865-2dhw7   Pod will get the following Security Groups [sg-040fe161363814f9b sg-04df5482042d9290e]
6m2s                       Warning   FailedCreatePodSandBox   Pod/engine-content-cars-etl-cron-short-28809865-2dhw7   Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "8fd6567d166d83aa85a4ee6dc055876106e2abb8b4678180607ea5800dec5001": plugin type="aws-cni" name="aws-cni" failed (add): add cmd: failed to assign an IP address to container
6m2s                       Normal    SuccessfulCreate         Job/engine-content-cars-etl-cron-short-28809865         Created pod: engine-content-cars-etl-cron-short-28809865-2dhw7
6m1s                       Normal    ResourceAllocated        Pod/engine-content-cars-etl-cron-short-28809865-2dhw7   Allocated [{"eniId":"eni-0f225e0d5531c7cce","ifAddress":"0e:f1:ce:61:9c:d7","privateIp":"172.21.189.168","ipv6Addr":"","vlanId":2,"subnetCidr":"172.21.160.0/19","subnetV6Cidr":""}] to the pod
6m1s                       Normal    Pulled                   Pod/engine-content-cars-etl-cron-short-28809865-2dhw7   Container image "docker.io/istio/proxyv2:1.22.3" already present on machine
6m1s                       Normal    Created                  Pod/engine-content-cars-etl-cron-short-28809865-2dhw7   Created container istio-validation
6m1s                       Normal    Started                  Pod/engine-content-cars-etl-cron-short-28809865-2dhw7   Started container istio-validation
6m                         Normal    Created                  Pod/engine-content-cars-etl-cron-short-28809865-2dhw7   Created container istio-proxy
6m                         Normal    Started                  Pod/engine-content-cars-etl-cron-short-28809865-2dhw7   Started container istio-proxy
6m                         Normal    Pulled                   Pod/engine-content-cars-etl-cron-short-28809865-2dhw7   Container image "docker.io/istio/proxyv2:1.22.3" already present on machine
5m58s                      Normal    Pulled                   Pod/engine-content-cars-etl-cron-short-28809865-2dhw7   Container image "534287151633.dkr.ecr.us-east-1.amazonaws.com/engine-content-etl:0.2.4" already present on machine
5m58s                      Normal    Created                  Pod/engine-content-cars-etl-cron-short-28809865-2dhw7   Created container engine-content-cars-etl-cron-short
5m58s                      Normal    Started                  Pod/engine-content-cars-etl-cron-short-28809865-2dhw7   Started container engine-content-cars-etl-cron-short
5m41s                      Normal    Killing                  Pod/engine-content-cars-etl-cron-short-28809865-2dhw7   Stopping container istio-proxy

To date, this has seems just like noise. But it happens whether this is a pod on a new host or existing.

Config params for aws-cni are

{
        init = {
          env = {
            DISABLE_TCP_EARLY_DEMUX = "true"
          }
        }
        env = {
          # Reference docs https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
          ENABLE_PREFIX_DELEGATION          = "true"
          WARM_PREFIX_TARGET                = "1"
          ENABLE_POD_ENI                    = "true",
          POD_SECURITY_GROUP_ENFORCING_MODE = "standard",
          AWS_VPC_K8S_CNI_EXTERNALSNAT      = "true"
        }
      }

This is on a 1.30 EKS cluster. CNI image is: 602401143452.dkr.ecr.us-east-1.amazonaws.com/amazon-k8s-cni:v1.18.3-eksbuild.3

Again, this seems non-fatal, but I'd love to eliminate warnings from my clusters events if possible

Thanks

taer commented 1 month ago

It seems to only be on the pods with security groups assigned via POD_ENI. They do not have this warning when there are no sec groups assigned

taer commented 1 month ago

digging into the cni logs, I get this.

For this warning failed to setup network for sandbox "d78e79e2f3c65976885cf228da933b8d35471497f1bdfacbff8d246, in the logs, I see the following in plugin.log

{"level":"info","ts":"2024-10-10T21:51:34.284Z","caller":"routed-eni-cni-plugin/cni.go:119","msg":"Constructed new logger instance"}
{"level":"info","ts":"2024-10-10T21:51:34.284Z","caller":"routed-eni-cni-plugin/cni.go:128","msg":"Received CNI add request: ContainerID(d78e79e2f3c65976885cf228da933b8d35471497f1bdfacbff8d24644f175222) Netns(/var/run/netns/cni-a1e04c19-447f-a214-5ccf-76f51a955e74) IfName(eth0) Args(K8S_POD_NAMESPACE=booking-cars;K8S_POD_NAME=jobs-79c5bb8c56-l7mfw;K8S_POD_INFRA_CONTAINER_ID=d78e79e2f3c65976885cf228da933b8d35471497f1bdfacbff8d24644f175222;K8S_POD_UID=a94b7427-7856-4dc4-844c-8a426631e99d;IgnoreUnknown=1) Path(/opt/cni/bin) argsStdinData({\"cniVersion\":\"0.4.0\",\"mtu\":\"9001\",\"name\":\"aws-cni\",\"pluginLogFile\":\"/var/log/aws-routed-eni/plugin.log\",\"pluginLogLevel\":\"DEBUG\",\"podSGEnforcingMode\":\"standard\",\"type\":\"aws-cni\",\"vethPrefix\":\"eni\"})"}
{"level":"debug","ts":"2024-10-10T21:51:34.284Z","caller":"routed-eni-cni-plugin/cni.go:128","msg":"Prev Result: <nil>\n"}
{"level":"debug","ts":"2024-10-10T21:51:34.284Z","caller":"routed-eni-cni-plugin/cni.go:128","msg":"MTU value set is 9001:"}
{"level":"error","ts":"2024-10-10T21:51:34.301Z","caller":"routed-eni-cni-plugin/cni.go:128","msg":"Failed to assign an IP address to container d78e79e2f3c65976885cf228da933b8d35471497f1bdfacbff8d24644f175222"}
{"level":"info","ts":"2024-10-10T21:51:34.337Z","caller":"routed-eni-cni-plugin/cni.go:119","msg":"Constructed new logger instance"}
{"level":"debug","ts":"2024-10-10T21:51:34.337Z","caller":"routed-eni-cni-plugin/cni.go:314","msg":"Prev Result: <nil>\n"}
{"level":"info","ts":"2024-10-10T21:51:34.337Z","caller":"routed-eni-cni-plugin/cni.go:314","msg":"Received CNI del request: ContainerID(d78e79e2f3c65976885cf228da933b8d35471497f1bdfacbff8d24644f175222) Netns(/var/run/netns/cni-a1e04c19-447f-a214-5ccf-76f51a955e74) IfName(eth0) Args(IgnoreUnknown=1;K8S_POD_NAMESPACE=booking-cars;K8S_POD_NAME=jobs-79c5bb8c56-l7mfw;K8S_POD_INFRA_CONTAINER_ID=d78e79e2f3c65976885cf228da933b8d35471497f1bdfacbff8d24644f175222;K8S_POD_UID=a94b7427-7856-4dc4-844c-8a426631e99d) Path(/opt/cni/bin) argsStdinData({\"cniVersion\":\"0.4.0\",\"mtu\":\"9001\",\"name\":\"aws-cni\",\"pluginLogFile\":\"/var/log/aws-routed-eni/plugin.log\",\"pluginLogLevel\":\"DEBUG\",\"podSGEnforcingMode\":\"standard\",\"type\":\"aws-cni\",\"vethPrefix\":\"eni\"})"}
{"level":"info","ts":"2024-10-10T21:51:34.340Z","caller":"routed-eni-cni-plugin/cni.go:314","msg":"Container d78e79e2f3c65976885cf228da933b8d35471497f1bdfacbff8d24644f175222 not found"}
{"level":"info","ts":"2024-10-10T21:51:34.993Z","caller":"routed-eni-cni-plugin/cni.go:119","msg":"Constructed new logger instance"}
{"level":"info","ts":"2024-10-10T21:51:34.993Z","caller":"routed-eni-cni-plugin/cni.go:128","msg":"Received CNI add request: ContainerID(6f8cbcf57411933b3f53a986a6cf91dbe2f2e6120012c96ee7447bcc0f1c4caa) Netns(/var/run/netns/cni-cfaf09cc-342d-794e-978a-c7afa071a0e2) IfName(eth0) Args(K8S_POD_UID=a94b7427-7856-4dc4-844c-8a426631e99d;IgnoreUnknown=1;K8S_POD_NAMESPACE=booking-cars;K8S_POD_NAME=jobs-79c5bb8c56-l7mfw;K8S_POD_INFRA_CONTAINER_ID=6f8cbcf57411933b3f53a986a6cf91dbe2f2e6120012c96ee7447bcc0f1c4caa) Path(/opt/cni/bin) argsStdinData({\"cniVersion\":\"0.4.0\",\"mtu\":\"9001\",\"name\":\"aws-cni\",\"pluginLogFile\":\"/var/log/aws-routed-eni/plugin.log\",\"pluginLogLevel\":\"DEBUG\",\"podSGEnforcingMode\":\"standard\",\"type\":\"aws-cni\",\"vethPrefix\":\"eni\"})"}
{"level":"debug","ts":"2024-10-10T21:51:34.993Z","caller":"routed-eni-cni-plugin/cni.go:128","msg":"Prev Result: <nil>\n"}
{"level":"debug","ts":"2024-10-10T21:51:34.993Z","caller":"routed-eni-cni-plugin/cni.go:128","msg":"MTU value set is 9001:"}
{"level":"info","ts":"2024-10-10T21:51:35.002Z","caller":"routed-eni-cni-plugin/cni.go:128","msg":"Received add network response from ipamd for container 6f8cbcf57411933b3f53a986a6cf91dbe2f2e6120012c96ee7447bcc0f1c4caa interface eth0: Success:true IPv4Addr:\"172.21.149.56\" DeviceNumber:-1 UseExternalSNAT:true VPCv4CIDRs:\"172.21.0.0/16\" PodVlanId:2 PodENIMAC:\"0a:ff:c4:52:0f:4d\" PodENISubnetGW:\"172.21.128.1\" ParentIfIndex:7 NetworkPolicyMode:\"standard\""}
{"level":"debug","ts":"2024-10-10T21:51:35.002Z","caller":"routed-eni-cni-plugin/cni.go:221","msg":"SetupBranchENIPodNetwork: hostVethName=eni83f7dc8eecd, contVethName=eth0, netnsPath=/var/run/netns/cni-cfaf09cc-342d-794e-978a-c7afa071a0e2, v4Addr=172.21.149.56/32, v6Addr=<nil>, vlanID=2, eniMAC=0a:ff:c4:52:0f:4d, subnetGW=172.21.128.1, parentIfIndex=7, mtu=9001, podSGEnforcingMode=standard"}
{"level":"debug","ts":"2024-10-10T21:51:35.027Z","caller":"driver/driver.go:280","msg":"Successfully set IPv6 sysctls on hostVeth eni83f7dc8eecd"}
{"level":"debug","ts":"2024-10-10T21:51:35.031Z","caller":"driver/driver.go:319","msg":"Successfully setup container route, containerAddr=172.21.149.56/32, hostVeth=eni83f7dc8eecd, rtTable=main"}
{"level":"debug","ts":"2024-10-10T21:51:35.031Z","caller":"driver/driver.go:319","msg":"Successfully setup toContainer rule, containerAddr=172.21.149.56/32, rtTable=main"}
{"level":"debug","ts":"2024-10-10T21:51:35.031Z","caller":"driver/driver.go:319","msg":"Successfully setup fromContainer rule, containerAddr=172.21.149.56/32, rtTable=102"}
{"level":"debug","ts":"2024-10-10T21:51:35.031Z","caller":"routed-eni-cni-plugin/cni.go:128","msg":"Using dummy interface: {Name:dummy83f7dc8eecd Mac:2 Mtu:0 Sandbox: SocketPath: PciID:}"}

however, that containerID doesn't exist anymore. immediately after, we get this log that looks more successful

{"level":"info","ts":"2024-10-10T21:51:34.993Z","caller":"routed-eni-cni-plugin/cni.go:119","msg":"Constructed new logger instance"}
{"level":"info","ts":"2024-10-10T21:51:34.993Z","caller":"routed-eni-cni-plugin/cni.go:128","msg":"Received CNI add request: ContainerID(6f8cbcf57411933b3f53a986a6cf91dbe2f2e6120012c96ee7447bcc0f1c4caa) Netns(/var/run/netns/cni-cfaf09cc-342d-794e-978a-c7afa071a0e2) IfName(eth0) Args(K8S_POD_UID=a94b7427-7856-4dc4-844c-8a426631e99d;IgnoreUnknown=1;K8S_POD_NAMESPACE=booking-cars;K8S_POD_NAME=jobs-79c5bb8c56-l7mfw;K8S_POD_INFRA_CONTAINER_ID=6f8cbcf57411933b3f53a986a6cf91dbe2f2e6120012c96ee7447bcc0f1c4caa) Path(/opt/cni/bin) argsStdinData({\"cniVersion\":\"0.4.0\",\"mtu\":\"9001\",\"name\":\"aws-cni\",\"pluginLogFile\":\"/var/log/aws-routed-eni/plugin.log\",\"pluginLogLevel\":\"DEBUG\",\"podSGEnforcingMode\":\"standard\",\"type\":\"aws-cni\",\"vethPrefix\":\"eni\"})"}
{"level":"debug","ts":"2024-10-10T21:51:34.993Z","caller":"routed-eni-cni-plugin/cni.go:128","msg":"Prev Result: <nil>\n"}
{"level":"debug","ts":"2024-10-10T21:51:34.993Z","caller":"routed-eni-cni-plugin/cni.go:128","msg":"MTU value set is 9001:"}
{"level":"info","ts":"2024-10-10T21:51:35.002Z","caller":"routed-eni-cni-plugin/cni.go:128","msg":"Received add network response from ipamd for container 6f8cbcf57411933b3f53a986a6cf91dbe2f2e6120012c96ee7447bcc0f1c4caa interface eth0: Success:true IPv4Addr:\"172.21.149.56\" DeviceNumber:-1 UseExternalSNAT:true VPCv4CIDRs:\"172.21.0.0/16\" PodVlanId:2 PodENIMAC:\"0a:ff:c4:52:0f:4d\" PodENISubnetGW:\"172.21.128.1\" ParentIfIndex:7 NetworkPolicyMode:\"standard\""}
{"level":"debug","ts":"2024-10-10T21:51:35.002Z","caller":"routed-eni-cni-plugin/cni.go:221","msg":"SetupBranchENIPodNetwork: hostVethName=eni83f7dc8eecd, contVethName=eth0, netnsPath=/var/run/netns/cni-cfaf09cc-342d-794e-978a-c7afa071a0e2, v4Addr=172.21.149.56/32, v6Addr=<nil>, vlanID=2, eniMAC=0a:ff:c4:52:0f:4d, subnetGW=172.21.128.1, parentIfIndex=7, mtu=9001, podSGEnforcingMode=standard"}
{"level":"debug","ts":"2024-10-10T21:51:35.027Z","caller":"driver/driver.go:280","msg":"Successfully set IPv6 sysctls on hostVeth eni83f7dc8eecd"}
{"level":"debug","ts":"2024-10-10T21:51:35.031Z","caller":"driver/driver.go:319","msg":"Successfully setup container route, containerAddr=172.21.149.56/32, hostVeth=eni83f7dc8eecd, rtTable=main"}
{"level":"debug","ts":"2024-10-10T21:51:35.031Z","caller":"driver/driver.go:319","msg":"Successfully setup toContainer rule, containerAddr=172.21.149.56/32, rtTable=main"}
{"level":"debug","ts":"2024-10-10T21:51:35.031Z","caller":"driver/driver.go:319","msg":"Successfully setup fromContainer rule, containerAddr=172.21.149.56/32, rtTable=102"}
{"level":"debug","ts":"2024-10-10T21:51:35.031Z","caller":"routed-eni-cni-plugin/cni.go:128","msg":"Using dummy interface: {Name:dummy83f7dc8eecd Mac:2 Mtu:0 Sandbox: SocketPath: PciID:}"}

That is the IP that was assigned the pod. and all it's metadata. However, that containerID referenced in the logs doesn;t exist. pruned a few fields out to clean this output up

  containerStatuses:
  - containerID: containerd://ce473166e144f8bd2b4194126615184b0daa7f2458c2155171ce19950159aedc
    image: 534287151633.dkr.ecr.us-east-1.amazonaws.com/engine-booking-cars-jobs:0.0.233
    name: engine-cars-jobs
    ready: true

  initContainerStatuses:
  - containerID: containerd://38518dd367bf27fb92a68ce18e8e286bbe2b5d724afd0643b20aee649eddc8f3
    image: docker.io/istio/proxyv2:1.22.3
    lastState: {}
    name: istio-validation
    ready: true
    restartCount: 0
    started: false
    state:
      terminated:
        containerID: containerd://38518dd367bf27fb92a68ce18e8e286bbe2b5d724afd0643b20aee649eddc8f3
        exitCode: 0
        finishedAt: "2024-10-10T21:51:35Z"
        reason: Completed
        startedAt: "2024-10-10T21:51:35Z"
  - containerID: containerd://7902dd019a1e676f906bfade7ba62acc3757f8b91ee839c45e0a10fdb0ad517e
    image: docker.io/istio/proxyv2:1.22.3
    name: istio-proxy

in ipamd.log

{"level":"info","ts":"2024-10-10T21:51:34.288Z","caller":"rpc/rpc.pb.go:863","msg":"Received AddNetwork for NS /var/run/netns/cni-a1e04c19-447f-a214-5ccf-76f51a955e74, Sandbox d78e79e2f3c65976885cf228da933b8d35471497f1bdfacbff8d24644
f175222, ifname eth0"}
{"level":"debug","ts":"2024-10-10T21:51:34.288Z","caller":"rpc/rpc.pb.go:863","msg":"AddNetworkRequest: K8S_POD_NAME:\"jobs-79c5bb8c56-l7mfw\"  K8S_POD_NAMESPACE:\"booking-cars\"  K8S_POD_INFRA_CONTAINER_ID:\"d78e79e2f3c65976885cf228
da933b8d35471497f1bdfacbff8d24644f175222\"  ContainerID:\"d78e79e2f3c65976885cf228da933b8d35471497f1bdfacbff8d24644f175222\"  IfName:\"eth0\"  NetworkName:\"aws-cni\"  Netns:\"/var/run/netns/cni-a1e04c19-447f-a214-5ccf-76f51a955e74\"
"}
{"level":"debug","ts":"2024-10-10T21:51:34.289Z","caller":"ipamd/ipamd.go:1953","msg":"Total number of interfaces found: 2 "}
{"level":"debug","ts":"2024-10-10T21:51:34.289Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI MAC address: 0a:ff:f5:68:32:a5"}
{"level":"debug","ts":"2024-10-10T21:51:34.292Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI: eni-0781745ef7092319e, MAC 0a:ff:f5:68:32:a5, device 0"}
{"level":"debug","ts":"2024-10-10T21:51:34.297Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI MAC address: 0a:ff:fa:e1:d4:7b"}
{"level":"debug","ts":"2024-10-10T21:51:34.299Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI: eni-038ebac53d7639b13, MAC 0a:ff:fa:e1:d4:7b, device 2"}
{"level":"debug","ts":"2024-10-10T21:51:34.301Z","caller":"networkutils/network.go:969","msg":"Found the Link that uses mac address 0a:ff:fa:e1:d4:7b and its index is 7 (attempt 1/5)"}
{"level":"info","ts":"2024-10-10T21:51:34.301Z","caller":"rpc/rpc.pb.go:863","msg":"Send AddNetworkReply: failed to get Branch ENI resource"}
{"level":"info","ts":"2024-10-10T21:51:34.339Z","caller":"rpc/rpc.pb.go:881","msg":"Received DelNetwork for Sandbox d78e79e2f3c65976885cf228da933b8d35471497f1bdfacbff8d24644f175222"}
{"level":"debug","ts":"2024-10-10T21:51:34.339Z","caller":"rpc/rpc.pb.go:881","msg":"DelNetworkRequest: K8S_POD_NAME:\"jobs-79c5bb8c56-l7mfw\"  K8S_POD_NAMESPACE:\"booking-cars\"  K8S_POD_INFRA_CONTAINER_ID:\"d78e79e2f3c65976885cf228da933b8d35471497f1bdfacbff8d24644f175222\"  Reason:\"PodDeleted\"  ContainerID:\"d78e79e2f3c65976885cf228da933b8d35471497f1bdfacbff8d24644f175222\"  IfName:\"eth0\"  NetworkName:\"aws-cni\""}{"level":"debug","ts":"2024-10-10T21:51:34.339Z","caller":"ipamd/rpc_handler.go:261","msg":"UnassignPodIPAddress: IP address pool stats: total 32, assigned 5, sandbox aws-cni/d78e79e2f3c65976885cf228da933b8d35471497f1bdfacbff8d24644f175222/eth0"}{"level":"debug","ts":"2024-10-10T21:51:34.339Z","caller":"ipamd/rpc_handler.go:261","msg":"UnassignPodIPAddress: Failed to find IPAM entry under full key, trying CRI-migrated version"}
{"level":"warn","ts":"2024-10-10T21:51:34.339Z","caller":"ipamd/rpc_handler.go:261","msg":"UnassignPodIPAddress: Failed to find sandbox _migrated-from-cri/d78e79e2f3c65976885cf228da933b8d35471497f1bdfacbff8d24644f175222/unknown"}
{"level":"info","ts":"2024-10-10T21:51:34.339Z","caller":"rpc/rpc.pb.go:881","msg":"Send DelNetworkReply: IPv4Addr: , IPv6Addr: , DeviceNumber: 0, err: datastore: unknown pod"}{"level":"info","ts":"2024-10-10T21:51:34.995Z","caller":"rpc/rpc.pb.go:863","msg":"Received AddNetwork for NS /var/run/netns/cni-cfaf09cc-342d-794e-978a-c7afa071a0e2, Sandbox 6f8cbcf57411933b3f53a986a6cf91dbe2f2e6120012c96ee7447bcc0f1c4caa, ifname eth0"}
{"level":"debug","ts":"2024-10-10T21:51:34.995Z","caller":"rpc/rpc.pb.go:863","msg":"AddNetworkRequest: K8S_POD_NAME:\"jobs-79c5bb8c56-l7mfw\"  K8S_POD_NAMESPACE:\"booking-cars\"  K8S_POD_INFRA_CONTAINER_ID:\"6f8cbcf57411933b3f53a986a6cf91dbe2f2e6120012c96ee7447bcc0f1c4caa\"  ContainerID:\"6f8cbcf57411933b3f53a986a6cf91dbe2f2e6120012c96ee7447bcc0f1c4caa\"  IfName:\"eth0\"  NetworkName:\"aws-cni\"  Netns:\"/var/run/netns/cni-cfaf09cc-342d-794e-978a-c7afa071a0e2\""}{"level":"debug","ts":"2024-10-10T21:51:34.996Z","caller":"ipamd/ipamd.go:1953","msg":"Total number of interfaces found: 2 "}{"level":"debug","ts":"2024-10-10T21:51:34.996Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI MAC address: 0a:ff:f5:68:32:a5"}
{"level":"debug","ts":"2024-10-10T21:51:34.997Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI: eni-0781745ef7092319e, MAC 0a:ff:f5:68:32:a5, device 0"}
{"level":"debug","ts":"2024-10-10T21:51:34.999Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI MAC address: 0a:ff:fa:e1:d4:7b"}
{"level":"debug","ts":"2024-10-10T21:51:35.000Z","caller":"awsutils/awsutils.go:567","msg":"Found ENI: eni-038ebac53d7639b13, MAC 0a:ff:fa:e1:d4:7b, device 2"}
{"level":"debug","ts":"2024-10-10T21:51:35.001Z","caller":"networkutils/network.go:969","msg":"Found the Link that uses mac address 0a:ff:fa:e1:d4:7b and its index is 7 (attempt 1/5)"}
{"level":"debug","ts":"2024-10-10T21:51:35.001Z","caller":"rpc/rpc.pb.go:863","msg":"Pod vlandId: 2"}
{"level":"debug","ts":"2024-10-10T21:51:35.002Z","caller":"rpc/rpc.pb.go:863","msg":"VPC CIDR 172.21.0.0/16"}
{"level":"info","ts":"2024-10-10T21:51:35.002Z","caller":"rpc/rpc.pb.go:863","msg":"Send AddNetworkReply: IPv4Addr: 172.21.149.56, IPv6Addr: , DeviceNumber: -1, err: <nil>"}

So the difference is this fails

{"level":"debug","ts":"2024-10-10T21:51:34.301Z","caller":"networkutils/network.go:969","msg":"Found the Link that uses mac address 0a:ff:fa:e1:d4:7b and its index is 7 (attempt 1/5)"}
{"level":"info","ts":"2024-10-10T21:51:34.301Z","caller":"rpc/rpc.pb.go:863","msg":"Send AddNetworkReply: failed to get Branch ENI resource"}
{"level":"info","ts":"2024-10-10T21:51:34.339Z","caller":"rpc/rpc.pb.go:881","msg":"Received DelNetwork for Sandbox d78e79e2f3c65976885cf228da933b8d35471497f1bdfacbff8d24644f175222"}

and immediately after, this similar set of logs works

{"level":"debug","ts":"2024-10-10T21:51:35.001Z","caller":"networkutils/network.go:969","msg":"Found the Link that uses mac address 0a:ff:fa:e1:d4:7b and its index is 7 (attempt 1/5)"}
{"level":"debug","ts":"2024-10-10T21:51:35.001Z","caller":"rpc/rpc.pb.go:863","msg":"Pod vlandId: 2"}
{"level":"debug","ts":"2024-10-10T21:51:35.002Z","caller":"rpc/rpc.pb.go:863","msg":"VPC CIDR 172.21.0.0/16"}
{"level":"info","ts":"2024-10-10T21:51:35.002Z","caller":"rpc/rpc.pb.go:863","msg":"Send AddNetworkReply: IPv4Addr: 172.21.149.56, IPv6Addr: , DeviceNumber: -1, err: <nil>"}

So what is causing this Send AddNetworkReply: failed to get Branch ENI resource and how can I prevent it?

Thanks!

taer commented 1 month ago

the code shows

                val, branch := pod.Annotations["vpc.amazonaws.com/pod-eni"]
                if branch {

with the else being the error I see. That pod currently has that annotation.

    vpc.amazonaws.com/pod-eni: '[{"eniId":"eni-0f79fa0088da8b996","ifAddress":"0a:ff:c4:52:0f:4d","privateIp":"172.21.149.56","ipv6Addr":"","vlanId":2,"subnetCidr":"172.21.128.0/19","subnetV6Cidr":""}]'

So this seems like a race where something is responsible for adding json to the pod prior to security groups happening and it's not fast enough.