Upgrade CNI version broke pod-to-pod communication within the same worker node

After upgrading the CNI plugin from v1.5.1-rc1 to v1.5.4, we are seeing an issue where a pod is unable to communicate with other pods on the same worker node. We have the following layout:

CoreDNS pod on eth0
Kibana pod on eth0
App1 on eth1
App2 on eth2

What we are seeing is that DNS queries from App1 and App2 fail with "no server found" when we run them with the dig command:

dig @CoreDNS-ip amazonaws.com

Meanwhile, the same command works as expected when executed from the Kibana pod, from the worker node itself, and from a pod on a different worker node.

When collecting logs with https://github.com/nithu0115/eks-logs-collector, we found that the CoreDNS pod IP does not appear anywhere in the output of the ip rule show command. I would expect every pod IP address on the worker node to have at least the following associated rule:

512: from all to POD_IP lookup main

However, we do not see one for the CoreDNS pod IP. We therefore believe the CNI plugin failed to rebuild this rule after the upgrade. There is an internal issue open for this if you want to get the collected logs.
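For anyone who wants to repeat the rule check on their own node, here is a minimal sketch of it as a shell function. It is not part of the CNI plugin or the log collector; the function name missing_to_rules is made up for illustration, and it only assumes the rule format quoted above ("from all to POD_IP lookup main").

```shell
# Hypothetical helper, not part of amazon-vpc-cni-k8s.
# Prints every pod IP that has no "to <IP> lookup main" rule.
#   $1      text in the format printed by `ip rule show`
#   $2...   pod IPs to look for
missing_to_rules() {
    rules=$1
    shift
    for ip in "$@"; do
        # awk exits 0 if some rule line contains "to <ip> lookup main"
        # (the IP may also be printed with a /32 suffix), 1 otherwise.
        printf '%s\n' "$rules" | awk -v ip="$ip" '
            {
                for (i = 1; i + 3 <= NF; i++)
                    if ($i == "to" &&
                        ($(i + 1) == ip || $(i + 1) == (ip "/32")) &&
                        $(i + 2) == "lookup" && $(i + 3) == "main")
                        found = 1
            }
            END { exit !found }
        ' || printf '%s\n' "$ip"
    done
}
```

On the worker node it could be called as missing_to_rules "$(ip rule show)" followed by the pod IPs, which can be read from kubectl get pods -o wide. Any IP it prints is a pod without the expected rule, which is what we observed for the CoreDNS pod.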

aws / amazon-vpc-cni-k8s

Upgrade CNI version broke pod-to-pod communication within the same worker node #641