After upgrading the CNI version from v1.5.1-rc1 to v1.5.4, we are seeing an issue where pods are unable to communicate with other pods on the same worker node. We have the following layout:
CoreDNS pod on eth0
Kibana pod on eth0
App1 on eth1
App2 on eth2
What we are seeing is that DNS queries from App1 and App2 fail with a "no server found" error when we test them with the dig command:
dig @CoreDNS-ip amazonaws.com
Meanwhile, running the same command from the Kibana pod, from the worker node itself, and from a pod on a different worker node works as expected.
When collecting the logs using https://github.com/nithu0115/eks-logs-collector, we found that the CoreDNS IP does not appear anywhere in the output of the ip rule show command. I would expect every pod IP on the worker node to have at least the following associated rule in ip rule:
512: from all to POD_IP lookup main
However, there is no such rule for the CoreDNS pod IP. We therefore believe the CNI plugin is failing to rebuild this rule after the upgrade. There is an internal issue open for this if you want the collected logs.
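To check other nodes for the same condition, a small script can compare the ip rule show output against the list of pod IPs on the node. This is a minimal sketch, not part of the CNI plugin or the log collector: the function name pod_ips_missing_main_rule and the sample IP addresses are hypothetical, and it assumes the rules look like the "512: from all to POD_IP lookup main" line above (it accepts any priority and strips an optional /32 suffix).

```python
import re
from typing import Iterable, List

# Matches "from all to <IP> lookup main" rules at any priority,
# e.g. "512:\tfrom all to 192.168.10.21 lookup main".
_TO_MAIN_RULE = re.compile(r"^\s*\d+:\s+from all to (\S+) lookup main\b")


def pod_ips_missing_main_rule(ip_rule_output: str, pod_ips: Iterable[str]) -> List[str]:
    """Return the pod IPs that have no 'from all to POD_IP lookup main' rule.

    Hypothetical helper: ip_rule_output is the text printed by `ip rule show`.
    """
    covered = set()
    for line in ip_rule_output.splitlines():
        match = _TO_MAIN_RULE.match(line)
        if match:
            # Assumption: a host rule may be printed with a /32 suffix; drop it.
            covered.add(match.group(1).split("/")[0])
    return [ip for ip in pod_ips if ip not in covered]


if __name__ == "__main__":
    # Illustrative output only; these addresses are made up.
    sample = (
        "0:\tfrom all lookup local\n"
        "512:\tfrom all to 192.168.10.21 lookup main\n"
        "512:\tfrom all to 192.168.20.33 lookup main\n"
        "32766:\tfrom all lookup main\n"
        "32767:\tfrom all lookup default\n"
    )
    # 192.168.10.5 stands in for the CoreDNS pod IP.
    pods = ["192.168.10.5", "192.168.10.21", "192.168.20.33"]
    print(pod_ips_missing_main_rule(sample, pods))  # ['192.168.10.5']
```

On a worker node, the rule text could be captured with subprocess.run(["ip", "rule", "show"], capture_output=True, text=True).stdout and the pod IPs taken from kubectl get pods -o wide; any IP the function returns, such as the CoreDNS IP in this report, lacks the expected rule.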