Hmm, network unreachable :thinking:
Checking this cloud-init log: https://storage.googleapis.com/kubernetes-jenkins/logs/ci-kubernetes-e2e-ec2-eks-al2023/1814933511295995904/artifacts/logs/i-008394e69e833839a/cloud-init.log
```
2024-07-21 08:11:14,265 - url_helper.py[DEBUG]: Read from http://169.254.169.254/2021-03-23/dynamic/instance-identity/signature (200, 174b) after 1 attempts
2024-07-21 08:11:14,265 - util.py[DEBUG]: Crawl of metadata service took 0.180 seconds
2024-07-21 08:11:14,265 - subp.py[DEBUG]: Running command ['ip', '-4', 'route', 'del', 'default', 'dev', 'ens5'] with allowed return codes [0] (shell=False, capture=True)
2024-07-21 08:11:14,267 - subp.py[DEBUG]: Running command ['ip', '-4', 'route', 'del', '172.31.80.1', 'dev', 'ens5', 'src', '172.31.89.46'] with allowed return codes [0] (shell=False, capture=True)
```
Why are the default routes deleted?
No idea! Will watch out for this again.
For reference, we discussed this in Slack: the failing jobs use nodes in different subnets. When kindnet tries to add the route for the pod subnet through the node IP, it fails because that IP is not reachable as a next hop (it has to be in the same subnet).
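A minimal sketch of the failure condition (not kindnet's actual code): a next hop is only usable if it is on-link, i.e. inside one of the local interface's subnets. The addresses below are illustrative, loosely based on the `172.31.89.46` node IP seen in the log above, assuming a `/20` VPC subnet mask.

```python
import ipaddress

def next_hop_reachable(local_iface_cidr: str, next_hop: str) -> bool:
    """True if next_hop lies inside the local interface's subnet (on-link),
    which is required for `ip route add <podCIDR> via <next_hop>` to succeed."""
    iface_net = ipaddress.ip_network(local_iface_cidr, strict=False)
    return ipaddress.ip_address(next_hop) in iface_net

# Node at 172.31.89.46/20; a peer node in the SAME subnet works as a gateway.
print(next_hop_reachable("172.31.89.46/20", "172.31.90.10"))  # True

# A peer node in a DIFFERENT subnet is not on-link, so the equivalent
# `ip route add` would fail (typically "Nexthop has invalid gateway").
print(next_hop_reachable("172.31.89.46/20", "172.31.64.5"))   # False
```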
If we use the default gateway instead, then the VPC must have knowledge of the node and pod subnets and route the traffic to the corresponding node. I don't know if this is possible, but I don't recommend this setup as it complicates the network and leaks details of the cluster to the VPC.
Another option is to create an overlay between the nodes, but then you have a more complex setup that is harder to troubleshoot and has considerably worse performance.
My recommendation is for kubetest2 to always deploy the nodes in the same VPC subnet.
/close
Done. thanks!
> My recommendation is for kubetest2 to always deploy the nodes in the same VPC subnet
Thanks @aojea, I agree.
See kindnet-cni logs from https://prow.k8s.io/view/gs/kubernetes-jenkins/logs/ci-kubernetes-e2e-ec2-eks-al2023/1814933511295995904