Closed yoheiueda closed 1 month ago
@bpradipt @EmmEff is it possible to collect some diagnostic data on EKS?
Create a regular (runc) pod and execute the following commands in the pod with kubectl exec. If you can access the worker node that the pod is running on, please execute the same commands there.
ip address show
ip link show
ip rule show
ip route show table main
ip neigh show
According to the documentation, the EKS CNI plugin explicitly sets a static ARP entry. If so, I think we can fix the issue by setting the same ARP entry in the network namespace of a peer pod VM.
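For reference, the per-pod setup described in the AWS VPC CNI design can be sketched as follows. This is an assumed reconstruction, not confirmed commands; `<pod IP>` and `<host veth MAC>` are placeholders:

```shell
# Sketch (assumed, based on the AWS VPC CNI design docs) of what the plugin
# sets up inside each pod's network namespace:
ip addr add <pod IP>/32 dev eth0                 # pod IP as /32, no on-link subnet
ip route add 169.254.1.1 dev eth0 scope link     # dummy link-local next hop
ip route add default via 169.254.1.1 dev eth0    # all egress goes to that hop
# Since 169.254.1.1 is not a real host, a static ARP entry maps it to the
# MAC address of the host-side veth:
ip neigh add 169.254.1.1 dev eth0 lladdr <host veth MAC> nud permanent
```

The diagnostic commands above should reveal whether this is the layout in use.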
@yoheiueda please find the requested details.
Output from the regular (runc) pod:
[root@priv-pod /]# ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
3: eth0@if17: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
link/ether 56:37:6d:d9:1b:6a brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.0.0.149/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::5437:6dff:fed9:1b6a/64 scope link
valid_lft forever preferred_lft forever
[root@priv-pod /]#
[root@priv-pod /]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if17: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
link/ether 56:37:6d:d9:1b:6a brd ff:ff:ff:ff:ff:ff link-netnsid 0
[root@priv-pod /]#
[root@priv-pod /]# ip rule show
0: from all lookup local
32766: from all lookup main
32767: from all lookup default
[root@priv-pod /]#
[root@priv-pod /]# ip route show table main
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link
[root@priv-pod /]#
[root@priv-pod /]# ip neigh show
169.254.1.1 dev eth0 lladdr fa:11:de:9d:70:4c PERMANENT
Output from the worker node:
root@i-069e28cbaee4769cf:/# ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
link/ether 0a:d1:9a:2b:a4:2f brd ff:ff:ff:ff:ff:ff
altname enp0s5
inet 10.0.0.155/27 metric 100 brd 10.0.0.159 scope global dynamic ens5
valid_lft 3026sec preferred_lft 3026sec
inet6 fe80::8d1:9aff:fe2b:a42f/64 scope link
valid_lft forever preferred_lft forever
3: eni306cbc4b983@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
link/ether 66:dd:ca:b1:37:ea brd ff:ff:ff:ff:ff:ff link-netns cni-abf9bb29-2407-a652-8b9f-ee6828c45956
inet6 fe80::64dd:caff:feb1:37ea/64 scope link
valid_lft forever preferred_lft forever
4: enifd8fb8f99f1@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
link/ether 72:fc:2b:7b:9e:fc brd ff:ff:ff:ff:ff:ff link-netns cni-6b005a7c-9573-9156-ea6b-99423ccdcd6b
inet6 fe80::70fc:2bff:fe7b:9efc/64 scope link
valid_lft forever preferred_lft forever
5: eniecfe8b07af8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
link/ether ca:77:ea:d4:2d:7f brd ff:ff:ff:ff:ff:ff link-netns cni-824f05c3-b6c4-7ccb-445d-6895046eaf6b
inet6 fe80::c877:eaff:fed4:2d7f/64 scope link
valid_lft forever preferred_lft forever
6: eni910811243e2@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
link/ether 3e:55:91:3d:ce:0f brd ff:ff:ff:ff:ff:ff link-netns cni-b7218768-8633-9d14-d634-0a43a6053a65
inet6 fe80::3c55:91ff:fe3d:ce0f/64 scope link
valid_lft forever preferred_lft forever
7: eni81df13ca303@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
link/ether 22:e8:18:5c:18:94 brd ff:ff:ff:ff:ff:ff link-netns cni-6c06a3fe-4b87-b321-8c60-c129372a07a1
inet6 fe80::20e8:18ff:fe5c:1894/64 scope link
valid_lft forever preferred_lft forever
8: eni29b100bd66f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
link/ether fe:62:bd:77:68:6e brd ff:ff:ff:ff:ff:ff link-netns cni-511e8790-43ec-921a-71fa-1f6d167c3355
inet6 fe80::fc62:bdff:fe77:686e/64 scope link
valid_lft forever preferred_lft forever
15: eni589b674b8a8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
link/ether de:86:4b:fa:f6:9d brd ff:ff:ff:ff:ff:ff link-netns cni-2aaf7906-352a-f0e2-341c-d4f2ed6f4ac7
inet6 fe80::dc86:4bff:fefa:f69d/64 scope link
valid_lft forever preferred_lft forever
17: enif6eb1a0053f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
link/ether fa:11:de:9d:70:4c brd ff:ff:ff:ff:ff:ff link-netns cni-8492616a-9990-6db9-2b66-233a7a7fd26b
inet6 fe80::f811:deff:fe9d:704c/64 scope link
valid_lft forever preferred_lft forever
root@i-069e28cbaee4769cf:/#
root@i-069e28cbaee4769cf:/# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
link/ether 0a:d1:9a:2b:a4:2f brd ff:ff:ff:ff:ff:ff
altname enp0s5
3: eni306cbc4b983@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
link/ether 66:dd:ca:b1:37:ea brd ff:ff:ff:ff:ff:ff link-netns cni-abf9bb29-2407-a652-8b9f-ee6828c45956
4: enifd8fb8f99f1@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
link/ether 72:fc:2b:7b:9e:fc brd ff:ff:ff:ff:ff:ff link-netns cni-6b005a7c-9573-9156-ea6b-99423ccdcd6b
5: eniecfe8b07af8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
link/ether ca:77:ea:d4:2d:7f brd ff:ff:ff:ff:ff:ff link-netns cni-824f05c3-b6c4-7ccb-445d-6895046eaf6b
6: eni910811243e2@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
link/ether 3e:55:91:3d:ce:0f brd ff:ff:ff:ff:ff:ff link-netns cni-b7218768-8633-9d14-d634-0a43a6053a65
7: eni81df13ca303@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
link/ether 22:e8:18:5c:18:94 brd ff:ff:ff:ff:ff:ff link-netns cni-6c06a3fe-4b87-b321-8c60-c129372a07a1
8: eni29b100bd66f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
link/ether fe:62:bd:77:68:6e brd ff:ff:ff:ff:ff:ff link-netns cni-511e8790-43ec-921a-71fa-1f6d167c3355
15: eni589b674b8a8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
link/ether de:86:4b:fa:f6:9d brd ff:ff:ff:ff:ff:ff link-netns cni-2aaf7906-352a-f0e2-341c-d4f2ed6f4ac7
17: enif6eb1a0053f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
link/ether fa:11:de:9d:70:4c brd ff:ff:ff:ff:ff:ff link-netns cni-8492616a-9990-6db9-2b66-233a7a7fd26b
root@i-069e28cbaee4769cf:/#
root@i-069e28cbaee4769cf:/# ip rule show
0: from all lookup local
512: from all to 10.0.0.157 lookup main
512: from all to 10.0.0.134 lookup main
512: from all to 10.0.0.156 lookup main
512: from all to 10.0.0.148 lookup main
512: from all to 10.0.0.132 lookup main
512: from all to 10.0.0.137 lookup main
512: from all to 10.0.0.144 lookup main
512: from all to 10.0.0.149 lookup main
1024: from all fwmark 0x80/0x80 lookup main
32766: from all lookup main
32767: from all lookup default
root@i-069e28cbaee4769cf:/#
root@i-069e28cbaee4769cf:/# ip route show table main
default via 10.0.0.129 dev ens5 proto dhcp src 10.0.0.155 metric 100
10.0.0.2 via 10.0.0.129 dev ens5 proto dhcp src 10.0.0.155 metric 100
10.0.0.128/27 dev ens5 proto kernel scope link src 10.0.0.155 metric 100
10.0.0.129 dev ens5 proto dhcp scope link src 10.0.0.155 metric 100
10.0.0.132 dev eni81df13ca303 scope link
10.0.0.134 dev enifd8fb8f99f1 scope link
10.0.0.137 dev eni29b100bd66f scope link
10.0.0.144 dev eni589b674b8a8 scope link
10.0.0.148 dev eni910811243e2 scope link
10.0.0.149 dev enif6eb1a0053f scope link
10.0.0.156 dev eniecfe8b07af8 scope link
10.0.0.157 dev eni306cbc4b983 scope link
root@i-069e28cbaee4769cf:/#
root@i-069e28cbaee4769cf:/# ip neigh show
10.0.0.132 dev eni81df13ca303 lladdr 2a:84:f2:4c:d9:6d STALE
10.0.0.152 dev ens5 lladdr 0a:78:67:ec:2c:d5 STALE
10.0.0.156 dev eniecfe8b07af8 lladdr ea:1c:91:d7:6f:e2 REACHABLE
10.0.0.137 dev eni29b100bd66f lladdr 2a:17:07:93:e4:0b REACHABLE
10.0.0.129 dev ens5 lladdr 0a:95:a6:82:b4:ef REACHABLE
10.0.0.134 dev enifd8fb8f99f1 lladdr 4a:ef:3f:9e:3b:a5 REACHABLE
10.0.0.148 dev eni910811243e2 lladdr a6:98:cf:a9:3d:bb STALE
10.0.0.157 dev eni306cbc4b983 lladdr 72:c1:d2:01:90:1c REACHABLE
10.0.0.147 dev ens5 lladdr 0a:42:c0:5f:b6:fb REACHABLE
@bpradipt Thank you very much!
The output of ip address in the pod shows that the Pod IP is 10.0.0.149.
The output of ip route show table main on the worker node shows that traffic to the Pod IP is routed via enif6eb1a0053f:
10.0.0.149 dev enif6eb1a0053f scope link
The output of ip link show on the worker node shows that the virtual Ethernet interface enif6eb1a0053f has MAC address fa:11:de:9d:70:4c, and that the other end of the veth pair is in the network namespace cni-8492616a-9990-6db9-2b66-233a7a7fd26b:
17: enif6eb1a0053f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
link/ether fa:11:de:9d:70:4c brd ff:ff:ff:ff:ff:ff link-netns cni-8492616a-9990-6db9-2b66-233a7a7fd26b
An ARP entry for this MAC address is explicitly set in the pod network namespace:
169.254.1.1 dev eth0 lladdr fa:11:de:9d:70:4c PERMANENT
So, I think we can fix the connectivity issue by setting this ARP entry in the peer pod, like this:
kubectl exec pod/<pod name> -- ip neigh add 169.254.1.1 dev eth0 lladdr <MAC address> nud permanent
@bpradipt could you create a peer pod and try this workaround to check whether the external connectivity issue is fixed? You can identify the MAC address as described above.
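The lookup described above can be scripted. This is a hypothetical helper, shown here running against the captured route output so it is self-contained; on a real worker node you would substitute `$(ip route show table main)` for the sample text and read the MAC from /sys/class/net/$DEV/address:

```shell
#!/bin/sh
# Hypothetical helper: find the host-side veth for a given pod IP from the
# main routing table, then its MAC can be read from sysfs.
POD_IP=10.0.0.149
# Sample routes captured from the worker node output above; on a real node:
# ROUTES=$(ip route show table main)
ROUTES='10.0.0.148 dev eni910811243e2 scope link
10.0.0.149 dev enif6eb1a0053f scope link
10.0.0.156 dev eniecfe8b07af8 scope link'
# Match the /32 route for the pod IP and print the device name (3rd field).
DEV=$(printf '%s\n' "$ROUTES" | awk -v ip="$POD_IP" '$1 == ip {print $3}')
echo "$DEV"    # enif6eb1a0053f
# On the worker node, the MAC for the ARP entry would then be:
# MAC=$(cat /sys/class/net/"$DEV"/address)
```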
Another thing I noticed is that jumbo frames (MTU 9001) are enabled on EKS. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html
The current implementation of peer pods restricts the maximum MTU to 1450. (https://github.com/confidential-containers/cloud-api-adaptor/pull/68)
I am not sure whether this will cause connectivity issues or not. I think TCP connections are not affected, since the MSS is negotiated during the TCP handshake. UDP packets initiated from a peer pod are also not affected, since the smaller MTU is used.
UDP traffic initiated from a regular pod to a peer pod will be fragmented. If Path MTU Discovery does not work due to peer pods, large packets will be dropped. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html#path_mtu_discovery
Anyway, jumbo frames should be supported with peer pods from a performance perspective, so I will investigate how we can adjust the MTU.
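The MSS point can be checked with quick arithmetic. A sketch, assuming IPv4 TCP with no options, where MSS = MTU minus 40 bytes (20 IP + 20 TCP headers), and 1450 / 9001 are the MTUs discussed above:

```shell
#!/bin/sh
# Each side advertises an MSS derived from its own MTU, and the connection
# uses the minimum, so the 1450-MTU peer pod side caps the segment size.
mss() { echo $(( $1 - 40 )); }
PEER_POD_MSS=$(mss 1450)    # peer pod VM side
EKS_MSS=$(mss 9001)         # jumbo-frame EKS side
NEGOTIATED=$(( PEER_POD_MSS < EKS_MSS ? PEER_POD_MSS : EKS_MSS ))
echo "$NEGOTIATED"          # 1410
```

This is why TCP keeps working despite the MTU mismatch, while UDP has no equivalent negotiation and must rely on fragmentation or Path MTU Discovery.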
Awesome @yoheiueda. I tried your suggestion and it fixes the issue :-)
As reported at https://github.com/confidential-containers/cloud-api-adaptor/pull/1920#issuecomment-2252935108, the peer pod network has an external connectivity issue with the EKS CNI.
The design of the CNI plugin for Kubernetes networking over AWS VPC is described here: https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/cni-proposal.md#solution-components