confidential-containers / cloud-api-adaptor

Ability to create Kata pods using cloud provider APIs aka the peer-pods approach
Apache License 2.0

External network connectivity issue with EKS CNI #1966

Closed yoheiueda closed 1 month ago

yoheiueda commented 2 months ago

As reported at https://github.com/confidential-containers/cloud-api-adaptor/pull/1920#issuecomment-2252935108, the peer pod network has an external network connectivity issue with the EKS CNI.

The design of the CNI plugin for Kubernetes networking over AWS VPC is described here: https://github.com/aws/amazon-vpc-cni-k8s/blob/master/docs/cni-proposal.md#solution-components

yoheiueda commented 2 months ago

@bpradipt @EmmEff is it possible to collect some diagnostic data on EKS?

Create a regular (runc) pod and execute the following commands in the pod with kubectl exec. If you can access the worker node that the pod is running on, please execute the same commands there.

ip address show
ip link show
ip rule show
ip route show table main
ip neigh show
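
For convenience, the commands above can be collected from a pod in one pass with kubectl exec. A small sketch (the helper name and pod name are illustrative):

```shell
#!/bin/sh
# Sketch: run each of the requested diagnostics in the pod and label the output.
collect_diag() {
  pod="$1"
  for cmd in "ip address show" "ip link show" "ip rule show" \
             "ip route show table main" "ip neigh show"; do
    printf '== %s ==\n' "$cmd"
    kubectl exec "$pod" -- $cmd   # $cmd is intentionally unquoted so it splits into words
  done
}

# Example: collect_diag priv-pod
```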

According to the documentation, the EKS CNI plugin explicitly sets a static ARP entry. If so, I think we can fix the issue by setting the same ARP entry in the network namespace of the peer pod VM.

bpradipt commented 2 months ago

@yoheiueda please find the requested details

Output from a regular runc pod

[root@priv-pod /]# ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if17: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
    link/ether 56:37:6d:d9:1b:6a brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 10.0.0.149/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::5437:6dff:fed9:1b6a/64 scope link
       valid_lft forever preferred_lft forever
[root@priv-pod /]#
[root@priv-pod /]# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
3: eth0@if17: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
    link/ether 56:37:6d:d9:1b:6a brd ff:ff:ff:ff:ff:ff link-netnsid 0
[root@priv-pod /]#
[root@priv-pod /]# ip rule show
0:  from all lookup local
32766:  from all lookup main
32767:  from all lookup default
[root@priv-pod /]#
[root@priv-pod /]# ip route show table main
default via 169.254.1.1 dev eth0
169.254.1.1 dev eth0 scope link
[root@priv-pod /]#
[root@priv-pod /]# ip neigh show
169.254.1.1 dev eth0 lladdr fa:11:de:9d:70:4c PERMANENT

Output from the worker node

root@i-069e28cbaee4769cf:/# ip address show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
    link/ether 0a:d1:9a:2b:a4:2f brd ff:ff:ff:ff:ff:ff
    altname enp0s5
    inet 10.0.0.155/27 metric 100 brd 10.0.0.159 scope global dynamic ens5
       valid_lft 3026sec preferred_lft 3026sec
    inet6 fe80::8d1:9aff:fe2b:a42f/64 scope link
       valid_lft forever preferred_lft forever
3: eni306cbc4b983@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
    link/ether 66:dd:ca:b1:37:ea brd ff:ff:ff:ff:ff:ff link-netns cni-abf9bb29-2407-a652-8b9f-ee6828c45956
    inet6 fe80::64dd:caff:feb1:37ea/64 scope link
       valid_lft forever preferred_lft forever
4: enifd8fb8f99f1@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
    link/ether 72:fc:2b:7b:9e:fc brd ff:ff:ff:ff:ff:ff link-netns cni-6b005a7c-9573-9156-ea6b-99423ccdcd6b
    inet6 fe80::70fc:2bff:fe7b:9efc/64 scope link
       valid_lft forever preferred_lft forever
5: eniecfe8b07af8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
    link/ether ca:77:ea:d4:2d:7f brd ff:ff:ff:ff:ff:ff link-netns cni-824f05c3-b6c4-7ccb-445d-6895046eaf6b
    inet6 fe80::c877:eaff:fed4:2d7f/64 scope link
       valid_lft forever preferred_lft forever
6: eni910811243e2@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
    link/ether 3e:55:91:3d:ce:0f brd ff:ff:ff:ff:ff:ff link-netns cni-b7218768-8633-9d14-d634-0a43a6053a65
    inet6 fe80::3c55:91ff:fe3d:ce0f/64 scope link
       valid_lft forever preferred_lft forever
7: eni81df13ca303@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
    link/ether 22:e8:18:5c:18:94 brd ff:ff:ff:ff:ff:ff link-netns cni-6c06a3fe-4b87-b321-8c60-c129372a07a1
    inet6 fe80::20e8:18ff:fe5c:1894/64 scope link
       valid_lft forever preferred_lft forever
8: eni29b100bd66f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
    link/ether fe:62:bd:77:68:6e brd ff:ff:ff:ff:ff:ff link-netns cni-511e8790-43ec-921a-71fa-1f6d167c3355
    inet6 fe80::fc62:bdff:fe77:686e/64 scope link
       valid_lft forever preferred_lft forever
15: eni589b674b8a8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
    link/ether de:86:4b:fa:f6:9d brd ff:ff:ff:ff:ff:ff link-netns cni-2aaf7906-352a-f0e2-341c-d4f2ed6f4ac7
    inet6 fe80::dc86:4bff:fefa:f69d/64 scope link
       valid_lft forever preferred_lft forever
17: enif6eb1a0053f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
    link/ether fa:11:de:9d:70:4c brd ff:ff:ff:ff:ff:ff link-netns cni-8492616a-9990-6db9-2b66-233a7a7fd26b
    inet6 fe80::f811:deff:fe9d:704c/64 scope link
       valid_lft forever preferred_lft forever
root@i-069e28cbaee4769cf:/#
root@i-069e28cbaee4769cf:/# ip link show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: ens5: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP mode DEFAULT group default qlen 1000
    link/ether 0a:d1:9a:2b:a4:2f brd ff:ff:ff:ff:ff:ff
    altname enp0s5
3: eni306cbc4b983@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
    link/ether 66:dd:ca:b1:37:ea brd ff:ff:ff:ff:ff:ff link-netns cni-abf9bb29-2407-a652-8b9f-ee6828c45956
4: enifd8fb8f99f1@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
    link/ether 72:fc:2b:7b:9e:fc brd ff:ff:ff:ff:ff:ff link-netns cni-6b005a7c-9573-9156-ea6b-99423ccdcd6b
5: eniecfe8b07af8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
    link/ether ca:77:ea:d4:2d:7f brd ff:ff:ff:ff:ff:ff link-netns cni-824f05c3-b6c4-7ccb-445d-6895046eaf6b
6: eni910811243e2@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
    link/ether 3e:55:91:3d:ce:0f brd ff:ff:ff:ff:ff:ff link-netns cni-b7218768-8633-9d14-d634-0a43a6053a65
7: eni81df13ca303@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
    link/ether 22:e8:18:5c:18:94 brd ff:ff:ff:ff:ff:ff link-netns cni-6c06a3fe-4b87-b321-8c60-c129372a07a1
8: eni29b100bd66f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
    link/ether fe:62:bd:77:68:6e brd ff:ff:ff:ff:ff:ff link-netns cni-511e8790-43ec-921a-71fa-1f6d167c3355
15: eni589b674b8a8@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
    link/ether de:86:4b:fa:f6:9d brd ff:ff:ff:ff:ff:ff link-netns cni-2aaf7906-352a-f0e2-341c-d4f2ed6f4ac7
17: enif6eb1a0053f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
    link/ether fa:11:de:9d:70:4c brd ff:ff:ff:ff:ff:ff link-netns cni-8492616a-9990-6db9-2b66-233a7a7fd26b
root@i-069e28cbaee4769cf:/#
root@i-069e28cbaee4769cf:/# ip rule show
0:  from all lookup local
512:    from all to 10.0.0.157 lookup main
512:    from all to 10.0.0.134 lookup main
512:    from all to 10.0.0.156 lookup main
512:    from all to 10.0.0.148 lookup main
512:    from all to 10.0.0.132 lookup main
512:    from all to 10.0.0.137 lookup main
512:    from all to 10.0.0.144 lookup main
512:    from all to 10.0.0.149 lookup main
1024:   from all fwmark 0x80/0x80 lookup main
32766:  from all lookup main
32767:  from all lookup default
root@i-069e28cbaee4769cf:/#
root@i-069e28cbaee4769cf:/# ip route show table main
default via 10.0.0.129 dev ens5 proto dhcp src 10.0.0.155 metric 100
10.0.0.2 via 10.0.0.129 dev ens5 proto dhcp src 10.0.0.155 metric 100
10.0.0.128/27 dev ens5 proto kernel scope link src 10.0.0.155 metric 100
10.0.0.129 dev ens5 proto dhcp scope link src 10.0.0.155 metric 100
10.0.0.132 dev eni81df13ca303 scope link
10.0.0.134 dev enifd8fb8f99f1 scope link
10.0.0.137 dev eni29b100bd66f scope link
10.0.0.144 dev eni589b674b8a8 scope link
10.0.0.148 dev eni910811243e2 scope link
10.0.0.149 dev enif6eb1a0053f scope link
10.0.0.156 dev eniecfe8b07af8 scope link
10.0.0.157 dev eni306cbc4b983 scope link
root@i-069e28cbaee4769cf:/#
root@i-069e28cbaee4769cf:/# ip neigh show
10.0.0.132 dev eni81df13ca303 lladdr 2a:84:f2:4c:d9:6d STALE
10.0.0.152 dev ens5 lladdr 0a:78:67:ec:2c:d5 STALE
10.0.0.156 dev eniecfe8b07af8 lladdr ea:1c:91:d7:6f:e2 REACHABLE
10.0.0.137 dev eni29b100bd66f lladdr 2a:17:07:93:e4:0b REACHABLE
10.0.0.129 dev ens5 lladdr 0a:95:a6:82:b4:ef REACHABLE
10.0.0.134 dev enifd8fb8f99f1 lladdr 4a:ef:3f:9e:3b:a5 REACHABLE
10.0.0.148 dev eni910811243e2 lladdr a6:98:cf:a9:3d:bb STALE
10.0.0.157 dev eni306cbc4b983 lladdr 72:c1:d2:01:90:1c REACHABLE
10.0.0.147 dev ens5 lladdr 0a:42:c0:5f:b6:fb REACHABLE
yoheiueda commented 2 months ago

@bpradipt Thank you very much!

The output of ip address show in the pod shows that the Pod IP is 10.0.0.149.

The output of ip route show table main on the worker node shows that traffic to the Pod IP is routed via enif6eb1a0053f:

10.0.0.149 dev enif6eb1a0053f scope link

The output of ip link show on the worker node shows that the virtual Ethernet interface enif6eb1a0053f has MAC address fa:11:de:9d:70:4c, and that the other end of the veth pair is in network namespace cni-8492616a-9990-6db9-2b66-233a7a7fd26b.

17: enif6eb1a0053f@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP mode DEFAULT group default
    link/ether fa:11:de:9d:70:4c brd ff:ff:ff:ff:ff:ff link-netns cni-8492616a-9990-6db9-2b66-233a7a7fd26b

An ARP entry for this MAC address is explicitly set in the pod network namespace as follows:

169.254.1.1 dev eth0 lladdr fa:11:de:9d:70:4c PERMANENT

So, I think we can fix the connectivity issue by setting the same ARP entry in the peer pod, like this:

kubectl exec pod/<pod name> -- ip neigh add 169.254.1.1 dev eth0 lladdr <MAC address> nud permanent

@bpradipt could you create a peer pod and try this workaround to check whether the external connectivity issue is fixed? You can identify the MAC address as described above.
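
The MAC lookup can be scripted from the worker node's routing table and link output. A sketch, assuming access to the worker node; the helper names (find_pod_dev, find_mac) are illustrative and not part of cloud-api-adaptor:

```shell
#!/bin/sh
# find_pod_dev POD_IP: reads "ip route show table main" output on stdin and
# prints the host-side veth device that routes to POD_IP.
find_pod_dev() { awk -v ip="$1" '$1 == ip && $2 == "dev" {print $3}'; }

# find_mac: reads "ip link show dev <device>" output on stdin and prints the
# link-layer (MAC) address.
find_mac() { awk '/link\/ether/ {print $2}'; }

# On the worker node (10.0.0.149 is the example Pod IP from above):
# dev=$(ip route show table main | find_pod_dev 10.0.0.149)
# mac=$(ip link show dev "$dev" | find_mac)
# kubectl exec pod/<pod name> -- \
#   ip neigh add 169.254.1.1 dev eth0 lladdr "$mac" nud permanent
```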

yoheiueda commented 2 months ago

Another thing I noticed is that jumbo frames (MTU 9001) are enabled on EKS. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html

The current implementation of peer pods restricts the maximum MTU size to 1450. (https://github.com/confidential-containers/cloud-api-adaptor/pull/68)

I am not sure whether this will cause connectivity issues. I think TCP connections are not affected, since the MSS is negotiated during the TCP handshake. UDP packets initiated from a peer pod will also not be affected, since the smaller MTU size is used.

UDP traffic initiated from a regular pod to a peer pod will be fragmented. If path MTU discovery does not work because of peer pods, large packets will be dropped. https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/network_mtu.html#path_mtu_discovery

In any case, jumbo frames should be supported with peer pods from a performance perspective, so I will investigate how we can adjust the MTU.
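
To make the MSS point concrete: for IPv4 TCP without options, MSS = MTU - 40 (20-byte IP header plus 20-byte TCP header), and the connection uses the smaller of the two advertised values. So a peer pod capped at MTU 1450 advertises a smaller MSS than a jumbo-frame EKS node, and TCP still works. A quick sketch:

```shell
#!/bin/sh
# mss_for_mtu MTU: IPv4 TCP MSS for a given interface MTU
# (MTU minus 20-byte IP header minus 20-byte TCP header, no options).
mss_for_mtu() { echo $(( $1 - 40 )); }

mss_for_mtu 1450   # -> 1410 (current peer-pod MTU cap)
mss_for_mtu 9001   # -> 8961 (EKS jumbo frames)
```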

bpradipt commented 2 months ago

Awesome @yoheiueda. I tried your suggestion and it fixes the issue :-)