aws-samples / eks-automated-ipmgmt-multus-pods

External connectivity from multus additional interface #5

Closed · alena-ait closed this issue 1 year ago

alena-ait commented 1 year ago

Is it possible to send traffic to the Internet via the net1 interface, and if so, what needs to be done to make it work?

With the current solution we can set up connectivity between two pods in EKS, but when we try to reach an external destination:

kubectl exec -it busybox-deployment-65d88b5bb5-4kvgc -n multus -- ping -I net1 193.7.169.20
Defaulted container "busybox" out of: busybox, aws-ip-mgmt (init)
PING 193.7.169.20 (193.7.169.20): 56 data bytes
^C
--- 193.7.169.20 ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss

it doesn't work. Maybe some additional routes need to be set up?

redterror commented 1 year ago

I had to add the multus sbr (source-based routing) plugin to my network definition. This assumes you have the AWS-level subnet set up correctly and you can ping the AWS subnet gateway.

My network definition is approximately:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: my-net
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "my-net",
      "plugins": [{
        "type": "ipvlan",
        "master": "eth1",
        "ipam": {
          "type": "whereabouts",
          "datastore": "kubernetes",
          "kubernetes": { "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig" },
          "range": "{{ .Values.my_thingy.range }}",
          "gateway": "{{ .Values.my_thingy.gateway }}",
          "log_file": "/tmp/whereabouts.log",
          "log_level": "debug"
        }
      }, {
        "type": "sbr"
      }]
    }

From there I could successfully do commands like curl --interface net1 ifconfig.me from my container.
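
For anyone wondering what the sbr plugin actually changes: roughly, it moves net1's routes into a dedicated routing table and adds an ip rule selecting that table for traffic sourced from net1's address. An illustrative sketch of what that looks like inside the pod (table number, rule priority, and addresses are placeholders, not from a real cluster):

# Illustrative only; run inside the pod. The exact table number and rule
# priority chosen by the sbr plugin may differ.
ip rule show
#   0:      from all lookup local
#   32765:  from 192.0.2.10 lookup 100
#   32766:  from all lookup main
#   32767:  from all lookup default
ip route show table 100
#   default via 192.0.2.1 dev net1
#   192.0.2.0/24 dev net1 scope link src 192.0.2.10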

alena-ait commented 1 year ago

@redterror Thank you for your reply, I really appreciate it. My network attachment definition is very similar to yours:

apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ipvlan-multus
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "ipvlan-multus",
      "plugins": [{
        "type": "ipvlan",
        "master": "eth1",
        "ipam": {
          "type": "whereabouts",
          "datastore": "kubernetes",
          "kubernetes": { "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig" },
          "range": "10.40.128.80-10.40.128.87/24",
          "routes": [
            { "dst": "12.12.12.12/32" }
          ],
          "gateway": "10.40.128.1",
          "log_file": "/tmp/whereabouts.log",
          "log_level": "debug"
        }
      }, {
        "type": "sbr"
      }]
    }

But for some reason I cannot reach the external world via the multus interface; everything looks correct and works between pods:

kubectl exec -it busybox-deployment-9b5998bfc-7mlrm -n multus  -- ip addr
Defaulted container "busybox" out of: busybox, aws-ip-mgmt (init)
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
3: eth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 9001 qdisc noqueue 
    link/ether be:9b:81:f0:80:b0 brd ff:ff:ff:ff:ff:ff
    inet 10.40.16.5/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::bc9b:81ff:fef0:80b0/64 scope link 
       valid_lft forever preferred_lft forever
4: net1@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue 
    link/ether 02:c1:81:60:7c:b3 brd ff:ff:ff:ff:ff:ff
    inet 10.40.128.83/24 brd 10.40.128.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fe80::2c1:8100:260:7cb3/64 scope link 
       valid_lft forever preferred_lft forever

kubectl exec -it busybox-deployment-9b5998bfc-7mlrm -n multus -- curl --interface net1 ifconfig.me
Defaulted container "busybox" out of: busybox, aws-ip-mgmt (init)
(hangs without result)

kubectl exec -it busybox-deployment-9b5998bfc-7mlrm -n multus -- ping -I net1 ifconfig.me
Defaulted container "busybox" out of: busybox, aws-ip-mgmt (init)
PING ifconfig.me (34.160.111.145): 56 data bytes
^C
--- ifconfig.me ping statistics ---
7 packets transmitted, 0 packets received, 100% packet loss

Did you set up additional routes or anything else directly on the EKS node?

redterror commented 1 year ago

Did you set up additional routes or anything else directly on the EKS node?

No additional routes. I used:

I have heard from a colleague of an alternate approach where their initContainer did not do the IP-to-ENI association step, but instead added a route to the subnet's route table mapping the IP to the ENI. I suppose this is logically the same operation. In my colleague's case he did this to use a single network definition that spanned multiple smaller AWS subnets (and thus he achieved a single network that spanned AZs).
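
If it helps to visualize that alternate approach, it amounts to something like the following AWS CLI call (a hypothetical sketch; the route table ID, ENI ID, and IP are placeholders, and this is not what the sample's init container itself does):

# Instead of assigning the pod IP to the ENI, point a /32 route for it
# at the node's multus ENI in the subnet's route table.
aws ec2 create-route \
  --route-table-id rtb-0123456789abcdef0 \
  --destination-cidr-block 192.0.2.10/32 \
  --network-interface-id eni-0123456789abcdef0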

alena-ait commented 1 year ago

@redterror Thanks again. On the whole my configuration is the same except for one thing: I use a managed node group instead of a self-managed one, but I also execute the user-data commands from the eks-install-guide-for-multus CF solution, namely:

echo "net.ipv4.conf.default.rp_filter = 0" | tee -a /etc/sysctl.conf
echo "net.ipv4.conf.all.rp_filter = 0" | tee -a /etc/sysctl.conf
sudo sysctl -p
sleep 30
ls /sys/class/net/ > /tmp/ethList; cat /tmp/ethList | while read line; do sudo ifconfig $line up; done
grep eth /tmp/ethList | while read line; do echo "ifconfig $line up" >> /etc/rc.d/rc.local; done
systemctl enable rc-local
chmod +x /etc/rc.d/rc.local

Probably there is something different between self-managed and managed nodes in how they handle network setup; I will try switching to self-managed. Do your Multus subnets also use only the default route table with only the local route (as in the author's CF stack)? And does ping directly from the worker node via the additional interface to the external world also fail?

redterror commented 1 year ago

I also use a managed node group with a custom userdata block. I opted for a slightly different approach there:

        UserData: !Base64 |
          MIME-Version: 1.0
          Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

          --==MYBOUNDARY==
          Content-Type: text/x-shellscript; charset="us-ascii"

          #!/bin/bash
          set -euo pipefail
          yum install -y ec2-net-utils
          ec2ifup eth1

          --==MYBOUNDARY==--

I originally took this approach since I wanted to first verify I could do my curl test command natively on the box, but ultimately that became a dead end (I never figured out why, despite all the policy routing being seemingly correct). It generally seemed simpler too.

It's possible that my UserData block has some routing side effects not present in the ifconfig up approach, since my block does assign an IP to the EKS node's interface directly and does set up some routes in the process.

IP consumption should be the same, since the IP allocated by AWS via DHCP will come from outside the CIDR reservation.

alena-ait commented 1 year ago

I have already tried using ec2ifup before and the result was worse: even the metadata service becomes unavailable to init containers

Exception: HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /latest/dynamic/instance-identity/document (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f9b7660c7c0>, 'Connection to 169.254.169.254 timed out. (connect timeout=2)'))

and coredns on the nodes where I execute ec2ifup stops working correctly.

Please tell me @redterror: did you modify any sysctl settings? The author of the CF solution changes sysctl net.ipv4.conf.default.rp_filter:

https://github.com/aws-samples/eks-install-guide-for-multus/blob/main/cfn/templates/nodegroup/eks-nodegroup-multus.yaml#L578C5-L579C77

As I understand from your userdata, you don't change any sysctl settings and only execute ec2ifup, right?

redterror commented 1 year ago

As I understand from your userdata, you don't change any sysctl settings and only execute ec2ifup, right?

Correct - what I posted above is my full UserData block.

alena-ait commented 1 year ago

I cannot say exactly why it happens, but if I set up my EKS nodes with ec2ifup (I see all the routes and rules, and they are correct):

1. I still cannot reach the external world from the multus-ENI pods.
2. EKS starts to suffer from network problems: some (non-multus) pods cannot connect to the API, some pods cannot get metadata, etc.

I don't know why it works for you. I use EKS 1.27; maybe they added some additional restrictive rules or something like that.

One more interesting thing: for the ENIs that are brought up with ec2ifup, the routes look OK but you cannot reach the external world via them either. For example, my multus network is 10.40.128.0/24 on interface eth1, and the routes are present:

ip route
default via 10.40.16.1 dev eth0
default via 10.40.128.1 dev eth1 metric 10001
default via 10.40.131.1 dev eth2 metric 10002
default via 10.40.16.1 dev eth3 metric 10003
10.40.16.0/24 dev eth0 proto kernel scope link src 10.40.16.60
10.40.16.0/24 dev eth3 proto kernel scope link src 10.40.16.82
10.40.16.40 dev eni8caa1bad621 scope link
10.40.16.81 dev eni5045b565322 scope link
10.40.16.169 dev eni140753dfb67 scope link
10.40.16.173 dev eni79c15e3edf0 scope link
10.40.128.0/24 dev eth1 proto kernel scope link src 10.40.128.135
10.40.131.0/24 dev eth2 proto kernel scope link src 10.40.131.18
169.254.169.254 dev eth0

but

sudo ping -vvv -I eth1 8.8.8.8
PING 8.8.8.8 (8.8.8.8) from 10.40.128.135 eth1: 56(84) bytes of data.
^C
--- 8.8.8.8 ping statistics ---
62 packets transmitted, 0 received, 100% packet loss, time 62459ms

and the VPC flow logs show nothing; it looks like the traffic never leaves the node.

redterror commented 1 year ago

@alena-ait I believe we are still on 1.25 or 1.26, so you're slightly ahead of me. On the EKS nodes directly, I observed similar routes and similar routing issues. I never figured out why, but ultimately adding the sbr plugin was what resolved the issue on my end.

The out-of-the-box iptables / routing configuration was complicated enough that I couldn't figure out where my own policy route additions were failing. The desired behavior (e.g. ping -I / curl --interface) should be a textbook policy routing setup via ip rule or similar.
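
For reference, a minimal manual equivalent of what the sbr plugin automates, sketched with the addresses from the NAD and pod output earlier in this thread (net1 at 10.40.128.83/24, gateway 10.40.128.1), would look roughly like this inside the pod's network namespace:

# Copy net1's routes into a dedicated table and select that table for
# traffic sourced from net1's address.
ip route add 10.40.128.0/24 dev net1 src 10.40.128.83 table 100
ip route add default via 10.40.128.1 dev net1 table 100
ip rule add from 10.40.128.83/32 lookup 100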

raghs-aws commented 1 year ago

Hi @alena-ait and @redterror, apologies for the late replies. A few pointers: don't use ec2-net-utils; as you already discovered, it adds IP addresses on the host OS and sets up some specific routes, which will create routing challenges later, so it's better to follow the ifconfig up approach.

For your issue, @redterror's solution was right: you can either 1/ use the SBR plugin to handle pod routing, or 2/ define static routes via your multus net-attach-def (NAD), like the one you added for destination 12.12.12.12/32; you can also add one for a network range like a /24.

In your case, the issue could be either a routing problem or the interface on the worker node not being up.

  1. Without installing net-utils and the ifup steps, check on the worker node whether the multus interface (eth1/eth2) is up or not. It should show as up, but should not have an IP address assigned to it.
  2. If it is up, then try setting up routes either via SBR or via static routes in the NAD. To narrow down the issue, first try to reach another multus pod in the same subnet/VPC and then one outside it, to validate the routing path. You can also run ip route get in the pod to check which interface has the route (see the sketch below).
  3. You can also take a tcpdump at the host or pod level to see whether packets are arriving and being replied to.
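
As a concrete sketch of the checks in points 2 and 3 (pod name and destination are placeholders, and this assumes the pod image ships an ip binary that supports route get and that tcpdump is installed on the node):

# Which interface and source address would the pod use for an external destination?
kubectl exec -it <multus-pod> -n multus -- ip route get 8.8.8.8

# On the worker node, watch the multus parent interface while testing from the pod
sudo tcpdump -ni eth1 icmp
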
raghs-aws commented 1 year ago

Another good suggestion from my esteemed colleague @jungy-aws, based on his prior experience with SBR: run the command below to validate ping

kubectl exec -it busybox-deployment-65d88b5bb5-4kvgc -n multus -- ping -I 10.40.128.83 193.7.169.20

instead of

kubectl exec -it busybox-deployment-65d88b5bb5-4kvgc -n multus -- ping -I net1 193.7.169.20

alena-ait commented 1 year ago

Hi @raghs-aws, thank you for your answer. About the first point in your first reply: without the ifconfig up command in userdata the additional interface will of course have status down, but I do have the up command in my cloud-init script, so there is no problem with that.

And your colleague was right (please pass a big thanks to @jungy-aws from me). I don't know what the issue is with using the interface name (for the internal network it works without problems, it is a very ordinary thing, and @redterror also checked connectivity via the interface name and got a positive result); it is probably related to SBR specifics, but with the IP address instead of the interface name my ping can reach the external world. I will double-check with network tools to make sure the traffic leaves via the correct interface, but it looks like it works.

One small note which may be helpful for other people: in this solution, https://github.com/aws-samples/eks-install-guide-for-multus/blob/main/cfn/templates/infra/eks-infra.yaml, the team creates the additional Multus networks with the default route table, which has no route to a NAT gateway, so for external connectivity you should switch them to a private-network route table or add the route.
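
For anyone applying that note, the change is roughly one of the following: associate the Multus subnets with an existing private route table, or add a default route via the NAT gateway to the route table they currently use (IDs below are placeholders):

# Add a default route via the NAT gateway to the Multus subnets' route table
aws ec2 create-route \
  --route-table-id rtb-0123456789abcdef0 \
  --destination-cidr-block 0.0.0.0/0 \
  --nat-gateway-id nat-0123456789abcdef0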

jungy-aws commented 1 year ago

Hi @alena-ait, so glad to hear that it helped! As for why this happens: as far as I have looked (I didn't get time to check the Linux ping source code, so this is all hypothetical :)), when we use 'ping -I dev', ping assumes the destination is on-link (in the same network), so it first tries an ARP lookup for the destination IP, which keeps the kernel from sending the actual ping packet onto the network. At least, I took a pcap and confirmed that ping sends ARP first (without sending any ICMP), which is why I suggested 'ping -I source_ip' instead of the '-I dev' option. Anyway, unlike the 'ping -I dev' case, your application should be fine sending packets to the external network as long as you have SBR in place (via the plugin or a manual setup, either way).
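
For anyone who wants to reproduce that observation, a quick check (assuming tcpdump is available in the pod or on the node) is to capture on the multus interface while pinging both ways:

# With 'ping -I net1' you should see only ARP who-has requests for the external IP;
# with 'ping -I 10.40.128.83' you should see ICMP echo requests leaving via the gateway.
tcpdump -ni net1 arp or icmp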

raghs-aws commented 1 year ago

Thank you @alena-ait. Glad that your issue is resolved. I will close the issue :)