Closed alena-ait closed 1 year ago
I had to add the multus sbr (source-based routing) plugin to my network definition. This assumes you have the AWS-level subnet set up correctly and you can ping the AWS subnet gateway.
My network definition is approximately:
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: my-net
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "my-net",
      "plugins": [{
        "type": "ipvlan",
        "master": "eth1",
        "ipam": {
          "type": "whereabouts",
          "datastore": "kubernetes",
          "kubernetes": { "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig" },
          "range": "{{ .Values.my_thingy.range }}",
          "gateway": "{{ .Values.my_thingy.gateway }}",
          "log_file": "/tmp/whereabouts.log",
          "log_level": "debug"
        }
      }, {
        "type": "sbr"
      }]
    }
From there I could successfully run commands like curl --interface net1 ifconfig.me from my container.
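As a quick sanity check (a sketch, not something from the thread), you can confirm the sbr plugin actually installed policy rules by inspecting ip rule / ip route inside the pod; the script below just prints the commands to run, with the pod name borrowed from the thread's later examples:

```shell
#!/bin/sh
# Sketch: print the commands that show the policy routing the sbr plugin
# installs inside the pod. Pod name and namespace are taken from the
# thread's later examples; substitute your own.
POD="busybox-deployment-9b5998bfc-7mlrm"
NS="multus"
for c in "ip rule show" "ip route show table all"; do
  echo "kubectl exec $POD -n $NS -- $c"
done
# Expected on a working pod: a "from <net1-ip> lookup <table>" rule, and
# a default route via the secondary subnet's gateway in that table.
```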
@redterror Thank you for your reply, I really appreciate it. My network attachment definition is very similar to yours:
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
  name: ipvlan-multus
spec:
  config: |
    {
      "cniVersion": "0.3.1",
      "name": "ipvlan-multus",
      "plugins": [{
        "type": "ipvlan",
        "master": "eth1",
        "ipam": {
          "type": "whereabouts",
          "datastore": "kubernetes",
          "kubernetes": { "kubeconfig": "/etc/cni/net.d/whereabouts.d/whereabouts.kubeconfig" },
          "range": "10.40.128.80-10.40.128.87/24",
          "routes": [
            { "dst": "12.12.12.12/32" }
          ],
          "gateway": "10.40.128.1",
          "log_file": "/tmp/whereabouts.log",
          "log_level": "debug"
        }
      }, {
        "type": "sbr"
      }]
    }
But for some reason I cannot reach the external world via the multus interface; everything looks correct and works between pods:
kubectl exec -it busybox-deployment-9b5998bfc-7mlrm -n multus -- ip addr
Defaulted container "busybox" out of: busybox, aws-ip-mgmt (init)
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
3: eth0@if6: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 9001 qdisc noqueue
    link/ether be:9b:81:f0:80:b0 brd ff:ff:ff:ff:ff:ff
    inet 10.40.16.5/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet6 fe80::bc9b:81ff:fef0:80b0/64 scope link
       valid_lft forever preferred_lft forever
4: net1@eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue
    link/ether 02:c1:81:60:7c:b3 brd ff:ff:ff:ff:ff:ff
    inet 10.40.128.83/24 brd 10.40.128.255 scope global net1
       valid_lft forever preferred_lft forever
    inet6 fe80::2c1:8100:260:7cb3/64 scope link
       valid_lft forever preferred_lft forever
kubectl exec -it busybox-deployment-9b5998bfc-7mlrm -n multus -- curl --interface net1 ifconfig.me
Defaulted container "busybox" out of: busybox, aws-ip-mgmt (init)
hangs without result
kubectl exec -it busybox-deployment-9b5998bfc-7mlrm -n multus -- ping -I net1 ifconfig.me
Defaulted container "busybox" out of: busybox, aws-ip-mgmt (init)
PING ifconfig.me (34.160.111.145): 56 data bytes
^C
--- ifconfig.me ping statistics ---
7 packets transmitted, 0 packets received, 100% packet loss
Did you set up additional routes or anything else directly on the EKS node?
> Did you set up additional routes or anything else directly on the EKS node?
No additional routes. I used:
I have heard from a colleague of an alternate approach where their initContainer did not do the IP -> ENI association step, but rather added a route in the subnet's route table to map the IP to the ENI. I suppose this is logically the same operation. In my colleague's case he did this to use a single network definition that spanned multiple smaller AWS subnets (and thus, he achieved a single network that spanned AZs).
@redterror Thanks again. On the whole my configuration is the same except for one thing: I use a Managed Node Group instead of self-managed (but I also run the userdata commands from the eks-install-guide-for-multus CF solution, I mean this:
echo "net.ipv4.conf.default.rp_filter = 0" | tee -a /etc/sysctl.conf
echo "net.ipv4.conf.all.rp_filter = 0" | tee -a /etc/sysctl.conf
sudo sysctl -p
sleep 30
ls /sys/class/net/ > /tmp/ethList; cat /tmp/ethList | while read line; do sudo ifconfig $line up; done
grep eth /tmp/ethList | while read line; do echo "ifconfig $line up" >> /etc/rc.d/rc.local; done
systemctl enable rc-local
chmod +x /etc/rc.d/rc.local
). Probably there is some difference between self-managed and managed nodes in how they handle network setup; I will try switching to self-managed. Do your Multus subnets also use only the default route table with only the local route (per the author's CF stack)? And does ping directly from the worker node via the additional interface to the external world also fail?
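A generic way to confirm the rp_filter part of that userdata actually took effect on a node (just a sketch; interface names vary by host):

```shell
#!/bin/sh
# Print rp_filter for the relevant interfaces. Strict mode (1) silently
# drops asymmetric traffic on secondary ENIs; the userdata above expects
# these to read 0.
for i in all default eth1; do
  f="/proc/sys/net/ipv4/conf/$i/rp_filter"
  if [ -r "$f" ]; then
    echo "$i rp_filter=$(cat "$f")"
  else
    echo "$i: not present on this host"
  fi
done
```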
I also use a managed node group with a custom userdata block. I opted for a slightly different approach there:
UserData: !Base64 |
  MIME-Version: 1.0
  Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

  --==MYBOUNDARY==
  Content-Type: text/x-shellscript; charset="us-ascii"

  #!/bin/bash
  set -euo pipefail
  yum install -y ec2-net-utils
  ec2ifup eth1
  --==MYBOUNDARY==--
I originally took this approach since I wanted to first verify I could run my curl test command natively on the box, but ultimately that became a dead end (I never figured out why, despite all the policy routing looking correct). It generally seemed simpler too.
It's possible that my UserData block has some routing side effects not present in the ifconfig foo up approach, since my block assigns an IP to the EKS node's interface directly and sets up some routes in the process.
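For anyone comparing the two approaches, the extra state ec2ifup leaves behind is visible with plain iproute2; a sketch of what to look at on the node (eth1 is the thread's secondary ENI; the script only prints the commands):

```shell
#!/bin/sh
# Commands to run on the worker node to see ec2ifup's side effects:
# the address it assigned, any source-address rules, and extra tables.
CMDS='ip addr show eth1
ip rule show
ip route show table all'
echo "$CMDS"
```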
IP consumption should be the same, since the IP allocated by AWS via DHCP will be from outside the CIDR reservation.
I have already tried ec2ifup before and the result was worse. Even the metadata service becomes unavailable to init containers:
Exception :HTTPConnectionPool(host='169.254.169.254', port=80): Max retries exceeded with url: /latest/dynamic/instance-identity/document (Caused by ConnectTimeoutError(<urllib3.connection.HTTPConnection object at 0x7f9b7660c7c0>, 'Connection to 169.254.169.254 timed out. (connect timeout=2)'))
and coredns on the nodes where I run ec2ifup stops working correctly.
Tell me please @redterror, did you modify any sysctl settings? The author of the CF solution changes sysctl net.ipv4.conf.default.rp_filter.
As I understand from your userdata, you don't change sysctl and only run ec2ifup, right?
> As I understand from your userdata you don't change sysctl and only run ec2ifup, right?
Correct - what I posted above is my full UserData block.
I cannot say exactly why it happens, but if I set up my EKS nodes with ec2ifup (I see all the routes and rules, and they look correct):
1) I still cannot reach the external world from multus-eni pods.
2) EKS starts to suffer from network problems: some (non-multus) pods cannot connect to the API, some pods cannot get metadata, etc.
I don't know why it works for you. I use EKS 1.27; maybe they added some additional restrictive rules or something like that.
One more interesting thing: for the ENIs set up with ec2ifup, the routes look OK but you cannot reach the external world via them either. For example, my multus network is 10.40.128.0/24 on interface eth1, and the routes are present:
ip route
default via 10.40.16.1 dev eth0
default via 10.40.128.1 dev eth1 metric 10001
default via 10.40.131.1 dev eth2 metric 10002
default via 10.40.16.1 dev eth3 metric 10003
10.40.16.0/24 dev eth0 proto kernel scope link src 10.40.16.60
10.40.16.0/24 dev eth3 proto kernel scope link src 10.40.16.82
10.40.16.40 dev eni8caa1bad621 scope link
10.40.16.81 dev eni5045b565322 scope link
10.40.16.169 dev eni140753dfb67 scope link
10.40.16.173 dev eni79c15e3edf0 scope link
10.40.128.0/24 dev eth1 proto kernel scope link src 10.40.128.135
10.40.131.0/24 dev eth2 proto kernel scope link src 10.40.131.18
169.254.169.254 dev eth0
but
sudo ping -vvv -I eth1 8.8.8.8
PING 8.8.8.8 (8.8.8.8) from 10.40.128.135 eth1: 56(84) bytes of data.
^C
--- 8.8.8.8 ping statistics ---
62 packets transmitted, 0 received, 100% packet loss, time 62459ms
and the flow logs show nothing; it looks like the traffic never leaves the node.
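One way to test the "traffic never leaves the node" theory is to capture on the interface while the ping runs; a guarded sketch (interface name is an assumption, run as root on the node):

```shell
#!/bin/sh
# If tcpdump shows no ICMP here while the ping runs, packets are being
# dropped on the node (rp_filter, iptables, policy routing), not in the VPC.
IFACE="${IFACE:-eth1}"
if ip link show "$IFACE" >/dev/null 2>&1 && command -v tcpdump >/dev/null 2>&1; then
  timeout 10 tcpdump -ni "$IFACE" icmp || true
else
  echo "need interface $IFACE and tcpdump on this host"
fi
```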
@alena-ait I believe we are still on 1.25 or 1.26, so you're slightly ahead of me. On the EKS nodes directly, I observed similar routes and similar routing issues. I never figured out why, but ultimately adding the sbr plugin was what resolved the issue on my end.
The out-of-the-box iptables / routing configuration was complicated enough that I couldn't figure out where my own policy-route additions were failing. The desired behavior (e.g. ping -I / curl --interface) should be a textbook policy-routing setup via ip rule or similar.
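For reference, the textbook setup being described reduces to two commands inside the pod's netns; a sketch using the thread's addresses (table 100 is an arbitrary pick, and the script prints the commands rather than executing them):

```shell
#!/bin/sh
# Manual equivalent of what the sbr plugin automates for net1: route
# everything *sourced from* net1's address via the secondary gateway.
NET1_IP="10.40.128.83"   # pod's net1 address from the thread's ip addr output
GW="10.40.128.1"         # secondary subnet's gateway
TABLE=100                # arbitrary table number
cat <<EOF
ip rule add from ${NET1_IP} lookup ${TABLE}
ip route add default via ${GW} dev net1 table ${TABLE}
EOF
# Run the printed commands as root inside the pod's network namespace.
```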
Hi @alena-ait and @redterror, apologies for the late replies. A few pointers: don't use ec2-net-utils; as you already discovered, it adds IP addresses on the host OS and some specific routes, which will create routing challenges later, so it's better to follow the ifconfig up approach.
For your issue, @redterror's solution was right: you can either 1/ use the SBR plugin to handle pod routing, or 2/ define static routes via your multus net-attach-def (NAD), as you already did for destination 12.12.12.12/32; you can also add a network range like a /24. In your case, the issue could be either a routing problem or the interface on the worker node not being up.
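For the static-route option, widening the thread's /32 to a range is just a broader prefix in the NAD's routes list; a hypothetical fragment (the destination network and gateway here are illustrative, not from the thread):

```json
"routes": [
  { "dst": "12.12.12.0/24", "gw": "10.40.128.1" }
]
```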
Another good suggestion from my esteemed colleague @jungy-aws, based on his experience with SBR: to validate, run
kubectl exec -it busybox-deployment-65d88b5bb5-4kvgc -n multus -- ping -I 10.40.128.83 193.7.169.20
instead of
kubectl exec -it busybox-deployment-65d88b5bb5-4kvgc -n multus -- ping -I net1 193.7.169.20
Hi @raghs-aws, thank you for your answer. About the first point in your first reply: without an ifconfig up command in userdata the additional interface will of course have status down, but I have the up command in my cloud-init script, so there is no problem with it.
And your colleague was right (please tell @jungy-aws a big thanks from me). I don't know what the issue is with using the interface name (for the internal network it works without problems, it is a very ordinary thing, and @redterror also checked connecting via the interface name and got a positive result); it is probably related to SBR specifics, but with the IP address instead of the interface name my ping can reach the external world.
I will double-check with network tools to make sure the traffic leaves via the correct interface, but it looks like it works.
One small note which may help other people: in this solution https://github.com/aws-samples/eks-install-guide-for-multus/blob/main/cfn/templates/infra/eks-infra.yaml, the team creates the additional Multus networks with the default route table, which has no route to a NAT gateway, so for external connectivity you should switch them to the private network route table or add the route.
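With the AWS CLI, adding that route would look roughly like this (both IDs below are placeholders, not values from the thread; the script echoes the command instead of running it):

```shell
#!/bin/sh
# Sketch: point the Multus subnet's route table at a NAT gateway.
RTB_ID="rtb-0123456789abcdef0"   # placeholder route table ID
NAT_ID="nat-0fedcba9876543210"   # placeholder NAT gateway ID
echo "aws ec2 create-route --route-table-id $RTB_ID \
  --destination-cidr-block 0.0.0.0/0 --nat-gateway-id $NAT_ID"
# Drop the echo (and substitute real IDs) to actually create the route.
```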
Hi @alena-ait, so glad to hear that it helped! As for why this happens, as far as I have looked into it (I haven't had time to check the Linux ping source code, so this is all hypothetical :)): when we use 'ping -I dev', ping assumes the destination is on-link (in the same network), so it first tries an ARP lookup for the destination IP, which blocks the kernel from sending the actual ping packet. At least, I took a pcap and confirmed that ping sends ARP first (without sending ICMP), which is why I suggested 'ping -I source_ip' instead of the '-I dev' option. Anyway, your application should be fine sending packets to the external network, unlike the 'ping -I dev' case, as long as you have SBR (via the plugin or a manual setting, either way).
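The ARP-first behavior described above can be reproduced with a capture next to the failing ping; a guarded sketch (the interface name is an assumption, run inside the pod's network namespace):

```shell
#!/bin/sh
# While `ping -I net1 <external-ip>` runs in another shell, watch ARP:
# with '-I dev', ping ARPs for the destination IP and never emits ICMP;
# with '-I <source-ip>' the ICMP leaves via the policy route instead.
IFACE="${IFACE:-net1}"
if ip link show "$IFACE" >/dev/null 2>&1 && command -v tcpdump >/dev/null 2>&1; then
  timeout 10 tcpdump -ni "$IFACE" arp || true
else
  echo "need interface $IFACE and tcpdump in this netns"
fi
```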
Thank you @alena-ait. glad that your issue is resolved. I will close the issue :)
Is it possible, and what needs to be done, to send traffic to the Internet via the net1 interface?
With the current solution we can set up connectivity between two pods in EKS, but if we try to reach an external destination:
kubectl exec -it busybox-deployment-65d88b5bb5-4kvgc -n multus -- ping -I net1 193.7.169.20
Defaulted container "busybox" out of: busybox, aws-ip-mgmt (init)
PING 193.7.169.20 (193.7.169.20): 56 data bytes
^C
--- 193.7.169.20 ping statistics ---
4 packets transmitted, 0 packets received, 100% packet loss
It doesn't work. Maybe some additional routes need to be set up?