Pods on different nodes fail to communicate

lrossicone commented 1 year ago

Hi, I am having some issues with Megalos. I have set up a cluster of three nodes: kube-1 (master), kube-2 and kube-3:

NAME    STATUS  ROLES           AGE     VERSION
kube-1  Ready   control-plane   16h     v1.26.1
kube-2  Ready   <none>          16h     v1.26.1
kube-3  Ready   <none>          15h     v1.26.1

I am using Calico as the default network CNI plugin, and booting my labs goes smoothly; I can also execute commands on each of the various pods created by kathara without any problems. The only thing I cannot do is communicate between pods residing on different nodes. For example, by creating a lab consisting of 3 pods (a, b and c):

root@kube-1:/home/vagrant/lab2# cat lab.conf
a[0]="A"
b[0]="A"
c[0]="A"
root@kube-1:/home/vagrant/lab2# cat a.startup
ifconfig net0 1.1.1.1/24 up
root@kube-1:/home/vagrant/lab2# cat b.startup
ifconfig net0 1.1.1.2/24 up
root@kube-1:/home/vagrant/lab2# cat c.startup
ifconfig net0 1.1.1.3/24 up

We can see how kathara distributes them on worker nodes:

root@kube-1:/home/vagrant/lab2# kathara list
TIMESTAMP: 2023-01-23 11_52_33-505442

╔═══════════════════════╦══════╦════════════════════════════╦═══════════════════════╦═════════╦═══════════════╗
║ NETWORK SCENARIO ID   ║ NAME ║ POD NAME                   ║ IMAGE                 ║ STATUS  ║ ASSIGNED NODE ║
╠═══════════════════════╬══════╬════════════════════════════╬═══════════════════════╬═════════╬═══════════════╣
║ xtzdmh2ix5yvksncorm3w ║ a    ║ kathara-a-654d46fb6d-txqh6 ║ kathara/quagga:latest ║ Running ║ kube-3        ║
╠═══════════════════════╬══════╬════════════════════════════╬═══════════════════════╬═════════╬═══════════════╣
║ xtzdmh2ix5yvksncorm3w ║ b    ║ kathara-b-f8dffcb54-gzz99  ║ kathara/quagga:latest ║ Running ║ kube-2        ║
╠═══════════════════════╬══════╬════════════════════════════╬═══════════════════════╬═════════╬═══════════════╣
║ xtzdmh2ix5yvksncorm3w ║ c    ║ kathara-c-7ddff9b468-f68rf ║ kathara/quagga:latest ║ Running ║ kube-2        ║
╚═══════════════════════╩══════╩════════════════════════════╩═══════════════════════╩═════════╩═══════════════╝

The connection to each one is successful:

root@kube-1:/home/vagrant/lab2# kathara connect a
root@kathara-a:/# exit
exit
root@kube-1:/home/vagrant/lab2# kathara connect b
root@kathara-b:/# exit
exit
root@kube-1:/home/vagrant/lab2# kathara connect c
root@kathara-c:/# exit
exit

Communication between the two pods on the kube-2 node is successful:

root@kube-1:/home/vagrant/lab2# kathara exec b ping 1.1.1.3
PING 1.1.1.3 (1.1.1.3) 56(84) bytes of data.
64 bytes from 1.1.1.3: icmp_seq=1 ttl=64 time=0.026 ms
64 bytes from 1.1.1.3: icmp_seq=2 ttl=64 time=0.046 ms
64 bytes from 1.1.1.3: icmp_seq=3 ttl=64 time=0.034 ms

But two pods on different nodes cannot reach each other:

root@kube-1:/home/vagrant/lab2# kathara exec b ping 1.1.1.1
PING 1.1.1.1 (1.1.1.1) 56(84) bytes of data.
From 1.1.1.2 icmp_seq=1 Destination Host Unreachable
From 1.1.1.2 icmp_seq=2 Destination Host Unreachable
From 1.1.1.2 icmp_seq=3 Destination Host Unreachable

Skazza94 commented 1 year ago

Hi @Asprofumo, thanks for opening the issue.

It seems that the underlying network between the workers is not set up correctly.

1) Check if the two workers can ping each other.

2) Check if the two workers have a default route.

To do so run the following command on both workers (kube-2 and kube-3):

ip route

Example output: (screenshot not included)

If they do, the default route should be in the same subnet as the workers' network.
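As a rough aid for this check, the sketch below pulls the interface and gateway of the default route out of `ip route` output. The function name `default_route_info` is made up for illustration; it only parses text on stdin and assumes the usual `default via <gateway> dev <interface>` layout.

```shell
#!/bin/sh
# Hypothetical helper (not part of Kathara or Megalos): print the interface
# and gateway of the first default route found in `ip route` output on stdin.
default_route_info() {
    awk '$1 == "default" {
        dev = ""; gw = ""
        # Walk the key/value pairs that follow the word "default".
        for (i = 2; i < NF; i++) {
            if ($i == "via") gw = $(i + 1)
            if ($i == "dev") dev = $(i + 1)
        }
        print dev, gw
        exit
    }'
}

# Usage on each worker:
#   ip route | default_route_info
```

If the printed gateway is not in the subnet the workers use to reach each other, that is the situation point 2 warns about.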

3) Check whether the iptables rules of the Megalos collision domains are installed correctly on each worker. To do so, on a worker run:

sudo iptables -nvL

You should see some rows with the kt- bridge. You can paste the output of the command here.

If the output has the following string:

# Warning: iptables-legacy tables present, use iptables-legacy to see them

You have to switch iptables from nftables to the legacy one. The procedure depends on the Linux distro you have.
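Since the output can be long, a small filter can summarize the two things to look for: rows mentioning a kt- bridge and the legacy warning. This is only a sketch; `check_kt_rules` is a hypothetical name, and it assumes that any line containing `kt-` refers to a Megalos bridge.

```shell
#!/bin/sh
# Hypothetical helper: summarize `iptables -nvL` output read from stdin.
# Counts lines mentioning a kt- bridge and reports whether the
# iptables-legacy warning quoted above is present.
check_kt_rules() {
    awk '
        /iptables-legacy tables present/ { legacy = 1 }
        /kt-/ { kt++ }
        END {
            printf "kt_rules=%d legacy_warning=%s\n", kt, (legacy ? "yes" : "no")
        }'
}

# Usage on a worker (stderr is merged in case the warning is printed there):
#   sudo iptables -nvL 2>&1 | check_kt_rules
```

A result of `kt_rules=0` or `legacy_warning=yes` would point to the problems described in this step.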

4) You can check if EVPN BGP peerings (needed by VXLAN) are up and running. Run the following command on the controller node kube-1:

kubectl -n kube-system get pods

The output should look similar to this: (screenshot not included)

Find the Pod name of the kube-kathara-master-XXX-YYY instance, for example kube-kathara-master-868d76bf57-2q5ml.

At this point run the following command:

kubectl exec -n kube-system <POD_NAME> -- vtysh -c 'sh bgp summary'

In our example:

kubectl exec -n kube-system kube-kathara-master-868d76bf57-2q5ml -- vtysh -c 'sh bgp summary'

The output should report that the peerings are active (the Up/Down column shows a time) and that messages and prefixes are being exchanged (MsgRcvd and MsgSent columns for messages, State/PfxRcd and PfxSnt columns for prefixes).
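To read the summary more quickly, the neighbor rows can be reduced to one status line per peer. The sketch below is an assumption-laden helper, not part of Megalos: `bgp_peer_status` is a made-up name, and it treats a purely numeric State/PfxRcd field as an established session, since a non-established peer shows a state name there instead of a prefix count.

```shell
#!/bin/sh
# Hypothetical helper: condense `sh bgp summary` output (stdin) to one line
# per neighbor. Assumes the column order Neighbor, V, AS, MsgRcvd, MsgSent,
# TblVer, InQ, OutQ, Up/Down, State/PfxRcd.
bgp_peer_status() {
    awk '
        # Neighbor rows start with an IPv4 address, optionally prefixed by "*".
        $1 ~ /^[*]?[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+$/ && NF >= 10 {
            n = $1
            sub(/^[*]/, "", n)
            status = ($10 ~ /^[0-9]+$/) ? "established" : "not-established"
            print n, status, "updown=" $9, "state_pfxrcd=" $10
        }'
}

# Usage on the controller node:
#   kubectl exec -n kube-system <POD_NAME> -- vtysh -c 'sh bgp summary' | bgp_peer_status
```

A peer reported as established with `state_pfxrcd=0` is up but has not sent any prefixes, which may or may not be expected depending on which pods run on that node.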

In case you still have problems, we can schedule a meeting to solve them.

Mariano.

lrossicone commented 1 year ago

Hi Mariano, thank you kindly for your reply! I have tried all the solutions you proposed, but unfortunately I have not been able to solve it yet. However, your advice has given me some ideas, which I will experiment with asap.

For example, I noticed that the two workers manage to ping each other correctly, but for both of them the default route is NOT on the subnet they use to communicate, so I will immediately try to change it:

vagrant@kube-1:~$ ip n
10.12.89.192 dev vxlan.calico lladdr 66:58:dc:96:ba:26 PERMANENT
10.12.79.128 dev vxlan.calico lladdr 66:ac:3c:4d:ec:a9 PERMANENT
...

The output of the command sudo iptables -nvL is very long, so I will avoid copying it here, but there was NO warning, so I think at least iptables is ok.

I also checked if EVPN BGP peerings are up and running, and it seems they are:

root@kube-1:/home/vagrant# kubectl exec -n kube-system kube-kathara-master-868d76bf57-46kvp -- vtysh -c 'sh bgp summary'

L2VPN EVPN Summary (VRF default):
BGP router identifier 10.12.89.212, local AS number 65000 vrf-id 0
BGP table version 0
RIB entries 3, using 576 bytes of memory
Peers 3, using 2170 KiB of memory
Peer groups 1, using 64 bytes of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
*10.0.2.15      4      65000       766       776        0    0    0 00:37:55            2        4 N/A
*10.11.1.71     4      65000       752       767        0    0    0 00:37:28            0        4 N/A
*10.11.1.72     4      65000       797       802        0    0    0 00:39:14            2        4 N/A

Total number of neighbors 3
* - dynamic neighbor
3 dynamic neighbor(s), limit 5000

I noticed from the output of your commands that you use Flannel as your default network (I use Calico instead), so I was thinking that, in addition to changing the default route, I could switch that as well.

I'll reply at the end of this thread as soon as I have news (hopefully positive), otherwise we'll have to schedule a meeting.

Thanks again for your patience, see you soon!

Skazza94 commented 1 year ago

Hi @Asprofumo, the default route could be the reason the workers are not communicating. When creating the VXLAN interface, you have to provide a master interface, and Megalos CNI always picks the interface of the default route.
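One way to spot this situation is to compare the interface of the default route with the interface the kernel would use to reach the other worker. The sketch below only parses `ip route` and `ip route get` output; the function names and the `PEER_IP` placeholder are hypothetical, and a mismatch is a hint to investigate rather than proof of a fault.

```shell
#!/bin/sh
# Hypothetical helpers, not part of Megalos.

# Interface of the default route, from `ip route` output on stdin.
default_dev() {
    awk '$1 == "default" {
        for (i = 2; i < NF; i++) if ($i == "dev") { print $(i + 1); exit }
    }'
}

# Interface used to reach one address, from `ip route get <addr>` output on stdin.
route_get_dev() {
    awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Succeeds only when both interface names are non-empty and identical.
same_dev() {
    if [ -n "$1" ] && [ "$1" = "$2" ]; then
        echo "ok: default route and peer traffic both use $1"
    else
        echo "mismatch: default route uses '$1', peer is reached via '$2'"
        return 1
    fi
}

# Usage on one worker, with PEER_IP set to the other worker's address:
#   same_dev "$(ip route | default_dev)" "$(ip route get "$PEER_IP" | route_get_dev)"
```

If the two interfaces differ, the VXLAN interface would be attached to an interface that does not face the other workers, which matches the symptom described in this issue.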

At this point, I also suggest switching to Flannel to see what happens. I have always used it and it has always worked.

About the iptables, did you check if kt- rules are installed?

Mariano.

lrossicone commented 1 year ago

Changing the default routes finally got it working! We can close the issue.

Thank you very much for the support, I wish you good work!

PS: Yes, I already checked the kt- rules, and they are installed. I forgot to tell you.

KatharaFramework / Megalos-CNI

Pods on different nodes fail to communicate #3