Closed lrossicone closed 1 year ago
Hi @Asprofumo, thanks for opening the issue.
It seems that the underlying network between the workers is not set up correctly.
1) Check if the two workers can ping each other. 2) Check if the two workers have a default route.
To do so, run the following command on both workers (kube-2 and kube-3):
ip route
Example output:
If they do, the default route should be in the same subnet as the workers' network.
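One way to check this is to compare the interface of the default route with the interface actually used to reach the other worker. The snippet below is only a sketch: the helper names default_route_dev and route_get_dev are made up for illustration, and PEER_IP is a placeholder for the other worker's address.

```shell
#!/usr/bin/env bash
# Hypothetical helpers, not part of Megalos or Kathara.

# Read `ip route` output on stdin and print the interface of the first default route.
default_route_dev() {
  awk '$1 == "default" { for (i = 2; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Read `ip route get <address>` output on stdin and print the interface it reports.
route_get_dev() {
  awk '{ for (i = 1; i < NF; i++) if ($i == "dev") { print $(i + 1); exit } }'
}

# Usage on a worker (PEER_IP is a placeholder for the other worker's address):
#   default_dev=$(ip route show default | default_route_dev)
#   peer_dev=$(ip route get "$PEER_IP" | route_get_dev)
#   [ "$default_dev" = "$peer_dev" ] && echo "same interface" || echo "different interfaces"
```

If the two interfaces differ, the default route is probably not on the network the workers use to talk to each other.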
3) You can check whether the iptables rules of the Megalos collision domains are installed correctly on each worker. To do so, run on a worker:
sudo iptables -nvL
You should see some rows with the kt- bridge. You can paste the output of the command here.
If the output has the following string:
# Warning: iptables-legacy tables present, use iptables-legacy to see them
You have to switch iptables from nftables to the legacy one. The procedure depends on the Linux distro you have.
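Since the output can be very long, a small filter can summarize the two things to look for. This is a sketch only: check_iptables_output is a made-up helper name, and it simply counts lines containing kt- and looks for the warning string quoted above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: summarize `iptables -nvL` output read from stdin.
check_iptables_output() {
  awk '
    /iptables-legacy tables present/ { legacy = 1 }   # the warning quoted above
    /kt-/ { kt++ }                                    # rows mentioning a kt- bridge
    END { printf "kt_rows=%d legacy_warning=%s\n", kt, (legacy ? "yes" : "no") }'
}

# Usage on a worker (2>&1 in case the warning is printed on stderr):
#   sudo iptables -nvL 2>&1 | check_iptables_output
#
# Assumption: on Debian/Ubuntu-based systems the switch to the legacy backend
# is usually done with update-alternatives; other distros differ:
#   sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
```

A result with kt_rows=0 would suggest the collision domain rules are missing; legacy_warning=yes means the backend needs to be switched as described above.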
4) You can check if EVPN BGP peerings (needed by VXLAN) are up and running.
Run the following command on the controller node kube-1:
kubectl -n kube-system get pods
The output should look similar to this:
Find the Pod name of the kube-kathara-master-XXX-YYY instance, for example kube-kathara-master-868d76bf57-2q5ml.
At this point run the following command:
kubectl exec -n kube-system <POD_NAME> -- vtysh -c 'sh bgp summary'
In our example:
kubectl exec -n kube-system kube-kathara-master-868d76bf57-2q5ml -- vtysh -c 'sh bgp summary'
The output should report that peerings are active (Up/Down column has a time) and there are some prefixes exchanged (MsgRcvd and MsgSent columns), like the example below:
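The same check can be scripted by filtering the summary. The sketch below assumes the column layout of the summary table (Neighbor in the first column, MsgRcvd and MsgSent in the fourth and fifth, Up/Down in the ninth); check_bgp_peers is a made-up helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `sh bgp summary` output on stdin, print one line per
# neighbor, and return non-zero if a neighbor has no uptime or no neighbor is found.
check_bgp_peers() {
  awk '
    # Neighbor rows start with an optional "*" followed by an IPv4 address.
    $1 ~ /^[*]?[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+$/ {
      # An established peering shows a time in Up/Down; otherwise it shows "never".
      state = ($9 ~ /^[0-9]/) ? "up" : "down"
      if (state == "down") bad = 1
      printf "%s %s updown=%s rcvd=%s sent=%s\n", $1, state, $9, $4, $5
      peers++
    }
    END { if (bad || peers == 0) exit 1 }'
}

# Usage on the controller node (<POD_NAME> is the Pod found in the previous step):
#   kubectl exec -n kube-system <POD_NAME> -- vtysh -c 'sh bgp summary' | check_bgp_peers \
#     || echo "at least one peering is not up"
```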
In case you still have problems, we can schedule a meeting to solve them.
Mariano.
Hi Mariano, thank you kindly for your reply! I have tried all the solutions you proposed, but unfortunately I have not been able to solve it yet. However, your advice has given me some ideas, which I will experiment with ASAP.
For example, I noticed that the two workers manage to ping each other correctly, but for both of them the default route is NOT on the subnet they use to communicate, so I will immediately try to change it:
vagrant@kube-1:~$ ip n
10.12.89.192 dev vxlan.calico lladdr 66:58:dc:96:ba:26 PERMANENT
10.12.79.128 dev vxlan.calico lladdr 66:ac:3c:4d:ec:a9 PERMANENT
...
The output of the command sudo iptables -nvL is very long, so I will avoid copying it here, but there was NO warning, so I think at least iptables is ok.
I also checked if EVPN BGP peerings are up and running, and it seems they are:
root@kube-1:/home/vagrant# kubectl exec -n kube-system kube-kathara-master-868d76bf57-46kvp -- vtysh -c 'sh bgp summary'
L2VPN EVPN Summary (VRF default):
BGP router identifier 10.12.89.212, local AS number 65000 vrf-id 0
BGP table version 0
RIB entries 3, using 576 bytes of memory
Peers 3, using 2170 KiB of memory
Peer groups 1, using 64 bytes of memory
Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd PfxSnt Desc
*10.0.2.15 4 65000 766 776 0 0 0 00:37:55 2 4 N/A
*10.11.1.71 4 65000 752 767 0 0 0 00:37:28 0 4 N/A
*10.11.1.72 4 65000 797 802 0 0 0 00:39:14 2 4 N/A
Total number of neighbors 3
* - dynamic neighbor
3 dynamic neighbor(s), limit 5000
I noticed from the output of your commands that you use Flannel as your default network (I use Calico instead), so I was thinking that in addition to changing the default route, I could switch that as well.
I'll reply at the end of this thread as soon as I have news (hopefully positive), otherwise we'll have to schedule a meeting.
Thanks again for your patience, see you soon!
Hi @Asprofumo, the default route could be the reason the workers are not communicating. When creating the VXLAN interface, you have to provide a master interface, and the Megalos CNI always takes the interface of the default route.
At this point, I also suggest switching to Flannel to see what happens. I have always used it and it has always worked.
About iptables, did you check whether the kt- rules are installed?
Mariano.
Changing the default routes finally got it working! We can close the issue.
Thank you very much for the support, I wish you good work!
PS: Yes, I already checked the kt- rules, and they are installed; I forgot to tell you.
Hi, I am having some issues with Megalos. I have set up a cluster consisting of three nodes: kube-1 (master), kube-2 and kube-3:
I am using Calico as the default network CNI plugin, and booting my labs goes smoothly; I can also execute commands on each of the various pods created by kathara without any problems. The only thing I cannot do is communicate between pods residing on different nodes. For example, by creating a lab consisting of 3 pods (a, b and c):
We can see how kathara distributes them on worker nodes:
The connection to each one is successful:
Communication between the two pods on the kube-2 node is successful:
But the addresses of two pods on different nodes are not resolved: