contiv / vpp

Kubernetes CNI plugin based on FD.io VPP
https://contivpp.io
Apache License 2.0

missing /32 FIB entries #1614

Open gilesheron opened 5 years ago

gilesheron commented 5 years ago

It seems like Contiv-VPP adds a /32 FIB entry in VRF1 on a new node for each existing node in the cluster (by default those are in 192.168.30.0/24).

But it also seems that the existing nodes don't get updated. So, for example, the master node in my cluster only has the /24 plus a local /32, the first worker I start also has a /32 for the master, and the next worker gets that plus a /32 for the first worker - and so on.

So I'm seeing pings drop when they're destined for 192.168.30.0/24 addresses with no matching /32, but stuff behind those addresses (e.g. pod IPs) seems to be OK (it looks like the FIB entry for the /24 resolves to ARP, whereas the FIB entry for the IPAM network resolves to the correct next-hop, including MAC addresses etc.).
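For reference, here's roughly what I'm running to check this (assuming the vxlanCIDR routes are in VRF 1 - adjust the table ID and the example addresses to your setup):

```
# dump the whole FIB for VRF 1 - the /24 and any per-node /32s show up here
vppctl show ip fib table 1

# look up one remote node's BVI address to see which entry actually matches
vppctl show ip fib table 1 192.168.30.2/32

# ping a remote BVI from VPP, sourced from the loopback
vppctl ping 192.168.30.2 source loop0
```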

rastislavs commented 5 years ago

Hey Giles, we are talking about the vxlanCIDR and the IP addresses applied to the VXLAN BVI interfaces, right?

Contiv-VPP is actually NOT installing any static /32 routes for them. They seem to be installed by VPP itself. I guess the /32 for the local interface is always there, and the rest are installed whenever there is some pod-to-pod communication between particular nodes (and since there is almost always something talking to the master node, a /32 for the master's BVI would be installed on each node).

From where are you trying to ping the remote BVI interfaces? From VPP? Do you use some specific source interface? Apart from ping not working, do you see any issues with communication between the nodes? You know the ping utility on VPP has many issues...

gilesheron commented 5 years ago

Interesting - but I'm not sure why e.g. worker2 gets a route for worker1.

And yeah - I was trying to ping from VPP using loop0 as the source.

The reason I started looking at this was that we had a broken cluster (workers unable to reach etcd, IIRC) and this was the only difference I could see in the FIBs.

will dig some more...

gilesheron commented 5 years ago

Oh yes - so it was the vxlanCIDR addresses.

rastislavs commented 5 years ago

Well, if the issue was that workers were unable to reach Contiv-ETCD, you may need to look in a different place. Contiv-ETCD is exposed as a NodePort service, so it relies on kube-proxy to do the NAT, and the traffic then goes over the management-network interconnection between the nodes (from the workers to the master node's management IP) rather than via VPP. The reason for that is that the agents need to be able to connect to ETCD even before the CNI starts working (i.e. before VPP is running & configured).
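If you want to rule that path out, something like this should do it (the contiv-etcd service name and kube-system namespace below match a default Contiv-VPP deployment - adjust if yours differs):

```
# confirm contiv-etcd is exposed as a NodePort service and note the port
kubectl get svc contiv-etcd -n kube-system

# from a worker, check TCP reachability of the master's mgmt IP on that port
# (substitute the real management IP and the NodePort printed above)
nc -vz <master-mgmt-ip> <nodeport>
```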

Maybe you are hitting this issue? https://github.com/contiv/vpp/issues/1430

rastislavs commented 5 years ago

BTW I am confused about why, when and how VPP installs those /32 routes for the other nodes' BVIs as well. It should not need them at all - there is a /24 route covering them - but I guess it is some runtime optimization in the VPP FIB logic? Anyway, I tried to "force" VPP to create them using some pod-to-pod traffic, but wasn't successful. Pod-to-pod traffic between the nodes worked, but the /32 route towards the other node's BVI was not added.
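For the record, this is roughly what I tried (the pod names and the 192.168.30.2 BVI address are just placeholders - substitute ones from your cluster):

```
# generate some pod-to-pod traffic across two nodes
kubectl exec <pod-on-node1> -- ping -c 5 <pod-ip-on-node2>

# then re-check whether a /32 towards the other node's BVI appeared in VRF 1
vppctl show ip fib table 1 192.168.30.2/32
```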