@rastislavs @YutaroHayakawa @harsimran-pabla PTAL when you have a moment and let me know if I am missing something that's causing this issue.
BGP Control Plane doesn't import routes, so you can't use it to establish node-to-node connectivity by meshing the nodes together. You can use the auto-direct-node-routes option to achieve the same goal (https://docs.cilium.io/en/stable/network/concepts/routing/#id3).
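For reference, a minimal sketch of what enabling that option might look like with Helm (the routingMode flag name varies by Cilium version, and the CIDR here is illustrative):

```sh
helm upgrade cilium cilium/cilium --namespace kube-system \
  --reuse-values \
  --set routingMode=native \
  --set autoDirectNodeRoutes=true \
  --set ipv4NativeRoutingCIDR=10.244.0.0/16
```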
@YutaroHayakawa thanks for the feedback. From reading the BGP CP docs, I didn't realize this was expected behavior. I previously tested auto-direct-node-routes and that worked as expected, but it requires L2 adjacency among nodes. According to the docs, kube-router should be used for native routing. Is this still the case with BGP CP? I'm trying to understand why different BGP solutions are used to establish native end-to-end connectivity among nodes that are not L2 adjacent.
@YutaroHayakawa Sorry, I'm also confused about this. I've just today set up a new cluster with the BGP control plane enabled. The BGP sessions are established and I can see the routes to my worker node pods (10.244.x.0/24) in my leaf router's routing table. But the worker nodes do not have L2 adjacency. So why isn't this configuration sufficient for node-to-node connectivity when native routing is enabled?
@danehans @dhess Am I missing something? If we don't have L2 reachability between nodes, how can we reach the Pods on another node even if we exchange the route? Say NodeA has PodCIDR 10.0.0.0/24 and NodeIP 192.168.0.1, and NodeB learns the route 10.0.0.0/24 via 192.168.0.1. When NodeB tries to reach a Pod on NodeA with IP 10.0.0.1, it ARPs for 192.168.0.1, but 192.168.0.1 is outside NodeB's L2 domain, so the ARP never reaches NodeA and the traffic can't go anywhere.
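To illustrate the failure mode (hypothetical interface and the same addresses as above):

```sh
# On NodeB: the kernel only accepts an off-subnet next hop with the
# "onlink" flag, and even then ARP resolution for 192.168.0.1 fails
# because it is not on NodeB's L2 segment.
ip route add 10.0.0.0/24 via 192.168.0.1 dev eth0 onlink
ip neigh show 192.168.0.1
# 192.168.0.1 dev eth0 FAILED
```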
@YutaroHayakawa Perhaps I'm the one missing something and there's something I don't get about Cilium or how Kubernetes networking works, but in my case there is a BGP route reflector (https://networklessons.com/bgp/bgp-route-reflector) in my network whose leaves are the individual Cilium nodes. So NodeB in your scenario would get the route to NodeA's pods via the route reflector, not directly from NodeA.
In general, a route reflector doesn't modify the next-hop, so if the original route is advertised by NodeA, the next-hop is still NodeA. It's effectively the same as receiving the route directly from NodeA. Unless all of your network devices in the same AS are connected to the same route reflector (or all of your nodes are in the same L2 domain), you can't get node-to-node connectivity (https://notes.networklessons.com/bgp-ibgp-split-horizon-rule).
@YutaroHayakawa I'm confused by your comments, and I wonder if we're talking past each other here. If so, my apologies.
In any case, I'm running eBGP. The route reflector is AS 65300, NodeA is AS 65201, and NodeB is AS 65202.
When I see the word "route reflector", it implies iBGP; the eBGP equivalent is usually called a route server. The NetworkLessons article you mentioned also says:
Route reflectors (RR) are one method to get rid of the full-mesh of IBGP peers in your network.
What does your actual network topology look like? I suspect your issue is different from the original one. The original issue is about meshing the nodes with each other over BGP, but you seem to have a different topology.
Use case: multiple clusters that share the same L2 segment. autoDirectNodeRoutes: true can provide intra-cluster connectivity but not inter-cluster connectivity. If all nodes have a single network interface with a default route, the default gateway can resolve the destination pod IP to the appropriate node as a workaround for this issue. However, inter-cluster traffic then relies on an external ARP resolver. Additionally, the issue still exists if cluster nodes have separate interfaces for external traffic (which uses the default route) and internal traffic (inter-cluster).
I'd recommend using ClusterMesh + autoDirectNodeRoutes in that case, but I understand the point.
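For anyone landing here, a rough sketch of that combination, assuming the Cilium CLI (the contexts are placeholders):

```sh
# Enable ClusterMesh on both clusters, then connect them; intra-cluster
# pod routes are still installed directly via autoDirectNodeRoutes.
cilium clustermesh enable --context cluster1
cilium clustermesh enable --context cluster2
cilium clustermesh connect --context cluster1 --destination-context cluster2
```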
@YutaroHayakawa based on your above feedback and since https://github.com/cilium/cilium/pull/26195 merged, should this issue be closed?
Yep, thanks for your doc contribution!
Connectivity fails to establish between pods on different BGP CP nodes configured to advertise routes using exportPodCIDR. To reproduce:

Create a kind cluster:
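The exact config from the original report isn't shown here; a plausible equivalent with two workers, the default CNI disabled so Cilium can be installed, and the 10.244.0.0/16 pod subnet referenced below:

```sh
cat <<EOF | kind create cluster --config=-
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
  - role: worker
  - role: worker
networking:
  disableDefaultCNI: true
  podSubnet: 10.244.0.0/16
EOF
```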
Install Cilium with BGP CP enabled:
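A sketch of the install, assuming Helm; flag names vary by Cilium version (e.g. routingMode=native vs. the older tunnel=disabled):

```sh
helm install cilium cilium/cilium --namespace kube-system \
  --set bgpControlPlane.enabled=true \
  --set routingMode=native \
  --set ipv4NativeRoutingCIDR=10.0.0.0/8 \
  --set ipam.mode=kubernetes
```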
Note: I'm unsure what to use for ipv4NativeRoutingCIDR; here I set it to 10.0.0.0/8. When I use only the pod network (10.244.0.0/16), CoreDNS fails to start because it can't connect to the kube-api service VIP.

Verify the install:
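For example, with the Cilium CLI:

```sh
cilium status --wait
```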
Get the node IPs to set the BGP Router ID:
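For example:

```sh
kubectl get nodes -o wide
```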
Annotate the nodes for BGP CP:
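A sketch of the annotation, assuming the cilium.io/bgp-virtual-router.{ASN} key; the ASN (64512) and node names are illustrative:

```sh
kubectl annotate node kind-worker \
  cilium.io/bgp-virtual-router.64512="router-id=<kind-worker-ip>"
kubectl annotate node kind-worker2 \
  cilium.io/bgp-virtual-router.64512="router-id=<kind-worker2-ip>"
```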
Apply the BGP peering policy with exportPodCIDR set:
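A representative policy sketch; the ASNs, peer address, and node selector are assumptions, not the exact values from the report:

```sh
kubectl apply -f - <<EOF
apiVersion: cilium.io/v2alpha1
kind: CiliumBGPPeeringPolicy
metadata:
  name: worker-nodes
spec:
  nodeSelector:
    matchLabels:
      kubernetes.io/os: linux
  virtualRouters:
    - localASN: 64512
      exportPodCIDR: true
      neighbors:
        - peerAddress: "10.0.0.1/32"
          peerASN: 64512
EOF
```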
Verify the status of the BGP peers:
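For example:

```sh
cilium bgp peers
```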
Run a test app (nginx) on each of the two worker nodes:
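A sketch pinning one nginx pod to each worker (pod and node names are illustrative):

```sh
kubectl run nginx-worker --image=nginx \
  --overrides='{"spec":{"nodeName":"kind-worker"}}'
kubectl run nginx-worker2 --image=nginx \
  --overrides='{"spec":{"nodeName":"kind-worker2"}}'
```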
Get the IPs of the test app pods:
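For example:

```sh
kubectl get pods -o wide
```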
Test connectivity:
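For example, curling one test pod from a client pod scheduled on the other node (substitute the pod IP returned by the previous step):

```sh
kubectl run curl-client --rm -it --restart=Never --image=curlimages/curl \
  --overrides='{"spec":{"nodeName":"kind-worker"}}' \
  -- curl -s --max-time 5 http://<nginx-worker2-pod-ip>
```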
The logs indicate the destinations are created for the podCIDRs:
Node routing tables are not updated with the BGP pod CIDRs:
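This can be checked from inside the kind nodes, e.g.:

```sh
# No route for the peer worker's pod CIDR (e.g. 10.244.x.0/24) appears:
docker exec kind-worker ip route show
docker exec kind-worker2 ip route show
```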