What happened?
In our environment, Cisco 9Ks act as BGP route reflectors (RRs). The GoBGP RIB had the correct next hop, but the logs were filled with:
E0211 02:50:17.602793 1 network_routes_controller.go:418] Failed to inject routes due to: could not parse next hop received from GoBGP for path: nlri:<type_url:"type.googleapis.com/gobgpapi.IPAddressPrefix" value:"\010\031\022\01410.239.2.128" > pattrs:<type_url:"type.googleapis.com/gobgpapi.OriginAttribute" > pattrs:<type_url:"type.googleapis.com/gobgpapi.AsPathAttribute" > pattrs:<type_url:"type.googleapis.com/gobgpapi.LocalPrefAttribute" value:"\010d" > pattrs:<type_url:"type.googleapis.com/gobgpapi.OriginatorIdAttribute" value:"\n\r10.228.114.72" > pattrs:<type_url:"type.googleapis.com/gobgpapi.ClusterListAttribute" value:"\n\01310.228.0.10\n\r169.254.25.25" > pattrs:<type_url:"type.googleapis.com/gobgpapi.MpReachNLRIAttribute" value:"\n\004\010\001\020\001\022\r10.228.114.72\032@\n,type.googleapis.com/gobgpapi.IPAddressPrefix\022\020\010\031\022\01410.239.2.128" > age:<seconds:1613011817 > validation:<> family:<afi:AFI_IP safi:SAFI_UNICAST > source_asn:65526 source_id:"10.228.0.10" neighbor_ip:"10.228.205.194" local_identifier:1
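Meanwhile, the kernel routing table did not have the routes.
The Cisco router put the next hop in the multiprotocol reachability attribute (MP_REACH_NLRI) instead of the next hop attribute (NEXT_HOP). The path in the log above carries no NextHopAttribute at all; the next hop 10.228.114.72 appears only inside MpReachNLRIAttribute.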
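The lookup order this implies can be sketched as follows: use NEXT_HOP when the path has one, otherwise fall back to the next hop carried in MP_REACH_NLRI. This is only an illustration, not kube-router code. The types `nextHopAttr` and `mpReachAttr` and the function `resolveNextHop` are hypothetical stand-ins for the decoded GoBGP attributes.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// Hypothetical stand-ins for the two decoded path attributes. They are not
// the real gobgpapi types, only shapes with the fields this sketch needs.
type nextHopAttr struct{ NextHop string }    // NEXT_HOP
type mpReachAttr struct{ NextHops []string } // MP_REACH_NLRI

// resolveNextHop prefers NEXT_HOP and falls back to the first next hop
// carried in MP_REACH_NLRI when the path has no NEXT_HOP attribute.
func resolveNextHop(attrs []interface{}) (net.IP, error) {
	var fallback string
	for _, attr := range attrs {
		switch a := attr.(type) {
		case nextHopAttr:
			if ip := net.ParseIP(a.NextHop); ip != nil {
				return ip, nil
			}
			return nil, fmt.Errorf("invalid NEXT_HOP address: %q", a.NextHop)
		case mpReachAttr:
			// Remember the first MP_REACH_NLRI next hop, but keep scanning
			// in case a NEXT_HOP attribute follows.
			if fallback == "" && len(a.NextHops) > 0 {
				fallback = a.NextHops[0]
			}
		}
	}
	if fallback == "" {
		return nil, errors.New("path carries no next hop")
	}
	ip := net.ParseIP(fallback)
	if ip == nil {
		return nil, fmt.Errorf("invalid MP_REACH_NLRI next hop: %q", fallback)
	}
	return ip, nil
}

func main() {
	// A path shaped like the one in the log: no NEXT_HOP attribute,
	// next hop present only in MP_REACH_NLRI.
	ip, err := resolveNextHop([]interface{}{
		mpReachAttr{NextHops: []string{"10.228.114.72"}},
	})
	fmt.Println(ip, err)
}
```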
System Information (please complete the following information):
Kube-Router Version (kube-router --version): 1.1.1
Kube-Router Parameters: Running with Cilium 1.9.3, so kube-router is only doing routing:
--run-router=true
--run-firewall=false
--run-service-proxy=false
--enable-cni=false
--enable-overlay=false
--nodes-full-mesh=false
--cluster-asn=
--enable-ibgp=false
--advertise-cluster-ip=true
--advertise-external-ip=false
--advertise-loadbalancer-ip=false
Kubernetes Version (kubectl version) : 1.20.2
Cloud Type: on prem
Kubernetes Deployment Type: Kubeadm
Kube-Router Deployment Type: DaemonSet
Cluster Size: 6
Additional context
I built a custom kube-router with the following change to network_routes_controller.go, which fixed the problem in our environment. How to handle the multiple next hops that MP_REACH_NLRI can carry still needs discussion.
diff --git a/pkg/controllers/routing/network_routes_controller.go b/pkg/controllers/routing/network_routes_controller.go
index 4f05c32d..12ef8912 100644
--- a/pkg/controllers/routing/network_routes_controller.go
+++ b/pkg/controllers/routing/network_routes_controller.go
@@ -515,6 +515,14 @@ out:
}
}
break out
+ case *gobgpapi.MpReachNLRIAttribute:
+ nextHop = net.ParseIP(a.NextHops[0]).To4()
+ if nextHop == nil {
+ if nextHop = net.ParseIP(a.NextHops[0]).To16(); nextHop == nil {
+ return fmt.Errorf("invalid nextHop address: %s", a.NextHops[0])
+ }
+ }
+ break out
}
}
if nextHop == nil {
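One possible starting point for that discussion: the patch indexes `a.NextHops[0]` directly, which would panic on an empty list and ignores any further entries. With IPv6, a peer may advertise a global and a link-local address together (RFC 2545), so the first entry is not necessarily the one to install. The sketch below shows one policy, assuming a global address is preferable for a kernel route. `pickNextHop` is a hypothetical helper, not an existing kube-router or GoBGP function, and other policies may suit better.

```go
package main

import (
	"errors"
	"fmt"
	"net"
)

// pickNextHop is a hypothetical helper, not part of kube-router or GoBGP.
// It returns the first parseable address that is not link-local, falls back
// to the first parseable address, and errors on an empty or unparseable list.
func pickNextHop(nextHops []string) (net.IP, error) {
	if len(nextHops) == 0 {
		return nil, errors.New("MP_REACH_NLRI carries no next hop")
	}
	var firstValid net.IP
	for _, s := range nextHops {
		ip := net.ParseIP(s)
		if ip == nil {
			continue // skip entries that are not IP addresses
		}
		if v4 := ip.To4(); v4 != nil {
			ip = v4 // normalise IPv4 to its 4-byte form, as the patch does
		}
		if firstValid == nil {
			firstValid = ip
		}
		if !ip.IsLinkLocalUnicast() {
			return ip, nil
		}
	}
	if firstValid == nil {
		return nil, fmt.Errorf("no valid next hop in %v", nextHops)
	}
	return firstValid, nil
}

func main() {
	// Single IPv4 next hop, as in the path from the log.
	fmt.Println(pickNextHop([]string{"10.228.114.72"}))
	// Link-local listed first; the global address is chosen.
	fmt.Println(pickNextHop([]string{"fe80::1", "2001:db8::1"}))
}
```

Inside the new `case *gobgpapi.MpReachNLRIAttribute:` branch, a helper like this would replace the direct `a.NextHops[0]` access, and its error would be returned in place of the "invalid nextHop address" error.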