Closed camrossi closed 2 years ago
Hmm... I can't seem to reproduce this locally, either with Juniper equipment or via FRR. Both the first time, and whenever I restart kube-router, I see the routes come in immediately without waiting for the graceful-restart time.
I'm going to assume that you have annotations on your nodes describing the peer.ips and peer.asns? Without them I can't get your config to work at all, because kube-router never even tries to establish a peering session. I had to execute the following commands:
kubectl annotate node kube-router-vm2 "kube-router.io/peer.ips=10.241.0.10"
kubectl annotate node kube-router-vm2 "kube-router.io/peer.asns=65004"
Once I did that, I had to execute a rollout restart so that the node annotations took effect. Unfortunately, kube-router doesn't yet watch nodes to catch these changes live.
One other thing I can think of: can you show me your graceful-restart settings from gobgp within the kube-router container? To do this, you should be able to do something like:
% kubectl exec -ti -n kube-system kube-router-2prwb -- /bin/bash
...
#gobgp n
Peer           AS  Up/Down State       |#Received  Accepted
10.241.0.10 65004 00:00:42 Establ      |        0         0
#gobgp n 10.241.0.10
BGP neighbor is 10.241.0.10, remote AS 65004
BGP version 4, remote router ID 10.241.0.10
BGP state = ESTABLISHED, up for 00:00:45
...
Neighbor capabilities:
  multiprotocol:
    ipv4-unicast: advertised and received
    ipv6-unicast: advertised and received
  route-refresh: advertised and received
  extended-nexthop: advertised
    Local: nlri: ipv4-unicast, nexthop: ipv6
  graceful-restart: advertised and received
    Local: restart time 90 sec
      ipv4-unicast
      ipv6-unicast
    Remote: restart time 300 sec
      ipv4-unicast, forward flag set
      ipv6-unicast, forward flag set
  4-octet-as: advertised and received
...
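The key to the above is that, for graceful restart to work, your output should match those values essentially one for one: each address family should be both advertised and received, and it should be listed under both the Local and Remote graceful-restart entries. If any of them are missing, your remote is likely set up incorrectly in some way.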
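That comparison can be scripted rather than done by eye. The following is a minimal sketch, not part of kube-router or gobgp: it assumes the text layout of the `gobgp neighbor <ip>` output shown above, the function name `one_sided_families` is hypothetical, and the sample input is a trimmed, made-up capture in the same layout. It only recognises family names that start with `ipv4-` or `ipv6-`.

```python
import re


def one_sided_families(neighbor_output: str) -> dict[str, str]:
    """Return {family: status} for every address family listed under
    'multiprotocol:' whose status is not 'advertised and received'."""
    mismatched = {}
    in_multiprotocol = False
    for raw in neighbor_output.splitlines():
        line = raw.strip()
        if line == "multiprotocol:":
            in_multiprotocol = True
            continue
        if not in_multiprotocol:
            continue
        match = re.match(r"^(ipv[46]-\S+):\s*(.+)$", line)
        if match is None:
            # The first non-family line ends the multiprotocol section.
            break
        family, status = match.group(1), match.group(2).strip()
        if status != "advertised and received":
            mismatched[family] = status
    return mismatched


# Hypothetical trimmed sample in the layout printed by `gobgp neighbor <ip>`.
SAMPLE = """\
Neighbor capabilities:
  multiprotocol:
    ipv4-unicast: advertised and received
    ipv6-unicast: advertised
  route-refresh: advertised and received
"""

print(one_sided_families(SAMPLE))
```

An empty result means every listed family was negotiated in both directions; any entry in the result is a family that only one side offered.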
My configs:
#kube-router -V
Running kube-router version v1.5.0, built on 2022-05-30T17:32:19+0000, go1.17.10
kube-router arguments:
Args:
--run-router=true
--run-firewall=true
--run-service-proxy=true
--bgp-graceful-restart=true
--kubeconfig=/var/lib/kube-router/kubeconfig
--runtime-endpoint=unix:///run/containerd/containerd.sock
--cluster-asn=65003
--advertise-external-ip
--advertise-loadbalancer-ip
--advertise-pod-cidr=true
--enable-ibgp=false
--enable-overlay=false
--enable-pod-egress=false
--override-nexthop=true
--service-external-ip-range=10.243.0.0/24
From FRR host:
# vtysh -c "show bgp detail"
BGP table version is 9, local router ID is 10.241.0.10, vrf id 0
Default local pref 100, local AS 65004
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
   Network          Next Hop            Metric LocPrf Weight Path
*> 10.242.0.0/24    10.241.0.20             10             0 65003 i
*> 10.242.1.0/24    10.241.0.21             10             0 65003 i
*> 10.243.0.1/32    10.241.0.21             10             0 65003 i
Displayed 3 routes and 3 total paths
# ip route
default via 10.241.0.1 dev ens3 proto dhcp src 10.241.0.10 metric 100
10.241.0.0/16 dev ens3 proto kernel scope link src 10.241.0.10
10.241.0.1 dev ens3 proto dhcp scope link src 10.241.0.10 metric 100
10.242.0.0/24 via 10.241.0.20 dev ens3 proto bgp metric 20
10.242.1.0/24 via 10.241.0.21 dev ens3 proto bgp metric 20
10.243.0.1 via 10.241.0.21 dev ens3 proto bgp metric 20
FRR Config:
# cat /etc/frr/frr.conf
# default to using syslog. /etc/rsyslog.d/45-frr.conf places the log
# in /var/log/frr/frr.log
# In FRR both ! and # are considered comment characters and can be treated the same
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! Base Config for FRR as a whole
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
# Reflects defaults adhering mostly to IETF standards or common practices in wide-area internet routing
# (as opposed to datacenter which reflects a single administrative domain and uses aggressive timers)
frr defaults traditional
!
# Logs to syslog at an informational level
# (other values are: emergencies, alerts, critical, errors, warnings, notifications, informational, or debugging)
log syslog informational
!
# Puts all configuration into this single frr.conf file rather than having a separate config per daemon
service integrated-vtysh-config
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! Basic BGP config to setup neighbors and peer groups
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
router bgp 65004
# ID ourselves as our default IPv4 Address
bgp router-id 10.241.0.10
!
# Consider paths of equal AS_PATH length candidates for multipath computation (without this, the entire AS_PATH must
# match for multipath computation)
bgp bestpath as-path multipath-relax
# Ensure that when comparing routes where both are equal on most metrics, that the tie is broken based on router ID
bgp bestpath compare-routerid
!
# Enable BGP Graceful Restart
bgp graceful-restart
bgp graceful-restart preserve-fw-state
bgp graceful-restart restart-time 300
!
# Setup peer groups
neighbor kubepeers peer-group
neighbor kubepeers remote-as 65003
!
# Add peers
neighbor 10.241.0.20 peer-group kubepeers
neighbor 10.241.0.21 peer-group kubepeers
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! Configure IPv4 family
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
address-family ipv4 unicast
# Activate ipv4 for the kubepeers peer groups
neighbor kubepeers activate
!
# Setup this configuration as a route-server, see:
# https://docs.frrouting.org/en/latest/bgp.html#configuring-frr-as-a-route-server
neighbor kubepeers route-server-client
!
# Filter imports & exports via route-map first
neighbor kubepeers route-map IMPORTv4 in
neighbor kubepeers route-map UNACCEPTED out
!
# "import" and "export" are different than the normal "in" and "out" definitions that we normally see in policy
# This is tied to route-server-client definition above
neighbor kubepeers route-map IMPORTv4 import
neighbor kubepeers route-map UNACCEPTED export
!
# Allows us to generate inbound updates from a neighbor, change and activate BGP policies without clearing the BGP session
neighbor kubepeers soft-reconfiguration inbound
exit-address-family
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! Configure IPv6 family
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
address-family ipv6 unicast
# Activate ipv6 for the kubepeers peer groups
neighbor kubepeers activate
!
# Setup this configuration as a route-server, see:
# https://docs.frrouting.org/en/latest/bgp.html#configuring-frr-as-a-route-server
neighbor kubepeers route-server-client
!
# Filter imports & exports via route-map first
neighbor kubepeers route-map IMPORTv6 in
neighbor kubepeers route-map UNACCEPTED out
!
# "import" and "export" are different than the normal "in" and "out" definitions that we normally see in policy
# This is tied to route-server-client definition above
neighbor kubepeers route-map IMPORTv6 import
neighbor kubepeers route-map UNACCEPTED export
!
# Allows us to generate inbound updates from a neighbor, change and activate BGP policies without clearing the BGP session
neighbor kubepeers soft-reconfiguration inbound
exit-address-family
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! Setup IP Prefix lists
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
# Allow external IP range and allows /32 addresses to be specified
ip prefix-list pl-allowed-adv seq 5 permit 10.243.0.0/24 le 32
# Allow pod IP addresses and allows /24 addresses to be specified (which is the default from kube-controller-manager)
ip prefix-list pl-allowed-adv seq 10 permit 10.242.0.0/16 le 24
# Allow Cluster IP Addresses (from Kubernetes default range) and allows /32 addresses to be specified
# This is disabled for now, but in order for this to work, kube-router would need to be configured with: --advertise-cluster-ip
# ip prefix-list pl-allowed-adv seq 15 permit 10.96.0.0/12 le 32
# Deny all other BGP imports
ip prefix-list pl-allowed-adv seq 50 deny any
!
# Not exactly sure how to configure this just yet, but this is a rough attempt for IPv6 testing
ipv6 prefix-list pl-allowed-v6-adv seq 5 permit 2001:0DB8:0000::/48 le 64
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! Setup Route Maps
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
# Allows us to filter imports from the prefix-list
route-map IMPORTv4 permit 10
match ip address prefix-list pl-allowed-adv
set metric 10
!
route-map IMPORTv6 permit 10
match ipv6 address prefix-list pl-allowed-v6-adv
set metric 10
!
# Deny any export paths
route-map UNACCEPTED deny 1
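To make the `le` entries in the IPv4 prefix-list above concrete, here is a small Python sketch of the matching rule. It is an illustration only, not FRR code: `ALLOWED` and `is_permitted` are hypothetical names, and it assumes the usual prefix-list behaviour where `permit P/len le N` accepts routes that fall inside `P` with a prefix length between `len` and `N`.

```python
import ipaddress

# (base prefix, maximum prefix length) pairs mirroring the two permit
# entries of pl-allowed-adv in the config above.
ALLOWED = [
    (ipaddress.ip_network("10.243.0.0/24"), 32),  # external IPs, up to /32
    (ipaddress.ip_network("10.242.0.0/16"), 24),  # pod CIDRs, up to /24
]


def is_permitted(route: str) -> bool:
    """Return True if the route would match one of the permit entries."""
    net = ipaddress.ip_network(route)
    for base, max_len in ALLOWED:
        # subnet_of() also guarantees net.prefixlen >= base.prefixlen.
        if net.subnet_of(base) and net.prefixlen <= max_len:
            return True
    return False  # corresponds to the final "deny any"


for candidate in ("10.243.0.1/32", "10.242.1.0/24", "10.242.1.0/25", "10.96.0.1/32"):
    print(candidate, is_permitted(candidate))
```

The three routes in the `show bgp detail` output above all pass this check, while a pod route longer than /24 or a cluster IP from 10.96.0.0/12 would be rejected.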
Thank you for the detailed reply @aauren!
Yes, my nodes are annotated correctly:
kube-router.io/peer.asns: 65002,65002
kube-router.io/peer.ips: 192.168.12.203,192.168.12.204
kube-router.io/peer.passwords: MTIzQ2lzY28xMjM=,MTIzQ2lzY28xMjM=
I managed to recreate the issue with GoBGP: it is caused by IPv6 being enabled on GoBGP but not on my routers. See https://github.com/osrg/gobgp/issues/2524
For example, with this config everything works perfectly fine. Note that under multiprotocol I only have ipv4-unicast:
BGP neighbor is 192.168.12.201, remote AS 65002
BGP version 4, remote router ID 1.1.1.1
BGP state = ESTABLISHED, up for 00:00:04
BGP OutQ = 0, Flops = 0
Hold time is 3, keepalive interval is 1 seconds
Configured hold time is 90, keepalive interval is 30 seconds
Neighbor capabilities:
  multiprotocol:
    ipv4-unicast: advertised and received
  route-refresh: advertised and received
  extended-nexthop: advertised and received
    Local: nlri: ipv4-unicast, nexthop: ipv6
    Remote: nlri: ipv4-unicast, nexthop: ipv6
  graceful-restart: advertised and received
    Local: restart time 120 sec
      ipv4-unicast
  4-octet-as: advertised and received
  UnknownCapability(66): received
  UnknownCapability(67): received
  fqdn: advertised
    Local:
      name: nkt-k8s-node, domain:
  cisco-route-refresh: received
Message statistics:
Sent Rcvd
Opens: 1 1
Notifications: 0 0
Updates: 1 201
Keepalives: 5 6
Route Refresh: 0 0
Discarded: 0 0
Total: 7 208
Route statistics:
Advertised: 1
Received: 200
Accepted: 200
Restarting GoBGP causes no delay in advertising the routes, but the moment I configure GoBGP to do v6 as well, the issue appears:
BGP neighbor is 192.168.12.201, remote AS 65002
BGP version 4, remote router ID 1.1.1.1
BGP state = ESTABLISHED, up for 00:00:07
BGP OutQ = 0, Flops = 0
Hold time is 3, keepalive interval is 1 seconds
Configured hold time is 90, keepalive interval is 30 seconds
Neighbor capabilities:
  multiprotocol:
    ipv4-unicast: advertised and received
    ipv6-unicast: advertised <===== this is the issue
I have not configured v6 on my switches, and my K8s nodes are v4 only as well, so why does kube-router enable v6?
I tested by deleting the AfiSafiConfig for the Family_AFI_IP6 here and here. Now when kube-router comes up there is no more ipv6-unicast in the multiprotocol section and GR works just fine.
I do not think this is a misconfiguration on my side, and I don't think that leaving IPv6 unconfigured on my routers is an issue. kube-router should either not wait for the IPv6 MP_UNREACH_NLRI message (but this seems to be a gobgp issue) or just not configure IPv6 in the first place. Perhaps adding an --enable-ipv6 option would be an idea?
@camrossi I think that I agree with you. At least as the Network Routes Controller (NRC) is currently written, it is meant to work with IPv4 or IPv6 exclusively. As such, there shouldn't be any use case where both IPv6 and IPv4 peers are set at the same time. There is already a semantic for checking this in the NRC code via nrc.isIpv6(), so I created #1327 to address this issue.
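The idea of advertising only the address family the node actually uses can be sketched as follows. This is an illustration in Python, not the change in #1327 and not kube-router's Go code: `families_for_node` is a hypothetical helper, and the family names simply mirror the labels that gobgp prints under multiprotocol.

```python
import ipaddress


def families_for_node(node_ip: str) -> list[str]:
    """Pick exactly one BGP address family based on the node's IP version."""
    version = ipaddress.ip_address(node_ip).version
    return ["ipv6-unicast"] if version == 6 else ["ipv4-unicast"]


print(families_for_node("192.168.12.201"))
print(families_for_node("2001:db8::1"))
```

With a single family negotiated, the peer has no second family for which an End-of-RIB marker could be outstanding, which is the situation camrossi reached by removing the IPv6 AfiSafiConfig by hand.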
Thank you, I will test the fix today!
Just tested from your fork and it works perfectly!
What happened? I configured kube-router to peer via eBGP with external switches. It takes the configured bgp-graceful-restart-deferral-time before any routes are advertised to the peering switches. This happens both for a new installation (where the adjacency is coming up for the first time) and when restarting the kube-router pods during node maintenance.
What did you expect to happen? Routes should be advertised as soon as the BGP session is established. I tested with "pure" gobgp 3.3 and this is working as expected.
How can we reproduce the behavior you experienced?
Configure kube-router with:
Disabling GR on kube-router results in the routes being advertised immediately.
Screenshots / Architecture Diagrams / Network Topologies
I checked with a network trace and I can see the following (this is restarting with GR):
System Information (please complete the following information):
kube-router --version: Running kube-router version v1.5.0-8-g88266bc2, built on 2022-06-20T16:16:31+1000, go1.17.10
kubectl version: v1.23.4
Additional context: I tested with gobgp 3.3.0 with this config (connecting to the same switches and to the same BGP process), and there the routes are advertised immediately, both for a new gobgp process and during GR.