FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

BGP routes not being removed from route table #16641

Closed: ghmj2417 closed this issue 2 weeks ago

ghmj2417 commented 3 weeks ago

Description

Routes are not being removed from the route tables. I have witnessed this in three different scenarios.

Note: We are using service integrated-vtysh-config

Version

FRRouting 9.1.1 (host-10-27-206-25) on Linux(4.14.350-266.564.amzn2.x86_64).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--prefix=/opt/frrouting/sda/9.1' '--localstatedir=/run/frr' '--sysconfdir=/opt/frrouting/sda/9.1/etc' '--disable-ripd' '--disable-ripngd' '--disable-ospf6d' '--disable-ldpd' '--disable-nhrpd' '--disable-eigrpd' '--disable-babeld' '--disable-vrrpd' '--disable-pimd' '--disable-ospfapi' '--disable-ospfclient' '--disable-isisd' '--disable-fabricd' '--enable-watchfrr' '--enable-multipath=64' 'PYTHON=/usr/bin/python3.9' 'PKG_CONFIG=/opt/pkgconf/sda/2.0.3/bin/pkgconf' 'PKG_CONFIG_PATH=/opt/libyang/sda/2.1.128/lib64/pkgconfig:/opt/protobuf-c/sda/1.5.0/lib/pkgconfig' 'LIBS=-L/opt/python/3.9/lib -lpython3.9 -L/opt/json-c/sda/0.15/lib64 -ljson-c -L/opt/libyang/sda/2.1.128/lib64 -lyang -L/opt/protobuf-c/sda/1.5.0/lib -lprotobuf-c'

How to reproduce

I am using multiple BGP neighbors (I have not tested this with one) to inject routes into different route tables.

Summary of BGP config

route-map from-router-local permit 10
 set table 200
exit
!
route-map from-router-tunnel permit 10
 set table 201
exit
!
router bgp XYZ
 neighbor X.X.X.1 route-map from-router-local in
 neighbor X.X.X.1 bfd
 neighbor Y.Y.Y.1 route-map from-router-tunnel in
 neighbor Y.Y.Y.1 bfd

ip route show table 200
default via X.X.X.1 dev eth0 proto bgp metric 20
ip route show table 201
default via Y.Y.Y.1 dev tun0 proto bgp metric 20

For my test, I shut down the BGP session on the remote device, so neighbor X.X.X.1 went down.

Neighbor   V   AS  MsgRcvd  MsgSent  TblVer  InQ  OutQ   Up/Down  State/PfxRcd  PfxSnt  Desc
X.X.X.1    4  ABC      889      890       0    0     0  00:01:39        Active       0  N/A
Y.Y.Y.1    4  ABC      906      908       6    0     0  01:15:03             1       2  N/A
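
For anyone who wants to script this check, here is a minimal Python sketch that reads summary lines like the ones above and reports neighbors whose State/PfxRcd column holds a state name instead of a prefix count. The function name down_neighbors and the assumption that State/PfxRcd is the tenth whitespace-separated field are mine, based only on the output shown here; this is not an FRR API.

```python
def down_neighbors(summary_lines):
    """Return (neighbor, state) pairs for sessions that are not established.

    Assumption: each data line has the layout
    Neighbor V AS MsgRcvd MsgSent TblVer InQ OutQ Up/Down State/PfxRcd ...
    and an established session shows a numeric prefix count in field 10.
    """
    down = []
    for line in summary_lines:
        fields = line.split()
        # Skip blank or short lines and the header row.
        if len(fields) < 10 or fields[0] == "Neighbor":
            continue
        state = fields[9]
        if not state.isdigit():
            down.append((fields[0], state))
    return down


lines = [
    "Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc",
    "X.X.X.1   4 ABC       889       890        0    0    0 00:01:39       Active        0 N/A",
    "Y.Y.Y.1    4 ABC       906       908        6    0    0 01:15:03            1        2 N/A",
]
print(down_neighbors(lines))
```

With the sample lines from this report, only X.X.X.1 is listed as down.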

BGP info

sh ip bgp
BGP table version is 6, local router ID is A.A.A.25, vrf id 0
Default local pref 100, local AS XYZ
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

    Network          Next Hop            Metric LocPrf Weight Path
 *> 0.0.0.0/0        Y.Y.Y.1              2002             0 ABC ABC ABC i
 *> A.A.A.25/32  0.0.0.0                  0         32768 ?

X.X.X.1 routes are being put into table 200.

sh ip route table 200
Codes: K - kernel route, C - connected, S - static, O - OSPF,
       B - BGP, T - Table, v - VNC, V - VNC-Direct, F - PBR,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF default table 200:
B>* 0.0.0.0/0 [20/0] via X.X.X.1, eth0, weight 1, 00:08:25

ip route show table 200
default via X.X.X.1 dev eth0 proto bgp metric 20

As you can see, the route still exists in both the FRR view and the kernel table 200 even though the neighbor is down. The routes from that neighbor are never removed.
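
A rough way to spot such leftovers from a script is to scan the output of ip route show table 200 for BGP routes that still point at the down neighbor. The sketch below is only an illustration: the helper name bgp_routes_via is hypothetical, and it assumes the simple "prefix key value key value ..." line layout seen in this report (lines with extra flags such as onlink would need more careful parsing).

```python
def bgp_routes_via(ip_route_lines, gateway):
    """Return prefixes whose next hop is `gateway` and whose proto is bgp.

    Assumption: each line looks like
    "<prefix> via <gw> dev <ifname> proto <proto> metric <n>".
    """
    matches = []
    for line in ip_route_lines:
        tokens = line.split()
        if len(tokens) < 3:
            continue
        # Pair up the keyword/value tokens that follow the prefix.
        attrs = dict(zip(tokens[1::2], tokens[2::2]))
        if attrs.get("via") == gateway and attrs.get("proto") == "bgp":
            matches.append(tokens[0])
    return matches


table_200 = ["default via X.X.X.1 dev eth0 proto bgp metric 20"]
print(bgp_routes_via(table_200, "X.X.X.1"))
```

Any prefix returned for a neighbor that is not established is a candidate stale route.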

Expected behavior

Routes are removed from their respective route tables.

Actual behavior

Routes stay in the route table.

Additional context

I found an older issue, https://github.com/FRRouting/frr/issues/10390, that is very similar to what I am seeing.

Please feel free to ask for more info or any debug output you would like to see.

ton31337 commented 3 weeks ago

Please write the full configs, because I can't reproduce this issue.

ghmj2417 commented 2 weeks ago

Here is the full config:

frr version 9.1.1
frr defaults traditional
hostname host-A-A-A-25
log stdout informational
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
router bgp XYZ
 bgp router-id A.A.A.25
 no bgp ebgp-requires-policy
 timers bgp 5 15
 neighbor X.X.X.1 remote-as ABC
 neighbor X.X.X.1 bfd
 neighbor Y.Y.Y.1 remote-as ABC
 neighbor Y.Y.Y.1 bfd
 !
 address-family ipv4 unicast
  redistribute kernel route-map redistribute-kernel
  neighbor X.X.X.1 soft-reconfiguration inbound
  neighbor X.X.X.1 route-map from-router-local in
  neighbor Y.Y.Y.1 soft-reconfiguration inbound
  neighbor Y.Y.Y.1 route-map from-router-tunnel in
 exit-address-family
exit
!
access-list redistribute-kernel seq 10 permit A.A.A.25/32
!
route-map from-router-local permit 10
 set table 200
exit
!
route-map redistribute-kernel permit 10
 match ip address redistribute-kernel
exit
!
route-map from-router-tunnel permit 10
 set table 201
exit
!
bfd
 peer X.X.X.1 interface eth0
  detect-multiplier 5
 exit
 !
 peer Y.Y.Y.1 interface tun0
  detect-multiplier 5
 exit
 !
exit
!

ton31337 commented 2 weeks ago

What am I doing wrong?

donatas.net# sh ip route table 200
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF default table 200:
B>  10.0.0.1/32 [20/0] via 127.0.0.3 (recursive), weight 1, 00:00:13
  *                      via 192.168.10.1, enp3s0, weight 1, 00:00:13
donatas.net# con
donatas.net(config)# router bgp 
donatas.net(config-router)# neighbor 127.0.0.3 shutdown 
donatas.net(config-router)# do sh ip route table 200
donatas.net(config-router)# no neighbor 127.0.0.3 shutdown 
donatas.net(config-router)# do sh ip route table 200
Codes: K - kernel route, C - connected, L - local, S - static,
       R - RIP, O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, D - SHARP,
       F - PBR, f - OpenFabric, t - Table-Direct,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF default table 200:
B>  10.0.0.1/32 [20/0] via 127.0.0.3 (recursive), weight 1, 00:00:03
  *                      via 192.168.10.1, enp3s0, weight 1, 00:00:03
donatas.net(config-router)# 

Config is:

...
  neighbor 127.0.0.3 remote-as external
  neighbor 127.0.0.3 soft-reconfiguration inbound
  neighbor 127.0.0.3 route-map exa in
...
route-map exa permit 10
 set table 200
exit

ghmj2417 commented 2 weeks ago

Maybe it has to do with our config using two route tables 200 and 201?

ton31337 commented 2 weeks ago

If you mean 0.0.0.0/0 being announced by both peers and installed into different tables by FRR, then this won't work with the current design of how BGP and Zebra operate. I suggest using VRFs instead.

ghmj2417 commented 2 weeks ago

Is the issue that both peers announce the same route? What if route table 200 has 0.0.0.0/0, and table 201 gets routes 0.0.0.0/1 and 128.0.0.0/1? This is just curiosity; it doesn't mean I would do it. I would probably switch to using VRFs if my kernel supports them.

ton31337 commented 2 weeks ago

If the mask is different (that is, it is a different route), then it might work fine.
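
To make the "different mask, different route" point concrete, here is a small sketch using Python's standard ipaddress module. It only shows that 0.0.0.0/1 and 128.0.0.0/1 are distinct prefixes from 0.0.0.0/0 while together covering the same address space; it says nothing about how FRR handles them internally.

```python
import ipaddress

default = ipaddress.ip_network("0.0.0.0/0")

# Splitting the default route one bit deeper yields the two /1 halves.
halves = list(default.subnets(prefixlen_diff=1))
print(halves)

# Neither half is the same prefix as the default route...
same_as_default = [net == default for net in halves]

# ...yet together they cover every IPv4 address the default route covers.
covered = sum(net.num_addresses for net in halves)
print(same_as_default, covered == default.num_addresses)
```

So a peer announcing the two /1 prefixes would not collide with a peer announcing 0.0.0.0/0 on the same prefix key, which matches the suggestion that this variant might work.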

ghmj2417 commented 2 weeks ago

Thank you for looking into this. Much appreciated.