FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

BGP convergence between state and current BGP nexthop status #7200

Open pguibert6WIND opened 3 years ago

pguibert6WIND commented 3 years ago

This ticket is a question about BGP behavior.

I have an eBGP session configured as follows.

router bgp 65000 vrf nhrp
 bgp router-id 10.255.255.1
 bgp disable-ebgp-connected-route-check
 neighbor 10.255.255.4 remote-as 65001
!

My concern is that the IP address 10.255.255.4 may not be reachable all the time. The route to it is inherited from NHRP and, as the dumps below show, nexthop tracking correctly reflects the availability of this address.

spoke1-vm# show ip nht vrf nhrp
10.255.255.4
 unresolved
 Client list: bgp(fd 35)
11.11.11.1
 resolved via connected
 is directly connected, wan
 Client list: static(fd 25)
spoke1-vm# show bgp vrf nhrp nexthop
Current BGP nexthop cache:
 10.255.255.4 invalid, peer 10.255.255.4
  Last update: Tue Sep 29 14:09:13 2020

My concern is about the BGP session: the route to the peer is not available, yet the dumps below show the BGP state as Established.

spoke1-vm# show bgp vrf nhrp neighbors
BGP neighbor is 10.255.255.4, remote AS 65001, local AS 65000, external link
Hostname: hub-vm
  BGP version 4, remote router ID 10.255.255.4, local router ID 10.255.255.1
  BGP state = Established, up for 00:03:23
spoke1-vm# show bgp vrf nhrp ipv4
BGP table version is 5, local router ID is 10.255.255.1, vrf id 1
Default local pref 100, local AS 65000
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete

   Network          Next Hop            Metric LocPrf Weight Path
   10.255.255.0/24  10.255.255.4             0             0 65001 i
   192.168.0.0/16   10.255.255.4             0             0 65001 i
*> 192.168.1.0/24   0.0.0.0                  0         32768 i

So my question is: would it be acceptable to patch BGP so that the state of the BGP session reflects the availability of the nexthop? Since this changes BGP behavior, I want to ask the community: is it acceptable to bring the BGP session down when the nexthop to the peer is not available?
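The mismatch described above can be spotted from outside bgpd by comparing the two dumps. The following is a minimal sketch, assuming the text layout of `show bgp vrf nhrp nexthop` and `show bgp vrf nhrp neighbors` is exactly as pasted in this ticket. The function names are hypothetical and are not part of FRR, and the output format may differ between FRR versions.

```python
import re

# Dumps copied from this ticket; the parsing below assumes this exact layout.
NEXTHOP_DUMP = """\
Current BGP nexthop cache:
 10.255.255.4 invalid, peer 10.255.255.4
  Last update: Tue Sep 29 14:09:13 2020
"""

NEIGHBOR_DUMP = """\
BGP neighbor is 10.255.255.4, remote AS 65001, local AS 65000, external link
Hostname: hub-vm
  BGP version 4, remote router ID 10.255.255.4, local router ID 10.255.255.1
  BGP state = Established, up for 00:03:23
"""


def invalid_peer_nexthops(text):
    """Return the peer addresses whose nexthop cache entry is marked invalid."""
    pattern = re.compile(r"^\s*(\S+) invalid, peer (\S+)\s*$", re.MULTILINE)
    return {m.group(2) for m in pattern.finditer(text)}


def neighbor_states(text):
    """Map each neighbor address to the BGP state reported for it."""
    states = {}
    current = None
    for line in text.splitlines():
        m = re.match(r"BGP neighbor is ([^,\s]+),", line)
        if m:
            current = m.group(1)
            continue
        m = re.match(r"\s*BGP state = (\w+)", line)
        if m and current is not None:
            states[current] = m.group(1)
    return states


def mismatched_peers(nexthop_text, neighbor_text):
    """List peers that are Established although their nexthop is invalid."""
    invalid = invalid_peer_nexthops(nexthop_text)
    states = neighbor_states(neighbor_text)
    return sorted(
        peer for peer, state in states.items()
        if state == "Established" and peer in invalid
    )


if __name__ == "__main__":
    print(mismatched_peers(NEXTHOP_DUMP, NEIGHBOR_DUMP))
```

With the dumps from this ticket, the script reports 10.255.255.4 as a peer whose session is Established while its nexthop is invalid. It only observes the inconsistency; it does not change BGP behavior.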

ton31337 commented 3 years ago

Is it possible to consider that patching BGP so that the status of BGP session reflects the availability of the nexthop is ok ?

This sounds valid and useful to me.

As it changes the BGP behavior, I want to ask community: is it acceptable to set BGP to down when nexthop peer is not available ?

A nexthop can be set per prefix (possibly a fake, inaccessible one) and can be overridden with route-maps, so this could bring down all the prefixes, including the good ones, of an otherwise healthy peer.

That said, an additional knob such as bgp disable-ebgp-connected-route-check [shutdown] could be added to force the behavior you describe.

pguibert6WIND commented 3 years ago

I made a proposal in the attached pull request, assuming the problem can occur with either iBGP or eBGP. Either way, we are talking about connected routes here, so I enforced it with the ttl-security command.

I ran into strange behavior on 7.3: after forcing BGP to go down and then come back up (after re-establishing the route to the GRE interface), I got a continuous error message. I had to configure the remote side as passive to avoid this.

BGP neighbor is 10.255.255.3, remote AS 65001, local AS 65001, internal link
Hostname: ubuntu1604es
  BGP version 4, remote router ID 0.0.0.0, local router ID 192.168.254.2
  BGP state = Active
  Last read 00:07:12, Last write 00:01:04
  Hold time is 180, keepalive interval is 60 seconds
  Message statistics:
    Inq depth is 0
    Outq depth is 0
                         Sent       Rcvd
    Opens:                 10          9
    Notifications:         14          2
    Updates:                2          2
    Keepalives:             2          1
    Route Refresh:          0          0
    Capability:             0          0
    Total:                 28         14
  Minimum time between advertisement runs is 0 seconds

 For address family: IPv4 Unicast
  Not part of any update group
  Community attribute sent to this neighbor(all)
  0 accepted prefixes

  Connections established 1; dropped 1
  Last reset 00:07:12,   Notification sent (Cease/Connection collision resolution)
  Message received that caused BGP to send a NOTIFICATION:
    FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
    00550104 FDE900B4 C0A8FF02 38020601
    04000100 01020280 00020202 00020641
    040000FD E9020645 04000101 01021049
    0E0C7562 756E7475 31363034 65730002
    04400200 78
  Internal BGP neighbor may be up to 1 hops away.
Local host: 10.255.255.1, Local port: 50036
Foreign host: 10.255.255.3, Foreign port: 179
Nexthop: 10.255.255.1
Nexthop global: ::
Nexthop local: ::
BGP connection: non shared network
BGP Connect Retry Timer in Seconds: 120
Next connect timer due in 56 seconds
Read thread: off  Write thread: off  FD used: -1
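To see which message triggered the NOTIFICATION, the hex dump above can be decoded by hand. The sketch below parses only the fixed-size BGP header and the fixed part of an OPEN message, following the layout in RFC 4271. The helper name `decode_open` is hypothetical, and the optional parameters (capabilities) are left undecoded.

```python
import ipaddress
import struct

# Hex dump copied from the "Message received that caused BGP to send a
# NOTIFICATION" section of the neighbor output above.
HEX_DUMP = """
FFFFFFFF FFFFFFFF FFFFFFFF FFFFFFFF
00550104 FDE900B4 C0A8FF02 38020601
04000100 01020280 00020202 00020641
040000FD E9020645 04000101 01021049
0E0C7562 756E7475 31363034 65730002
04400200 78
"""

# Message type codes defined in RFC 4271, section 4.1.
MESSAGE_TYPES = {1: "OPEN", 2: "UPDATE", 3: "NOTIFICATION", 4: "KEEPALIVE"}


def decode_open(hex_text):
    """Decode the BGP header and the fixed part of an OPEN message."""
    raw = bytes.fromhex("".join(hex_text.split()))

    # Common header: 16-byte marker, 2-byte length, 1-byte type.
    marker, length, msg_type = struct.unpack("!16sHB", raw[:19])
    if marker != b"\xff" * 16:
        raise ValueError("unexpected marker")
    if msg_type != 1:
        raise ValueError("not an OPEN message")

    # OPEN body: version, My AS, hold time, BGP identifier, opt. param length.
    version, my_as, hold_time, bgp_id, opt_len = struct.unpack(
        "!BHH4sB", raw[19:29]
    )
    return {
        "captured_bytes": len(raw),
        "length": length,
        "type": MESSAGE_TYPES[msg_type],
        "version": version,
        "my_as": my_as,
        "hold_time": hold_time,
        "bgp_identifier": str(ipaddress.IPv4Address(bgp_id)),
        "opt_param_len": opt_len,
    }


if __name__ == "__main__":
    for key, value in decode_open(HEX_DUMP).items():
        print(f"{key}: {value}")
```

The dump decodes as an 85-byte OPEN message with version 4, AS 65001, hold time 180 and BGP identifier 192.168.255.2. An OPEN as the triggering message fits the "Cease/Connection collision resolution" reason shown in the Last reset line.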