FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

BGP fails to determine nexthop routed directly to interface #15535

Open vgrebenschikov opened 8 months ago

vgrebenschikov commented 8 months ago

Description

The BGP route is shown as "no best path" and its next hop is shown as "(inaccessible)":

srv# show ip bgp 172.22.9.0/24
BGP routing table entry for 172.22.9.0/24, version 0
Paths: (1 available, no best path)
  Not advertised to any peer
  65022
    172.22.4.253 (inaccessible) from 172.22.4.253 (172.22.9.1)
      Origin IGP, invalid, external
      Last update: Tue Mar 12 15:52:46 2024

The route to the next hop itself is fine (known via "kernel"), but it points directly at the point-to-point interface rather than at a gateway:

srv# show ip route 172.22.4.253/32
Routing entry for 172.22.4.253/32
  Known via "kernel", distance 0, metric 0, best
  Last update 00:02:37 ago
  * directly connected, wg0

The same route as seen by the system:

# route -n get 172.22.4.253/32
   route to: 172.22.4.253
destination: 172.22.4.253
        fib: 0
  interface: wg0
      flags: <UP,HOST,DONE,STATIC>
 recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0      1420         1         0

Adding the same route as a static route does not change anything:

srv# conf t
srv(config)# ip route 172.22.4.253/32 wg0
srv(config)# 

srv# show ip bgp 172.22.9.0/24
BGP routing table entry for 172.22.9.0/24, version 0
Paths: (1 available, no best path)
  Not advertised to any peer
  65022
    172.22.4.253 (inaccessible) from 172.22.4.253 (172.22.9.1)
      Origin IGP, invalid, external
      Last update: Tue Mar 12 15:55:03 2024

But if I assign the whole subnet to the interface:

# ifconfig wg0 172.22.4.192/26

Everything works as expected:

srv# show ip bgp 172.22.9.0/24
BGP routing table entry for 172.22.9.0/24, version 7
Paths: (1 available, best #1, table default)
  Advertised to non peer-group peers:
  172.22.4.253
  65022
    172.22.4.253 (metric 1) from 172.22.4.253 (172.22.9.1)
      Origin IGP, valid, external, best (First path received)
      Last update: Tue Mar 12 15:55:04 2024

It looks like there is a problem in the next-hop availability algorithm.

FRRouting 8.5.4 (srv) on FreeBSD(14.0-RELEASE).

Version

# show version
FRRouting 8.5.4 (srv) on FreeBSD(14.0-RELEASE).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-vtysh' '--disable-doc-html' '--sysconfdir=/usr/local/etc/frr' '--localstatedir=/var/run/frr' '--disable-nhrpd' '--disable-pathd' '--disable-ospfclient' '--disable-pimd' '--disable-pbrd' '--with-vtysh-pager=cat' '--enable-backtrace' '--disable-config-rollbacks' '--disable-datacenter' '--enable-fpm' '--disable-ldpd' '--without-libpam' '--enable-rpki' '--disable-sharpd' '--disable-shell-access' '--disable-snmp' '--disable-tcmalloc' '--prefix=/usr/local' '--mandir=/usr/local/man' '--disable-silent-rules' '--infodir=/usr/local/share/info/' '--build=amd64-portbld-freebsd14.0' 'build_alias=amd64-portbld-freebsd14.0' 'PKG_CONFIG=pkgconf' 'PKG_CONFIG_LIBDIR=/wrkdirs/usr/ports/net/frr8/work/.pkgconfig:/usr/local/libdata/pkgconfig:/usr/local/share/pkgconfig:/usr/libdata/pkgconfig' 'CC=cc' 'CFLAGS=-O2 -pipe -fstack-protector-strong -fno-strict-aliasing ' 'LDFLAGS= -L/usr/local/lib -L/usr/local/lib -fstack-protector-strong ' 'LIBS=' 'CPPFLAGS=-I/usr/local/include -I/usr/local/include' 'CPP=cpp' 'CXX=c++' 'CXXFLAGS=-O2 -pipe -fstack-protector-strong -fno-strict-aliasing ' 'PYTHON=/usr/local/bin/python3.9'

How to reproduce

Use any point-to-point interface without a subnet (e.g. wg0) to establish a BGP session.

Expected behavior

A valid direct route to an interface should be taken into account when resolving the next hop, so that BGP routes via that interface are installed, as in:

srv# show ip bgp 172.22.9.0/24
BGP routing table entry for 172.22.9.0/24, version 7
Paths: (1 available, best #1, table default)
  Advertised to non peer-group peers:
  172.22.4.253
  65022
    172.22.4.253 (metric 1) from 172.22.4.253 (172.22.9.1)
      Origin IGP, valid, external, best (First path received)
      Last update: Tue Mar 12 15:55:04 2024

Actual behavior

BGP route is not installed:

srv# show ip bgp 172.22.9.0/24
BGP routing table entry for 172.22.9.0/24, version 0
Paths: (1 available, no best path)
  Not advertised to any peer
  65022
    172.22.4.253 (inaccessible) from 172.22.4.253 (172.22.9.1)
      Origin IGP, invalid, external
      Last update: Tue Mar 12 15:52:46 2024


ton31337 commented 8 months ago

Could you try enabling ip nht resolve-via-default?

vgrebenschikov commented 8 months ago

Could you try enabling ip nht resolve-via-default

yep, the same:

srv# conf t
srv(config)# ip nht resolve-via-default
srv(config)#
srv#
srv# clear ip bgp *

srv# show ip bgp neighbors 172.22.4.253 received
BGP table version is 3, local router ID is 172.22.1.5, vrf id 0
Default local pref 100, local AS 65021
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

    Network          Next Hop            Metric LocPrf Weight Path
 *> 172.22.0.0/19    172.22.4.253                           0 65022 65021 i
 *> 172.22.9.0/24    172.22.4.253                           0 65022 i
 *> 172.22.19.0/24   172.22.4.253                           0 65022 65023 i
 *> 172.22.20.0/24   172.22.4.253                           0 65022 i
 *> 172.22.21.0/24   172.22.4.253                           0 65022 i
 *> 172.22.24.0/24   172.22.4.253                           0 65022 i
 *> 172.23.0.0/16    172.22.4.253                           0 65022 65021 i
 *> 172.24.1.0/24    172.22.4.253                           0 65022 65021 i

Total number of prefixes 8

srv# show ip bgp 172.22.9.0/24
BGP routing table entry for 172.22.9.0/24, version 0
Paths: (1 available, no best path)
  Not advertised to any peer
  65022
    172.22.4.253 (inaccessible) from 172.22.4.253 (172.22.9.1)
      Origin IGP, invalid, external
      Last update: Wed Mar 13 09:34:02 2024
srv#

ton31337 commented 8 months ago

Could you share the output of show ip nht? Also, try enabling debug bgp nht.

vgrebenschikov commented 8 months ago

srv# show ip nht
172.22.2.1
 resolved via connected
 is directly connected, re0 (vrf default)
 Client list: static(fd 27)
172.22.4.253(Connected)
 unresolved(Connected)
 Client list: bgp(fd 32)

and bgpd.log:

2024/03/13 13:31:23 BGP: [WNKP5-SN018] Found existing bnc 172.22.4.253/32(0)(VRF default) flags 0xe ifindex 0 #paths 0 peer 0x28da947a4580
2024/03/13 13:31:23 BGP: [WNKP5-SN018] Found existing bnc 172.22.4.253/32(0)(VRF default) flags 0xe ifindex 0 #paths 0 peer 0x28da947a4580
2024/03/13 13:31:23 BGP: [WNKP5-SN018] Found existing bnc 172.22.4.253/32(0)(VRF default) flags 0xe ifindex 0 #paths 0 peer 0x28da947a4580
2024/03/13 13:31:23 BGP: [VKMV1-4Y773] bgp_update(172.22.4.253): NH unresolved
2024/03/13 13:31:23 BGP: [WNKP5-SN018] Found existing bnc 172.22.4.253/32(0)(VRF default) flags 0xe ifindex 0 #paths 1 peer 0x28da947a4580
2024/03/13 13:31:23 BGP: [VKMV1-4Y773] bgp_update(172.22.4.253): NH unresolved
2024/03/13 13:31:23 BGP: [WNKP5-SN018] Found existing bnc 172.22.4.253/32(0)(VRF default) flags 0xe ifindex 0 #paths 2 peer 0x28da947a4580
2024/03/13 13:31:23 BGP: [VKMV1-4Y773] bgp_update(172.22.4.253): NH unresolved
2024/03/13 13:31:23 BGP: [WNKP5-SN018] Found existing bnc 172.22.4.253/32(0)(VRF default) flags 0xe ifindex 0 #paths 3 peer 0x28da947a4580
2024/03/13 13:31:23 BGP: [VKMV1-4Y773] bgp_update(172.22.4.253): NH unresolved
2024/03/13 13:31:23 BGP: [WNKP5-SN018] Found existing bnc 172.22.4.253/32(0)(VRF default) flags 0xe ifindex 0 #paths 4 peer 0x28da947a4580
2024/03/13 13:31:23 BGP: [VKMV1-4Y773] bgp_update(172.22.4.253): NH unresolved

Anyway, the default route points in the wrong direction for this next hop; it should resolve via wg0.
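
The repeated "Found existing bnc" lines above are easier to compare once the fields are pulled apart. The following is a minimal sketch, assuming the line layout is exactly as printed in this log; `parse_bnc` and its regular expression are hypothetical helpers, not part of FRR. It only reports which bit positions are set in `flags`; mapping those bits to FRR flag names is deliberately left out, since the thread does not state it.

```python
import re

# Pattern built from the log lines quoted in this comment (an assumption:
# other FRR versions may format the message differently).
BNC_RE = re.compile(
    r"Found existing bnc (?P<prefix>\S+?)\(\d+\)\(VRF (?P<vrf>[^)]+)\) "
    r"flags (?P<flags>0x[0-9a-fA-F]+) ifindex (?P<ifindex>\d+) "
    r"#paths (?P<paths>\d+)"
)


def parse_bnc(line):
    """Return the fields of a 'Found existing bnc' line, or None if no match."""
    m = BNC_RE.search(line)
    if m is None:
        return None
    flags = int(m.group("flags"), 16)
    return {
        "prefix": m.group("prefix"),
        "vrf": m.group("vrf"),
        "flags": flags,
        # Bit positions that are set, lowest first; names are not interpreted.
        "set_bits": [b for b in range(flags.bit_length()) if (flags >> b) & 1],
        "ifindex": int(m.group("ifindex")),
        "paths": int(m.group("paths")),
    }


if __name__ == "__main__":
    sample = (
        "2024/03/13 13:31:23 BGP: [WNKP5-SN018] Found existing bnc "
        "172.22.4.253/32(0)(VRF default) flags 0xe ifindex 0 #paths 0 "
        "peer 0x28da947a4580"
    )
    print(parse_bnc(sample))
```

For the lines in this log, the parsed `ifindex` is 0 throughout and only the `#paths` counter changes between entries.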

ton31337 commented 7 months ago

@vgrebenschikov can we get the following outputs also:

show bgp nexthop
show bgp import-check-table
show ip import-check

beith12 commented 7 months ago

@vgrebenschikov can you post the WireGuard config (minus any keys, of course)?

tera2603 commented 4 months ago

can you please provide the configuration topology for recreation of this bug.

nandini660 commented 4 months ago

Even if a fix were made to propagate directly connected routes (where the outgoing interface has no IP address assigned) from the kernel to the BGP routing table, it would not enable sending traffic in the reverse direction. In a Data Center environment where traffic is predominantly TCP-based, bidirectional traffic must be expected. Therefore, an interface such as wg0 without an IP address configured would be considered incomplete from the Data Center perspective.

vgrebenschikov commented 4 months ago

Even if a fix were made to propagate directly connected routes (where the outgoing interface has no IP address assigned) from the kernel to the BGP routing table, it would not enable sending traffic in the reverse direction. In a Data Center environment where traffic is predominantly TCP-based, bidirectional traffic must be expected. Therefore, an interface such as wg0 without an IP address configured would be considered incomplete from the Data Center perspective.

It has an IP address assigned, but it is not a "broadcast" interface. Like any other P2P interface, it has an address on our end and routes pointing into the interface, and that is it.

Similar problem with tun interface for example.

The kernel is very clear about this. There are two kinds of routes:

a. direct interface routes: "packets with dst in the prefix are sent to the interface directly"
b. routes with a next hop: "packets with dst in the prefix are sent to the next hop, which is connected via the interface"

Somehow scenario a. above has been lost for BGP.
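
The two scenarios can be sketched as a toy forwarding lookup. This is only an illustration of the distinction being described, not FRR or kernel code; the `Route` class and `forward` function are hypothetical, and the two table entries are taken from the outputs posted in this thread (the OSPF /16 via 172.22.2.1 on em0, and the host route to 172.22.4.253 on wg0).

```python
from dataclasses import dataclass
from ipaddress import IPv4Address, IPv4Network
from typing import Optional


@dataclass(frozen=True)
class Route:
    prefix: IPv4Network
    interface: str
    # None models scenario a. (direct interface route, no next hop);
    # an address models scenario b. (route with a next hop).
    gateway: Optional[IPv4Address] = None


def forward(dst, routes):
    """Longest-prefix match; return (interface, on-link target) or None."""
    dst = IPv4Address(dst)
    candidates = [r for r in routes if dst in r.prefix]
    if not candidates:
        return None
    best = max(candidates, key=lambda r: r.prefix.prefixlen)
    # Scenario a.: the packet is handed to the interface for dst itself.
    # Scenario b.: the packet is handed to the gateway via the interface.
    on_link = best.gateway if best.gateway is not None else dst
    return best.interface, on_link


ROUTES = [
    Route(IPv4Network("172.22.0.0/16"), "em0", IPv4Address("172.22.2.1")),
    Route(IPv4Network("172.22.4.253/32"), "wg0"),
]

if __name__ == "__main__":
    print(forward("172.22.4.253", ROUTES))  # scenario a. via wg0
    print(forward("172.22.9.1", ROUTES))    # scenario b. via em0
```

In this model the BGP next hop 172.22.4.253 resolves through wg0 without any gateway, which is the behaviour being asked for here.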

vgrebenschikov commented 4 months ago

can you please provide the configuration topology for recreation of this bug.

topology is trivial, just two hosts connected with wireguard, and expected that BGP session will work over the tunnel.

image

tera2603 commented 4 months ago

can you please provide the configuration topology for recreation of this bug.

topology is trivial, just two hosts connected with wireguard, and expected that BGP session will work over the tunnel.

image

can you please provide configuration for better understanding.

vgrebenschikov commented 4 months ago

can you please provide the configuration topology for recreation of this bug.

can you please provide configuration for better understanding.

# ifconfig wg0
wg0: flags=10080c1<UP,RUNNING,NOARP,MULTICAST,LOWER_UP> metric 0 mtu 1420
    options=80000<LINKSTATE>
    inet 172.22.4.192 netmask 0xffffffff

# netstat -rn | fgrep wg0
172.22.4.253       link#6             UHS         wg0

# ping -c1 172.22.4.253
PING 172.22.4.253 (172.22.4.253): 56 data bytes
64 bytes from 172.22.4.253: icmp_seq=0 ttl=64 time=74.567 ms

# vtysh -e 'show run'
...
router bgp 65021
 no bgp ebgp-requires-policy
 no bgp network import-check
 neighbor 172.22.4.253 remote-as 65022
 neighbor 172.22.4.253 interface wg0
 neighbor 172.22.4.253 update-source wg0

# vtysh 
srv# show ip bgp 172.22.9.0/24
BGP routing table entry for 172.22.9.0/24, version 0
Paths: (1 available, no best path)
  Not advertised to any peer
  65022
    172.22.4.253 (inaccessible) from 172.22.4.253 (172.22.9.1)
      Origin IGP, invalid, external
      Last update: Thu Jun 27 19:17:33 2024

Notice:

  1. The wg interface has a /32 prefix assigned.
  2. FRR does not even notice the valid, working directly connected route to 172.22.4.253/32 and falls back to a larger network (see below).
  3. The wg0 interface has no BROADCAST in its interface flags and has NOARP set, so FRR's assumption that routes via wg0 must have a next hop is invalid.
  4. The BGP session is OK, but the routes are inaccessible.

# route -n get 172.22.4.253
   route to: 172.22.4.253
destination: 172.22.4.253
        fib: 0
  interface: wg0
      flags: <UP,HOST,DONE,STATIC>
 recvpipe  sendpipe  ssthresh  rtt,msec    mtu        weight    expire
       0         0         0         0      1420         1         0

# vtysh -e 'show ip route 172.22.4.253'
Routing entry for 172.22.0.0/16
  Known via "ospf", distance 110, metric 120, best
  Last update 00:06:56 ago
  * 172.22.2.1, via em0, weight 1

What fixes the situation is assigning wg0 a subnet that includes the other end of the tunnel. FRR then considers the other end reachable, although in fact it was reachable before:

# ifconfig wg0 172.22.4.192/26

# vtysh -e 'show ip route 172.22.4.253'
Routing entry for 172.22.4.192/26
  Known via "connected", distance 0, metric 1, best
  Last update 00:01:01 ago
  * directly connected, wg0

# vtysh -e 'show ip bgp 172.22.9.0/24'
BGP routing table entry for 172.22.9.0/24, version 6
Paths: (1 available, best #1, table default)
  Advertised to non peer-group peers:
  172.22.4.253
  65022
    172.22.4.253 (metric 1) from 172.22.4.253 (172.22.9.1)
      Origin IGP, valid, external, best (First path received)
      Last update: Thu Jun 27 19:17:33 2024
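
The difference between the two address assignments comes down to whether the peer falls inside the prefix configured on wg0. A minimal sketch with Python's standard `ipaddress` module, using the addresses from this comment; `covers_peer` is a hypothetical helper, and it models only prefix containment, not FRR's actual next-hop tracking logic.

```python
from ipaddress import ip_address, ip_interface


def covers_peer(interface_addr, peer):
    """True if peer lies inside the network implied by the interface address."""
    return ip_address(peer) in ip_interface(interface_addr).network


if __name__ == "__main__":
    peer = "172.22.4.253"
    for addr in ("172.22.4.192/32", "172.22.4.192/26"):
        print(addr, covers_peer(addr, peer))
```

With the /32 the connected prefix contains only the local address, so nothing "connected" covers the peer; with the /26 it does, which matches the point at which the BGP path becomes valid in the output above.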

Probably the problem is related to issue #9185.

ton31337 commented 4 months ago

Is it possible to test this with the latest releases?

vgrebenschikov commented 4 months ago

Is it possible to test this with the latest releases?

Tested on 10.0; the situation is the same:

srv# show ip bgp 172.22.9.0/24
BGP routing table entry for 172.22.9.0/24, version 5
Paths: (1 available, no best path)
  Advertised to non peer-group peers:
  172.22.4.251 172.22.4.253
  65022
    172.22.4.253 (inaccessible) from 172.22.4.253 (172.22.9.1)
      Origin IGP, invalid, external
      Last update: Tue Jul  9 21:03:33 2024
srv#

ton31337 commented 2 months ago

Could you share the output of show ip route 172.22.9.0/24 json?

ton31337 commented 1 month ago

@vgrebenschikov would it be possible to test this patch? https://github.com/FRRouting/frr/pull/16948