FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

zebra sees interface "Interface is up, line protocol is down" even in system is up + routes install failed. #12983

Closed EasyNetDev closed 1 year ago

EasyNetDev commented 1 year ago

Describe the bug

With BGP, LDP and MPLS configured, a static route ends up resolved over an MPLS nexthop.

To Reproduce

  1. Set up a VRF.
  2. Set up a teaming interface over a 10-Gigabit interface (I'm using a quad-port QLogic Corp. FastLinQ QL41000 Series 10/25/40/50GbE Controller).
  3. Set up two routers with the same ISP uplinks and add BGP sessions with the ISP inside the VRF.
  4. Set up MPLS between the two routers and redistribute the BGP ISP routes between them.
  5. Set a static route on one of the routers towards both uplinks.
  6. After a while, the route is still marked as static but has an MPLS nexthop, because Zebra can no longer reach the static route's nexthop locally.
  7. Check the VLAN interface: Zebra reports "line protocol is down" even though the interface is fine in the system.

Expected behavior

The static route should stay static.

Screenshots

MPLS config:

router bgp 43474
 bgp router-id 10.100.2.1
 no bgp suppress-duplicates
 bgp graceful-shutdown
 bgp graceful-restart
 neighbor 10.100.1.1 remote-as 43474
 neighbor 10.100.1.1 description R01_VPNv4
 neighbor 10.100.1.1 bfd
 neighbor 10.100.1.1 bfd check-control-plane-failure
 neighbor 10.100.1.1 update-source 10.100.2.1
 neighbor 10.100.1.1 capability extended-nexthop
 !
 address-family ipv4 unicast
  no neighbor 10.100.1.1 activate
 exit-address-family
 !
 address-family ipv4 vpn
  neighbor 10.100.1.1 activate
  neighbor 10.100.1.1 soft-reconfiguration inbound
 exit-address-family
 !
 address-family ipv6 vpn
 exit-address-family
exit

Internet VRF:

router bgp 43474 vrf internet
 bgp router-id 10.100.2.5
 neighbor 89.238.245.113 remote-as 6663
 neighbor 89.238.245.113 description EUROWEB_IPv4
 neighbor 89.238.245.113 update-source po1.650
 neighbor 89.238.245.113 graceful-restart
 neighbor 193.230.200.47 remote-as 6663
 neighbor 193.230.200.47 description EUROWEB_IPv4
 neighbor 193.230.200.47 ebgp-multihop 3
 neighbor 193.230.200.47 update-source po1.650
 neighbor 193.230.200.47 graceful-restart
 address-family ipv4 unicast
  network 89.X.X.0/24 route-map BGP-own-prefixes
  redistribute connected route-map VPN-export-INTERNET-connected
  redistribute static route-map VPN-export-INTERNET-static
  neighbor 89.238.245.113 soft-reconfiguration inbound
  neighbor 89.238.245.113 route-map received-isp-01 in
  neighbor 89.238.245.113 route-map advertised-isp-01 out
  neighbor 193.230.200.47 soft-reconfiguration inbound
  neighbor 193.230.200.47 route-map received-isp-02 in
  neighbor 193.230.200.47 route-map advertised-isp-01 out
  label vpn export auto
! X = 1 for R01, X = 2 for R02
  rd vpn export 43474:1000X
  rt vpn import 43474:10000 43474:10990
  rt vpn export 43474:10000
  export vpn
  import vpn
 exit-address-family

VRF Internet:

vrf internet
 ip route 193.230.200.47/32 89.238.245.113

As you can see, 193.230.200.47 is configured as a static route via 89.238.245.113.

interface po1.650
 description EUROWEB;ISP;
! A = 114 for R01, A = 115 for R02.
 ip address 89.238.245.A/29
exit

Output:

# do sh ip route vrf internet 193.230.200.47
Routing entry for 193.230.200.47/32
  Known via "static", distance 1, metric 0, vrf internet, best
  Last update 00:47:47 ago
    89.238.245.113 (recursive), weight 1
  *   10.100.0.9, via gi2-0, label IPv4 Explicit Null/84, weight 1

If I delete the static route from VRF internet:

R02(config-vrf)# no  ip route 193.230.200.47/32 89.238.245.113
R02(config-vrf)# do sh ip route vrf internet 193.230.200.47
Routing entry for 193.230.200.47/32
  Known via "bgp", distance 200, metric 0, vrf internet, best
  Last update 00:00:02 ago
    10.100.1.1(vrf default) (recursive), label 84, weight 1
  *   10.100.0.9, via gi2-0(vrf default), label IPv4 Explicit Null/84, weight 1

R02(config-vrf)#

Re-add the static route:

R02(config-vrf)#  ip route 193.230.200.47/32 89.238.245.113
R02(config-vrf)# do sh ip route vrf internet 193.230.200.47
Routing entry for 193.230.200.47/32
  Known via "static", distance 1, metric 0, vrf internet, best
  Last update 00:00:02 ago
    89.238.245.113 (recursive), weight 1
  *   10.100.0.9, via gi2-0, label IPv4 Explicit Null/84, weight 1
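The symptom in the outputs above (a route known via "static" whose active nexthop carries an MPLS label) can be checked mechanically. This is only a minimal Python sketch, not an FRR tool: it assumes the plain-text layout of `show ip route` shown in this report, and the function name `static_resolved_over_label` is hypothetical.

```python
import re


def static_resolved_over_label(show_ip_route_text):
    """Return True if a route known via "static" has an active (*) nexthop
    line that carries an MPLS label.

    Assumes the plain-text `show ip route <prefix>` layout from this report.
    """
    known_via = re.search(r'Known via "([^"]+)"', show_ip_route_text)
    if not known_via or known_via.group(1) != "static":
        return False
    for line in show_ip_route_text.splitlines():
        stripped = line.strip()
        # In the output above, active nexthops are prefixed with "*".
        if stripped.startswith("*") and "label" in stripped:
            return True
    return False


static_sample = """\
Routing entry for 193.230.200.47/32
  Known via "static", distance 1, metric 0, vrf internet, best
  Last update 00:47:47 ago
    89.238.245.113 (recursive), weight 1
  *   10.100.0.9, via gi2-0, label IPv4 Explicit Null/84, weight 1
"""

bgp_sample = """\
Routing entry for 193.230.200.47/32
  Known via "bgp", distance 200, metric 0, vrf internet, best
  Last update 00:00:02 ago
    10.100.1.1(vrf default) (recursive), label 84, weight 1
  *   10.100.0.9, via gi2-0(vrf default), label IPv4 Explicit Null/84, weight 1
"""

print(static_resolved_over_label(static_sample))
print(static_resolved_over_label(bgp_sample))
```

The first sample is the broken static entry from this report; the second is the BGP entry seen after deleting the static route, where a labelled nexthop is expected.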

Checking local connection for po1.650:

# ip a l po1.650
53: po1.650@po1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master internet state UP group default qlen 1000
    link/ether 06:1b:64:0c:cc:cf brd ff:ff:ff:ff:ff:ff
    inet 89.238.245.115/29 brd 89.238.245.119 scope global po1.650
       valid_lft forever preferred_lft forever

# ping 89.238.245.113
PING 89.238.245.113 (89.238.245.113) 56(84) bytes of data.
64 bytes from 89.238.245.113: icmp_seq=1 ttl=62 time=2.10 ms
64 bytes from 89.238.245.113: icmp_seq=2 ttl=62 time=2.68 ms

# arp -n -i po1.650
Address                  HWtype  HWaddress           Flags Mask            Iface
89.238.245.113           ether   54:e0:32:75:86:72   C                     po1.650

# vtysh
R02# show interface po1.650
Interface po1.650 is up, line protocol is down
  Link ups:       0    last: (never)
  Link downs:     1    last: 2023/03/13 09:20:19.30
  vrf: internet
  Description: EUROWEB;ISP;
  OS Description: Po1.650
  index 53 metric 0 mtu 1500 speed 10000
  flags: <UP,BROADCAST,MULTICAST>
  Ignore all v4 routes with linkdown
  Ignore all v6 routes with linkdown
  Type: Ethernet
  HWaddr: 06:1b:64:0c:cc:cf
  inet 89.238.245.115/29
  inet6 2a02:2720:1000:311::3/64
  inet6 fe80::41b:64ff:fe0c:cccf/64
  Interface Type Vlan
  Interface Slave Type Vrf
  VLAN Id 650
  protodown: off
  Parent interface: po1

After a shut/no shut on the po1.650 interface, line protocol goes up.
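The mismatch is visible in the flag sets of the two outputs above: the kernel lists LOWER_UP for po1.650, while Zebra's flags line has no RUNNING. A minimal Python sketch of that comparison follows. It assumes that LOWER_UP in `ip` output and RUNNING in Zebra's `flags:` line both indicate carrier; the helper name `parse_flags` is hypothetical.

```python
import re


def parse_flags(line):
    """Extract the comma-separated flag names between '<' and '>'."""
    match = re.search(r"<([^>]*)>", line)
    return set(match.group(1).split(",")) if match else set()


# Lines copied from the `ip a l po1.650` and `show interface po1.650` output.
kernel_line = ("53: po1.650@po1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 "
               "qdisc noqueue master internet state UP group default qlen 1000")
zebra_line = "  flags: <UP,BROADCAST,MULTICAST>"

kernel_flags = parse_flags(kernel_line)
zebra_flags = parse_flags(zebra_line)

# Assumption: LOWER_UP (kernel view) and RUNNING (Zebra view) both mean carrier.
kernel_has_carrier = "LOWER_UP" in kernel_flags
zebra_has_carrier = "RUNNING" in zebra_flags

# Zebra's state looks stale when the kernel has carrier but Zebra does not.
zebra_state_is_stale = kernel_has_carrier and not zebra_has_carrier
print(zebra_state_is_stale)
```

With the lines from this report, the kernel has carrier and Zebra does not, which matches "line protocol is down" on an interface that still passes traffic.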

I'm also seeing a lot of "Route install failed" messages like this:

2023-03-13T11:15:38.408755+02:00 R02 zebra[1082571]: [HSYZM-HV7HF] Extended Error: Nexthop id does not exist
2023-03-13T11:15:38.408909+02:00 R02 zebra[1082571]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Invalid argument, type=RTM_NEWROUTE(24), seq=14628437, pid=2544673964
2023-03-13T11:15:38.409002+02:00 R02 zebra[1082571]: [VYKYC-709DP] internet(18:1010):130.137.124.0/24: Route install failed

I'm not sure whether this error is related to the same issue.
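To see how widespread the failures are, the log can be summarised per VRF. The sketch below is a minimal Python example built only from the three log lines quoted above. The reading of the two numbers in parentheses as VRF id and table id is an assumption, and `failed_installs` is a hypothetical helper name.

```python
import re
from collections import Counter

# Pattern follows the "Route install failed" line quoted above.
# Assumption: the two numbers in parentheses are VRF id and table id.
FAILED = re.compile(
    r"zebra\[\d+\]: \[[A-Z0-9-]+\] "
    r"(?P<vrf>[^\s(]+)\((?P<vrf_id>\d+):(?P<table>\d+)\):"
    r"(?P<prefix>\S+): Route install failed"
)


def failed_installs(log_lines):
    """Yield (vrf, prefix) for every 'Route install failed' line."""
    for line in log_lines:
        match = FAILED.search(line)
        if match:
            yield match.group("vrf"), match.group("prefix")


log = [
    "2023-03-13T11:15:38.408755+02:00 R02 zebra[1082571]: [HSYZM-HV7HF] "
    "Extended Error: Nexthop id does not exist",
    "2023-03-13T11:15:38.408909+02:00 R02 zebra[1082571]: [WVJCK-PPMGD]"
    "[EC 4043309093] netlink-dp (NS 0) error: Invalid argument, "
    "type=RTM_NEWROUTE(24), seq=14628437, pid=2544673964",
    "2023-03-13T11:15:38.409002+02:00 R02 zebra[1082571]: [VYKYC-709DP] "
    "internet(18:1010):130.137.124.0/24: Route install failed",
]

failures = list(failed_installs(log))
per_vrf = Counter(vrf for vrf, _ in failures)
print(failures)
print(per_vrf)
```

Running it over a full log file instead of the three-line sample would show whether the failures are limited to the `internet` VRF.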

donaldsharp commented 1 year ago

In the broken state can we see the output of show ip route 89.238.245.113 ?

EasyNetDev commented 1 year ago

Sure. Give me some time to re-enable the NHRP daemon. Without the NHRP daemon everything looks fine.

EasyNetDev commented 1 year ago

I couldn't replicate the bug on the same system, but I think I found a similar issue in FRR 8.4.2 on an RPi device.

router bgp 43474
 bgp router-id 10.100.12.1
 !
 address-family ipv4 unicast
  label vpn export auto
  rd vpn export 43474:11012
  rt vpn import 43474:11910
  rt vpn export 43474:11012
  export vpn
  import vpn
 exit-address-family
 !
 address-family ipv6 unicast
  rd vpn export 43474:11012
  rt vpn export 43474:11012
 exit-address-family
exit

router bgp 65512 vrf VPN1
 neighbor 10.101.1.37 remote-as 43474
 neighbor 10.101.1.37 description R01_VPN
 neighbor 10.102.1.37 remote-as 43474
 neighbor 10.102.1.37 description R02_VPN
 !
 address-family ipv4 unicast
  redistribute connected route-map VPN-export-VPN1-connected
  neighbor 10.101.1.37 next-hop-self
  neighbor 10.101.1.37 soft-reconfiguration inbound
  neighbor 10.101.1.37 route-map VPN-VPN1-received in
  neighbor 10.101.1.37 route-map VPN-export-VPN1-advertised out
  neighbor 10.102.1.37 soft-reconfiguration inbound
  neighbor 10.102.1.37 route-map VPN-VPN1-received in
  neighbor 10.102.1.37 route-map VPN-export-VPN1-advertised out
  label vpn export auto
  rd vpn export 43474:13012
  rt vpn import 43474:13000 43474:13990
  rt vpn export 43474:13000
  export vpn
  import vpn
 exit-address-family
exit

The output:

# do sh bgp vrf VPN1 summary

IPv4 Unicast Summary (VRF VPN1):
BGP router identifier 192.168.144.12, local AS number 65512 vrf-id 5
BGP table version 49
RIB entries 13, using 2496 bytes of memory
Peers 2, using 1448 KiB of memory

Neighbor        V         AS   MsgRcvd   MsgSent   TblVer  InQ OutQ  Up/Down State/PfxRcd   PfxSnt Desc
10.101.1.37     4      43474       874       887        0    0    0 00:08:59            5        4 R01_VPN
10.102.1.37     4      43474       242       256        0    0    0 00:00:10 Idle (Admin)        0 R02_VPN

Total number of neighbors 2

R12(config-router)# do sh bgp vrf VPN1 ipv4
BGP table version is 49, local router ID is 192.168.144.12, vrf id 5
Default local pref 100, local AS 65512
Status codes:  s suppressed, d damped, h history, * valid, > best, = multipath,
               i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes:  i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

   Network          Next Hop            Metric LocPrf Weight Path
   0.0.0.0/0        10.101.1.37              0             0 43474 i
   10.100.1.10/32   10.101.1.37              0             0 43474 ?
   10.101.1.32/30   10.101.1.37              0             0 43474 ?
   10.101.1.36/30   10.101.1.37              0             0 43474 ?
*>                  0.0.0.0                  0         32768 ?
   10.102.1.32/30   10.101.1.37                            0 43474 ?
*> 10.102.1.36/30   0.0.0.0                  0         32768 ?
*> 192.168.143.0/24 0.0.0.0                  0         32768 ?
*> 192.168.144.12/32
                    0.0.0.0                  0         32768 ?

Displayed  8 routes and 9 total paths

R12(config-router)# do sh bgp vrf VPN1 ipv4 0.0.0.0/0
BGP routing table entry for 0.0.0.0/0, version 45
Paths: (1 available, no best path)
  Not advertised to any peer
  43474
    10.101.1.37 (inaccessible) from 10.101.1.37 (10.100.1.10)
      Origin IGP, metric 0, invalid, external
      Last update: Thu Mar 30 14:31:11 2023

R12(config-router)# do sh ip route vrf VPN1
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF VPN1:
S>* 0.0.0.0/0 [250/0] unreachable (blackhole), weight 1, 00:15:03
C>* 10.101.1.36/30 is directly connected, wg120102, 13:52:13
C>* 10.102.1.36/30 is directly connected, wg120202, 03:44:56
C>* 169.254.0.0/16 is directly connected, VPN1, 13:52:06
C>* 192.168.143.0/24 is directly connected, wlan0, 13:26:38
C>* 192.168.144.12/32 is directly connected, VPN1, 13:50:51

R12(config-router)# do sh ip route vrf VPN1 10.101.1.37
Routing entry for 10.101.1.36/30
  Known via "connected", distance 0, metric 0, vrf VPN1, best
  Last update 13:52:54 ago
  * directly connected, wg120102
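The contradiction here is that BGP marks 10.101.1.37 as inaccessible while the VRF routing table has a connected route covering it. A small Python check using the standard `ipaddress` module confirms the coverage. The prefix list is copied from the `show ip route vrf VPN1` output above, and `covered_by_connected` is a hypothetical helper name.

```python
import ipaddress


def covered_by_connected(nexthop, connected_prefixes):
    """Return the connected prefixes that contain the nexthop address."""
    address = ipaddress.ip_address(nexthop)
    return [
        prefix
        for prefix in connected_prefixes
        if address in ipaddress.ip_network(prefix)
    ]


# Connected routes from `show ip route vrf VPN1` in this report.
connected = [
    "10.101.1.36/30",
    "10.102.1.36/30",
    "169.254.0.0/16",
    "192.168.143.0/24",
    "192.168.144.12/32",
]

matches = covered_by_connected("10.101.1.37", connected)
print(matches)
```

The BGP nexthop falls inside 10.101.1.36/30 on wg120102, so the "inaccessible" state does not come from a missing connected route.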

I don't know if the bugs are correlated, but I will try to compile the latest version of FRR on the RPi.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.

frrbot[bot] commented 1 year ago

This issue will be automatically closed in the specified period unless there is further activity.