FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/
Other
3.31k stars 1.25k forks source link

Kernel routes are not updated properly in zebra RIB / nhg nexthop check is wrong #13561

Open zdc opened 1 year ago

zdc commented 1 year ago

Describe the bug

In certain conditions, kernel routes are not properly updated in zebra, for example when an IP address is flushed from an interface.

To Reproduce

Reproducing is really simple:

ip link set dev eth0 up
ip -4 addr add 192.168.0.2/24 dev eth0
ip -4 route add default via 192.168.0.1
ip -4 addr flush dev eth0

After this, the kernel will have no routes, but FRR still has an active default route:

K>* 0.0.0.0/0 [0/0] via 192.168.0.1, eth0, 00:00:07

Logs after ip -4 addr flush dev eth0:

May 19 14:41:28 vyos zebra[3764]: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_DELADDR(21), len=80, seq=1684507290, pid=3800
May 19 14:41:28 vyos zebra[3764]: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_DELROUTE(25), len=60, seq=0, pid=0
May 19 14:41:28 vyos zebra[3764]: [SKNFJ-G938V] RTM_DELROUTE ipv4 unicast proto kernel NS 0
May 19 14:41:28 vyos zebra[3764]: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_DELROUTE(25), len=60, seq=0, pid=0
May 19 14:41:28 vyos zebra[3764]: [SKNFJ-G938V] RTM_DELROUTE ipv4 broadcast proto kernel NS 0
May 19 14:41:28 vyos zebra[3764]: [J3J81-V75NW] Route rtm_type: broadcast(3) intentionally ignoring
May 19 14:41:28 vyos zebra[3764]: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_DELROUTE(25), len=60, seq=0, pid=0
May 19 14:41:28 vyos zebra[3764]: [SKNFJ-G938V] RTM_DELROUTE ipv4 local proto kernel NS 0
May 19 14:41:28 vyos zebra[3764]: [J3J81-V75NW] Route rtm_type: local(2) intentionally ignoring
May 19 14:41:28 vyos zebra[3764]: [K8FXY-V65ZJ] Intf dplane ctx 0x7f4988026dd0, op INTF_ADDR_DEL, ifindex (2), result QUEUED
May 19 14:41:28 vyos zebra[3764]: [MZPZA-W042K] zebra_if_addr_update_ctx: INTF_ADDR_DEL: ifindex eth0(2), addr 192.168.0.2/24
May 19 14:41:28 vyos zebra[3764]: [XN0NB-2NSYE] MESSAGE: ZEBRA_INTERFACE_ADDRESS_DELETE 192.168.0.2/24 on eth0 vrf default(0)
May 19 14:41:28 vyos zebra[3764]: [X55C6-A6ZXJ] Route 192.168.0.0/24(0) queued for processing into sub-queue Early Route Processing
May 19 14:41:28 vyos zebra[3764]: [X55C6-A6ZXJ] Route 192.168.0.0/24(0) queued for processing into sub-queue Early Route Processing
May 19 14:41:28 vyos zebra[3764]: [QQWTF-ZKJDD] rib_update: Scheduled VRF (ALL), event RIB_UPDATE_KERNEL
May 19 14:41:28 vyos zebra[3764]: [M8JMF-J6PZV] rib_update_handle_vrf_all: Handling VRF (ALL) event RIB_UPDATE_KERNEL
May 19 14:41:28 vyos zebra[3764]: [QR5K4-A1079] rib_update_table: IPv4 VRF default Table 254 event RIB_UPDATE_KERNEL Route type: wildcard
May 19 14:41:28 vyos zebra[3764]: [JF0K0-DVHWH] rib_meta_queue_add: (0:254):0.0.0.0/0: queued rn 0x56103e25dcd0 into sub-queue Kernel Routes
May 19 14:41:28 vyos zebra[3764]: [QR5K4-A1079] rib_update_table: IPv4 VRF default Table 254 event RIB_UPDATE_KERNEL Route type: wildcard
May 19 14:41:28 vyos zebra[3764]: [QR5K4-A1079] rib_update_table: IPv6 VRF default Table 254 event RIB_UPDATE_KERNEL Route type: wildcard
May 19 14:41:28 vyos zebra[3764]: [QR5K4-A1079] rib_update_table: IPv6 VRF default Table 254 event RIB_UPDATE_KERNEL Route type: wildcard
May 19 14:41:28 vyos zebra[3764]: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_DELADDR(21), len=80, seq=1684507290, pid=3800
May 19 14:41:28 vyos zebra[3764]: [RGWF1-EHXT1] netlink_interface_addr_dplane: RTM_DELADDR nsid 0 ifindex 2 flags 0x80:
May 19 14:41:28 vyos zebra[3764]: [XMC8C-4ZFJ9]   IFA_LOCAL     192.168.0.2/24
May 19 14:41:28 vyos zebra[3764]: [ME3M2-X6YT9]   IFA_ADDRESS   192.168.0.2/24
May 19 14:41:28 vyos zebra[3764]: [Y9HR3-XD5TG]   IFA_LABEL     eth0
May 19 14:41:28 vyos zebra[3764]: [P2VPT-508WP]   IFA_CACHEINFO pref -1, valid -1
May 19 14:41:28 vyos zebra[3764]: [QE6V0-J8BG5] rib_delnode: (0:254):192.168.0.0/24: rn 0x56103e40adc0, re 0x56103e40afd0, removing
May 19 14:41:28 vyos zebra[3764]: [JF0K0-DVHWH] rib_meta_queue_add: (0:254):192.168.0.0/24: queued rn 0x56103e40adc0 into sub-queue Connected Routes
May 19 14:41:28 vyos zebra[3764]: [VKWCR-QB19H] zebra_nhg_free: nhe 0x56103e40ab00 (0[]), refcnt 0, NH directly connected, eth0
May 19 14:41:28 vyos zebra[3764]: [QE6V0-J8BG5] rib_delnode: (0:254):192.168.0.0/24 (MRIB): rn 0x56103e40b8a0, re 0x56103e40b940, removing
May 19 14:41:28 vyos zebra[3764]: [JF0K0-DVHWH] rib_meta_queue_add: (0:254):192.168.0.0/24 (MRIB): queued rn 0x56103e40b8a0 into sub-queue Connected Routes
May 19 14:41:28 vyos zebra[3764]: [VKWCR-QB19H] zebra_nhg_free: nhe 0x56103e40b6d0 (0[]), refcnt 0, NH directly connected, eth0
May 19 14:41:28 vyos zebra[3764]: [NZNZ4-7P54Y] default(0:254):192.168.0.0/24: Processing rn 0x56103e40adc0
May 19 14:41:28 vyos zebra[3764]: [ZJVZ4-XEGPF] default(0:254):192.168.0.0/24: Examine re 0x56103e40afd0 (connected) status: Removed Installed flags: Selected dist 0 metric 0
May 19 14:41:28 vyos zebra[3764]: [JPJF4-TGCY5] default(0:254):192.168.0.0/24: After processing: old_selected 0x56103e40afd0 new_selected 0x0 old_fib 0x56103e40afd0 new_fib 0x0
May 19 14:41:28 vyos zebra[3764]: [S31W0-H281H] 0:254192.168.0.0/24: Redist del: re 0x56103e40afd0 (0:connected), new re 0x0 (0:None)
May 19 14:41:28 vyos zebra[3764]: [S59C1-4C5PP] default(0:254):192.168.0.0/24: Deleting route rn 0x56103e40adc0, re 0x56103e40afd0 (connected)
May 19 14:41:28 vyos zebra[3764]: [NM15X-X83N9] rib_process: (0:254):192.168.0.0/24: rn 0x56103e40adc0, removing re 0x56103e40afd0
May 19 14:41:28 vyos zebra[3764]: [Y53JX-CBC5H] rib_unlink: (0:254):192.168.0.0/24: rn 0x56103e40adc0, re 0x56103e40afd0
May 19 14:41:28 vyos zebra[3764]: [WDEB1-93HCZ] zebra_nhg_decrement_ref: nhe 0x56103e409a10 (66[if 2]) 2 => 1
May 19 14:41:28 vyos zebra[3764]: [KT8QQ-45WQ0] rib_gc_dest: (0:?):192.168.0.0/24: removing dest from table
May 19 14:41:28 vyos zebra[3764]: [PS6CZ-GQDN1] zebra_rib_evaluate_rn_nexthops: 192.168.0.0/24 Being examined for Nexthop Tracking Count: 0
May 19 14:41:28 vyos zebra[3764]: [N744R-XB3J9] 192.168.0.0/24 has no tracking NHTs. Bailing
May 19 14:41:28 vyos zebra[3764]: [HH6N2-PDCJS] default(0:0):192.168.0.0/24 rn 0x56103e40adc0 dequeued from sub-queue Connected Routes
May 19 14:41:28 vyos zebra[3764]: [NZNZ4-7P54Y] default(0:254):192.168.0.0/24: Processing rn 0x56103e40b8a0
May 19 14:41:28 vyos zebra[3764]: [ZJVZ4-XEGPF] default(0:254):192.168.0.0/24: Examine re 0x56103e40b940 (connected) status: Removed flags: Selected dist 0 metric 0
May 19 14:41:28 vyos zebra[3764]: [JPJF4-TGCY5] default(0:254):192.168.0.0/24: After processing: old_selected 0x56103e40b940 new_selected 0x0 old_fib 0x0 new_fib 0x0
May 19 14:41:28 vyos zebra[3764]: [S31W0-H281H] 0:254192.168.0.0/24: Redist del: re 0x56103e40b940 (0:connected), new re 0x0 (0:None)
May 19 14:41:28 vyos zebra[3764]: [NM15X-X83N9] rib_process: (0:254):192.168.0.0/24 (MRIB): rn 0x56103e40b8a0, removing re 0x56103e40b940
May 19 14:41:28 vyos zebra[3764]: [Y53JX-CBC5H] rib_unlink: (0:254):192.168.0.0/24 (MRIB): rn 0x56103e40b8a0, re 0x56103e40b940
May 19 14:41:28 vyos zebra[3764]: [WDEB1-93HCZ] zebra_nhg_decrement_ref: nhe 0x56103e409a10 (66[if 2]) 1 => 0
May 19 14:41:28 vyos zebra[3764]: [KT8QQ-45WQ0] rib_gc_dest: (0:?):192.168.0.0/24 (MRIB): removing dest from table
May 19 14:41:28 vyos zebra[3764]: [PS6CZ-GQDN1] zebra_rib_evaluate_rn_nexthops: 192.168.0.0/24 Being examined for Nexthop Tracking Count: 0
May 19 14:41:28 vyos zebra[3764]: [N744R-XB3J9] 192.168.0.0/24 has no tracking NHTs. Bailing
May 19 14:41:28 vyos zebra[3764]: [HH6N2-PDCJS] default(0:0):192.168.0.0/24 rn 0x56103e40b8a0 dequeued from sub-queue Connected Routes
May 19 14:41:28 vyos zebra[3764]: [NZNZ4-7P54Y] default(0:254):0.0.0.0/0: Processing rn 0x56103e25dcd0
May 19 14:41:28 vyos zebra[3764]: [ZJVZ4-XEGPF] default(0:254):0.0.0.0/0: Examine re 0x56103e40b110 (kernel) status: Changed Installed flags: Selected dist 0 metric 0
May 19 14:41:28 vyos zebra[3764]: [VTD0C-NR53W] nexthop_active_update: re 0x56103e40b110 nhe 0x56103e40ac40 (67[192.168.0.1 if 2]), curr_nhe 0x56103e40afd0
May 19 14:41:28 vyos zebra[3764]: [T9JWA-N8HM5] nexthop_active_check: re 0x56103e40b110, nexthop 192.168.0.1, via eth0
May 19 14:41:28 vyos zebra[3764]: [ZG85Y-SBJH3] nexthop_active_update: re 0x56103e40b110 curr_active 1
May 19 14:41:28 vyos zebra[3764]: [MA7PW-DZFV7] zebra_nhg_rib_find_nhe: rt_nhe 0x56103e40afd0 (0[])
May 19 14:41:28 vyos zebra[3764]: [YDEFD-9VKZX] zebra_nhe_find: id 0, lookup 0x56103e40afd0, vrf 0, type 28, depends 0x0
May 19 14:41:28 vyos zebra[3764]: [XCWSF-H8QAB] zebra_nhe_find: lookup => 0x56103e40ac40 (67[192.168.0.1 if 2])
May 19 14:41:28 vyos zebra[3764]: [W4Z4R-NTSMD] zebra_nhg_rib_find_nhe: => nhe 0x56103e40ac40 (67[192.168.0.1 if 2])
May 19 14:41:28 vyos zebra[3764]: [J6P4K-BSDDN] nexthop_active_update: re 0x56103e40b110 CHANGED: nhe 0x56103e40ac40 (67[192.168.0.1 if 2]) => new_nhe 0x56103e40ac40 (67[192.168.0.1 if 2])
May 19 14:41:28 vyos zebra[3764]: [VKWCR-QB19H] zebra_nhg_free: nhe 0x56103e40afd0 (0[]), refcnt 0, NH 192.168.0.1, via eth0
May 19 14:41:28 vyos zebra[3764]: [JPJF4-TGCY5] default(0:254):0.0.0.0/0: After processing: old_selected 0x56103e40b110 new_selected 0x56103e40b110 old_fib 0x56103e40b110 new_fib 0x56103e40b110
May 19 14:41:28 vyos zebra[3764]: [X2YHK-SWH93] default(0:254):0.0.0.0/0: Updating route rn 0x56103e25dcd0, re 0x56103e40b110 (kernel)
May 19 14:41:28 vyos zebra[3764]: [PJXNX-HZST8] zebra_nhg_install_kernel: valid flag set for nh 67[192.168.0.1 if 2]
May 19 14:41:28 vyos zebra[3764]: [HH6N2-PDCJS] default(0:254):0.0.0.0/0 rn 0x56103e25dcd0 dequeued from sub-queue Kernel Routes
May 19 14:41:28 vyos zebra[3764]: [TQKA8-0276P] Not Notifying Owner: connected about prefix 192.168.0.0/24(254) 3 vrf: 0
May 19 14:41:28 vyos zebra[3764]: [PS6CZ-GQDN1] zebra_rib_evaluate_rn_nexthops: 192.168.0.0/24 Being examined for Nexthop Tracking Count: 0
May 19 14:41:28 vyos zebra[3764]: [N744R-XB3J9] 192.168.0.0/24 has no tracking NHTs. Bailing
May 19 14:41:28 vyos zebra[3764]: [GHWHS-ZKQM5] update_from_ctx: default(0:254):0.0.0.0/0: SELECTED, re 0x56103e40b110
May 19 14:41:28 vyos zebra[3764]: [TS3SH-1276M] default(0:254):0.0.0.0/0 update_from_ctx(): no fib nhg
May 19 14:41:28 vyos zebra[3764]: [HKQXC-4STSK] default(0:254):0.0.0.0/0 update_from_ctx(): rib nhg matched, changed 'false'
May 19 14:41:28 vyos zebra[3764]: [Z1MP1-RFGJA] (0:254):0.0.0.0/0(0): Redist update re 0x56103e40b110 (kernel), old 0x56103e40b110 (kernel)
May 19 14:41:28 vyos zebra[3764]: [TQKA8-0276P] Not Notifying Owner: kernel about prefix 0.0.0.0/0(254) 2 vrf: 0
May 19 14:41:28 vyos zebra[3764]: [PS6CZ-GQDN1] zebra_rib_evaluate_rn_nexthops: 0.0.0.0/0 Being examined for Nexthop Tracking Count: 0

Expected behavior

FRR should remove a route, if it does not exist in a kernel anymore.

I traced the problem till the nexthop_active_check() in https://github.com/FRRouting/frr/blob/32b20e1ad65e8db2ef80dd39b255f34de2802cd2/zebra/zebra_nhg.c#L2565.

I think the problem is that kernel routes are trusted too much. This check does not seem to be right: https://github.com/FRRouting/frr/blob/32b20e1ad65e8db2ef80dd39b255f34de2802cd2/zebra/zebra_nhg.c#L2589-L2602

For example, in this case, an interface status is up, and goto skip_check is activated, but there are no IP addresses on this interface.

Without this check, everything looks better:

May 19 14:48:34 vyos zebra[4116]: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_DELADDR(21), len=80, seq=1684507716, pid=4152
May 19 14:48:34 vyos zebra[4116]: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_DELROUTE(25), len=60, seq=0, pid=0
May 19 14:48:34 vyos zebra[4116]: [SKNFJ-G938V] RTM_DELROUTE ipv4 unicast proto kernel NS 0
May 19 14:48:34 vyos zebra[4116]: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_DELROUTE(25), len=60, seq=0, pid=0
May 19 14:48:34 vyos zebra[4116]: [SKNFJ-G938V] RTM_DELROUTE ipv4 broadcast proto kernel NS 0
May 19 14:48:34 vyos zebra[4116]: [J3J81-V75NW] Route rtm_type: broadcast(3) intentionally ignoring
May 19 14:48:34 vyos zebra[4116]: [KMXEB-K771Y] netlink_parse_info: netlink-listen (NS 0) type RTM_DELROUTE(25), len=60, seq=0, pid=0
May 19 14:48:34 vyos zebra[4116]: [SKNFJ-G938V] RTM_DELROUTE ipv4 local proto kernel NS 0
May 19 14:48:34 vyos zebra[4116]: [J3J81-V75NW] Route rtm_type: local(2) intentionally ignoring
May 19 14:48:34 vyos zebra[4116]: [K8FXY-V65ZJ] Intf dplane ctx 0x7f2144026dd0, op INTF_ADDR_DEL, ifindex (2), result QUEUED
May 19 14:48:34 vyos zebra[4116]: [MZPZA-W042K] zebra_if_addr_update_ctx: INTF_ADDR_DEL: ifindex eth0(2), addr 192.168.0.2/24
May 19 14:48:34 vyos zebra[4116]: [XN0NB-2NSYE] MESSAGE: ZEBRA_INTERFACE_ADDRESS_DELETE 192.168.0.2/24 on eth0 vrf default(0)
May 19 14:48:34 vyos zebra[4116]: [X55C6-A6ZXJ] Route 192.168.0.0/24(0) queued for processing into sub-queue Early Route Processing
May 19 14:48:34 vyos zebra[4116]: [X55C6-A6ZXJ] Route 192.168.0.0/24(0) queued for processing into sub-queue Early Route Processing
May 19 14:48:34 vyos zebra[4116]: [QQWTF-ZKJDD] rib_update: Scheduled VRF (ALL), event RIB_UPDATE_KERNEL
May 19 14:48:34 vyos zebra[4116]: [M8JMF-J6PZV] rib_update_handle_vrf_all: Handling VRF (ALL) event RIB_UPDATE_KERNEL
May 19 14:48:34 vyos zebra[4116]: [QR5K4-A1079] rib_update_table: IPv4 VRF default Table 254 event RIB_UPDATE_KERNEL Route type: wildcard
May 19 14:48:34 vyos zebra[4116]: [JF0K0-DVHWH] rib_meta_queue_add: (0:254):0.0.0.0/0: queued rn 0x5566f5aafcd0 into sub-queue Kernel Routes
May 19 14:48:34 vyos zebra[4116]: [QR5K4-A1079] rib_update_table: IPv4 VRF default Table 254 event RIB_UPDATE_KERNEL Route type: wildcard
May 19 14:48:34 vyos zebra[4116]: [QR5K4-A1079] rib_update_table: IPv6 VRF default Table 254 event RIB_UPDATE_KERNEL Route type: wildcard
May 19 14:48:34 vyos zebra[4116]: [QR5K4-A1079] rib_update_table: IPv6 VRF default Table 254 event RIB_UPDATE_KERNEL Route type: wildcard
May 19 14:48:34 vyos zebra[4116]: [KMXEB-K771Y] netlink_parse_info: netlink-dp-in (NS 0) type RTM_DELADDR(21), len=80, seq=1684507716, pid=4152
May 19 14:48:34 vyos zebra[4116]: [RGWF1-EHXT1] netlink_interface_addr_dplane: RTM_DELADDR nsid 0 ifindex 2 flags 0x80:
May 19 14:48:34 vyos zebra[4116]: [XMC8C-4ZFJ9]   IFA_LOCAL     192.168.0.2/24
May 19 14:48:34 vyos zebra[4116]: [ME3M2-X6YT9]   IFA_ADDRESS   192.168.0.2/24
May 19 14:48:34 vyos zebra[4116]: [Y9HR3-XD5TG]   IFA_LABEL     eth0
May 19 14:48:34 vyos zebra[4116]: [P2VPT-508WP]   IFA_CACHEINFO pref -1, valid -1
May 19 14:48:34 vyos zebra[4116]: [QE6V0-J8BG5] rib_delnode: (0:254):192.168.0.0/24: rn 0x5566f5c5cd40, re 0x5566f5c5cfb0, removing
May 19 14:48:34 vyos zebra[4116]: [JF0K0-DVHWH] rib_meta_queue_add: (0:254):192.168.0.0/24: queued rn 0x5566f5c5cd40 into sub-queue Connected Routes
May 19 14:48:34 vyos zebra[4116]: [VKWCR-QB19H] zebra_nhg_free: nhe 0x5566f5c5d1d0 (0[]), refcnt 0, NH directly connected, eth0
May 19 14:48:34 vyos zebra[4116]: [QE6V0-J8BG5] rib_delnode: (0:254):192.168.0.0/24 (MRIB): rn 0x5566f5c5d880, re 0x5566f5c5d920, removing
May 19 14:48:34 vyos zebra[4116]: [JF0K0-DVHWH] rib_meta_queue_add: (0:254):192.168.0.0/24 (MRIB): queued rn 0x5566f5c5d880 into sub-queue Connected Routes
May 19 14:48:34 vyos zebra[4116]: [VKWCR-QB19H] zebra_nhg_free: nhe 0x5566f5c5cb80 (0[]), refcnt 0, NH directly connected, eth0
May 19 14:48:34 vyos zebra[4116]: [NZNZ4-7P54Y] default(0:254):192.168.0.0/24: Processing rn 0x5566f5c5cd40
May 19 14:48:34 vyos zebra[4116]: [ZJVZ4-XEGPF] default(0:254):192.168.0.0/24: Examine re 0x5566f5c5cfb0 (connected) status: Removed Installed flags: Selected dist 0 metric 0
May 19 14:48:34 vyos zebra[4116]: [JPJF4-TGCY5] default(0:254):192.168.0.0/24: After processing: old_selected 0x5566f5c5cfb0 new_selected 0x0 old_fib 0x5566f5c5cfb0 new_fib 0x0
May 19 14:48:34 vyos zebra[4116]: [S31W0-H281H] 0:254192.168.0.0/24: Redist del: re 0x5566f5c5cfb0 (0:connected), new re 0x0 (0:None)
May 19 14:48:34 vyos zebra[4116]: [S59C1-4C5PP] default(0:254):192.168.0.0/24: Deleting route rn 0x5566f5c5cd40, re 0x5566f5c5cfb0 (connected)
May 19 14:48:34 vyos zebra[4116]: [NM15X-X83N9] rib_process: (0:254):192.168.0.0/24: rn 0x5566f5c5cd40, removing re 0x5566f5c5cfb0
May 19 14:48:34 vyos zebra[4116]: [Y53JX-CBC5H] rib_unlink: (0:254):192.168.0.0/24: rn 0x5566f5c5cd40, re 0x5566f5c5cfb0
May 19 14:48:34 vyos zebra[4116]: [WDEB1-93HCZ] zebra_nhg_decrement_ref: nhe 0x5566f5c5cc20 (76[if 2]) 2 => 1
May 19 14:48:34 vyos zebra[4116]: [KT8QQ-45WQ0] rib_gc_dest: (0:?):192.168.0.0/24: removing dest from table
May 19 14:48:34 vyos zebra[4116]: [PS6CZ-GQDN1] zebra_rib_evaluate_rn_nexthops: 192.168.0.0/24 Being examined for Nexthop Tracking Count: 0
May 19 14:48:34 vyos zebra[4116]: [N744R-XB3J9] 192.168.0.0/24 has no tracking NHTs. Bailing
May 19 14:48:34 vyos zebra[4116]: [HH6N2-PDCJS] default(0:0):192.168.0.0/24 rn 0x5566f5c5cd40 dequeued from sub-queue Connected Routes
May 19 14:48:34 vyos zebra[4116]: [NZNZ4-7P54Y] default(0:254):192.168.0.0/24: Processing rn 0x5566f5c5d880
May 19 14:48:34 vyos zebra[4116]: [ZJVZ4-XEGPF] default(0:254):192.168.0.0/24: Examine re 0x5566f5c5d920 (connected) status: Removed flags: Selected dist 0 metric 0
May 19 14:48:34 vyos zebra[4116]: [JPJF4-TGCY5] default(0:254):192.168.0.0/24: After processing: old_selected 0x5566f5c5d920 new_selected 0x0 old_fib 0x0 new_fib 0x0
May 19 14:48:34 vyos zebra[4116]: [S31W0-H281H] 0:254192.168.0.0/24: Redist del: re 0x5566f5c5d920 (0:connected), new re 0x0 (0:None)
May 19 14:48:34 vyos zebra[4116]: [NM15X-X83N9] rib_process: (0:254):192.168.0.0/24 (MRIB): rn 0x5566f5c5d880, removing re 0x5566f5c5d920
May 19 14:48:34 vyos zebra[4116]: [Y53JX-CBC5H] rib_unlink: (0:254):192.168.0.0/24 (MRIB): rn 0x5566f5c5d880, re 0x5566f5c5d920
May 19 14:48:34 vyos zebra[4116]: [WDEB1-93HCZ] zebra_nhg_decrement_ref: nhe 0x5566f5c5cc20 (76[if 2]) 1 => 0
May 19 14:48:34 vyos zebra[4116]: [KT8QQ-45WQ0] rib_gc_dest: (0:?):192.168.0.0/24 (MRIB): removing dest from table
May 19 14:48:34 vyos zebra[4116]: [PS6CZ-GQDN1] zebra_rib_evaluate_rn_nexthops: 192.168.0.0/24 Being examined for Nexthop Tracking Count: 0
May 19 14:48:34 vyos zebra[4116]: [N744R-XB3J9] 192.168.0.0/24 has no tracking NHTs. Bailing
May 19 14:48:34 vyos zebra[4116]: [HH6N2-PDCJS] default(0:0):192.168.0.0/24 rn 0x5566f5c5d880 dequeued from sub-queue Connected Routes
May 19 14:48:34 vyos zebra[4116]: [NZNZ4-7P54Y] default(0:254):0.0.0.0/0: Processing rn 0x5566f5aafcd0
May 19 14:48:34 vyos zebra[4116]: [ZJVZ4-XEGPF] default(0:254):0.0.0.0/0: Examine re 0x5566f5c5d6f0 (kernel) status: Changed Installed flags: Selected dist 0 metric 0
May 19 14:48:34 vyos zebra[4116]: [VTD0C-NR53W] nexthop_active_update: re 0x5566f5c5d6f0 nhe 0x5566f5c5c520 (77[192.168.0.1 if 2]), curr_nhe 0x5566f5c5cfb0
May 19 14:48:34 vyos zebra[4116]: [T9JWA-N8HM5] nexthop_active_check: re 0x5566f5c5d6f0, nexthop 192.168.0.1, via eth0
May 19 14:48:34 vyos zebra[4116]: [YPVDZ-KQPM9]         nexthop_active: Matched against ourself and prefix length is not max bit length
May 19 14:48:34 vyos zebra[4116]: [HJ48M-MB610]         nexthop_active_check: Unable to find active nexthop
May 19 14:48:34 vyos zebra[4116]: [ZG85Y-SBJH3] nexthop_active_update: re 0x5566f5c5d6f0 curr_active 0
May 19 14:48:34 vyos zebra[4116]: [MA7PW-DZFV7] zebra_nhg_rib_find_nhe: rt_nhe 0x5566f5c5cfb0 (0[])
May 19 14:48:34 vyos zebra[4116]: [YDEFD-9VKZX] zebra_nhe_find: id 0, lookup 0x5566f5c5cfb0, vrf 0, type 28, depends 0x0
May 19 14:48:34 vyos zebra[4116]: [XCWSF-H8QAB] zebra_nhe_find: lookup => 0x0 ([NULL])
May 19 14:48:34 vyos zebra[4116]: [J7G3Y-G97PX] nhg_get_next_id: ID 78 checking
May 19 14:48:34 vyos zebra[4116]: [PKSQ5-QBSHW] zebra_nhe_find: => created 0x5566f5c5d510 (78[192.168.0.1 if 2])
May 19 14:48:34 vyos zebra[4116]: [W4Z4R-NTSMD] zebra_nhg_rib_find_nhe: => nhe 0x5566f5c5d510 (78[192.168.0.1 if 2])
May 19 14:48:34 vyos zebra[4116]: [J6P4K-BSDDN] nexthop_active_update: re 0x5566f5c5d6f0 CHANGED: nhe 0x5566f5c5c520 (77[192.168.0.1 if 2]) => new_nhe 0x5566f5c5d510 (78[192.168.0.1 if 2])
May 19 14:48:34 vyos zebra[4116]: [WDEB1-93HCZ] zebra_nhg_increment_ref: nhe 0x5566f5c5d510 (78[192.168.0.1 if 2]) 0 => 1
May 19 14:48:34 vyos zebra[4116]: [WDEB1-93HCZ] zebra_nhg_decrement_ref: nhe 0x5566f5c5c520 (77[192.168.0.1 if 2]) 1 => 0
May 19 14:48:34 vyos zebra[4116]: [VKWCR-QB19H] zebra_nhg_free: nhe 0x5566f5c5cfb0 (78[]), refcnt 0, NH 192.168.0.1, via eth0
May 19 14:48:34 vyos zebra[4116]: [TQKA8-0276P] Not Notifying Owner: kernel about prefix 0.0.0.0/0(254) 0 vrf: 0
May 19 14:48:34 vyos zebra[4116]: [JPJF4-TGCY5] default(0:254):0.0.0.0/0: After processing: old_selected 0x5566f5c5d6f0 new_selected 0x0 old_fib 0x5566f5c5d6f0 new_fib 0x0
May 19 14:48:34 vyos zebra[4116]: [S31W0-H281H] 0:2540.0.0.0/0: Redist del: re 0x5566f5c5d6f0 (0:kernel), new re 0x0 (0:None)
May 19 14:48:34 vyos zebra[4116]: [S59C1-4C5PP] default(0:254):0.0.0.0/0: Deleting route rn 0x5566f5aafcd0, re 0x5566f5c5d6f0 (kernel)
May 19 14:48:34 vyos zebra[4116]: [NM15X-X83N9] rib_process: (0:254):0.0.0.0/0: rn 0x5566f5aafcd0, removing re 0x5566f5c5d6f0
May 19 14:48:34 vyos zebra[4116]: [Y53JX-CBC5H] rib_unlink: (0:254):0.0.0.0/0: rn 0x5566f5aafcd0, re 0x5566f5c5d6f0
May 19 14:48:34 vyos zebra[4116]: [WDEB1-93HCZ] zebra_nhg_decrement_ref: nhe 0x5566f5c5d510 (78[192.168.0.1 if 2]) 1 => 0
May 19 14:48:34 vyos zebra[4116]: [JE46R-G2NEE] zebra_nhg_release: nhe 0x5566f5c5d510 (78[192.168.0.1 if 2])
May 19 14:48:34 vyos zebra[4116]: [VKWCR-QB19H] zebra_nhg_free: nhe 0x5566f5c5d510 (78[192.168.0.1 if 2]), refcnt 0, NH 192.168.0.1, via eth0
May 19 14:48:34 vyos zebra[4116]: [HH6N2-PDCJS] default(0:0):0.0.0.0/0 rn 0x5566f5aafcd0 dequeued from sub-queue Kernel Routes
May 19 14:48:34 vyos zebra[4116]: [TQKA8-0276P] Not Notifying Owner: connected about prefix 192.168.0.0/24(254) 3 vrf: 0
May 19 14:48:34 vyos zebra[4116]: [PS6CZ-GQDN1] zebra_rib_evaluate_rn_nexthops: 192.168.0.0/24 Being examined for Nexthop Tracking Count: 0
May 19 14:48:34 vyos zebra[4116]: [N744R-XB3J9] 192.168.0.0/24 has no tracking NHTs. Bailing
May 19 14:48:34 vyos zebra[4116]: [TQKA8-0276P] Not Notifying Owner: kernel about prefix 0.0.0.0/0(254) 3 vrf: 0
May 19 14:48:34 vyos zebra[4116]: [PS6CZ-GQDN1] zebra_rib_evaluate_rn_nexthops: 0.0.0.0/0 Being examined for Nexthop Tracking Count: 0
May 19 14:48:34 vyos zebra[4116]: [N744R-XB3J9] 0.0.0.0/0 has no tracking NHTs. Bailing

I think the check should be removed or modified to take into account more info than only interface status.

Screenshots

Versions

Additional context

It could be that described problem may be a reason also for:

https://github.com/FRRouting/frr/issues/11592 https://github.com/FRRouting/frr/issues/12197

SwimGeek commented 1 year ago

Hi. We see the same issue.

FRRouting 8.5.2, on Linux 6.1.35, Debian 12

We add/delete routes with "ip route". Sometimes FRR continues to advertise one of these kernel routes after it's been deleted, indefinitely. Nothing in FRR logs.

Example cpt-ter-rs2# ip route del 102.216.77.157/32

Route is gone from kernel: cpt-ter-rs2# ip route sh | grep '102.216.77' 102.216.77.0/25 dev br0.777 proto kernel scope link src 102.216.77.1 rt_offload blackhole 102.216.77.0/24 proto static metric 20 rt_offload 102.216.77.135 dev br0.300 scope link src 102.216.77.129 rt_offload

cpt-ter-rs2# ip route sh match 102.216.77.157/32 default via 196.250.236.145 dev br0.1576 proto kernel onlink offload rt_offload blackhole 102.216.77.0/24 proto static metric 20 rt_offload

But FRR is still seeing it: cpt-ter-rs2# show ip route 102.216.77.157 Routing entry for 102.216.77.157/32 Known via "kernel", distance 0, metric 0, best Last update 03:23:18 ago

cpt-ter-rs2# show ip bgp 102.216.77.157/32 BGP routing table entry for 102.216.77.157/32, version 22225 Paths: (1 available, best #1, table default) Advertised to non peer-group peers: cpt-ter-rs1(102.216.76.3) cpt-ter-r2(102.216.76.2) Local 0.0.0.0(cpt-ter-rs2) from 0.0.0.0 (102.216.76.4) Origin incomplete, metric 0, weight 32768, valid, sourced, best (First path received) Last update: Tue Jul 11 15:30:27 2023

rzalamena commented 1 year ago

These commits seem to explain that exception (from oldest to newest):

This behavior looks intentional to workaround interfaces going down bringing down other routes. Maybe we are missing stimulus somewhere else to remove these stale routes?

github-actions[bot] commented 6 months ago

This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.

vgrebenschikov commented 5 months ago

Similar to #9185 ?

sever-sever commented 5 months ago

Still, the bug. Are there any updates?

DennyAgussy commented 1 month ago

As per @zdc comment when we remove this check: if (ifp && (if_is_operative(ifp) || if_is_up(ifp))) { SET_FLAG(nexthop->flags, NEXTHOP_FLAG_ACTIVE); goto skip_check; } It works fine and FRR removes this default route: K>* 0.0.0.0/0 [0/0] via 192.168.0.1, eth0, 00:00:07 . The reason behind this is simple, though the interface is up and even though we don't have any connected ip addresses(after flush) "goto skip_check;" skips the intermediatory checks, so FRR isn't removing the default route is what my observation is. We need to add a check to see whether the interface has the connected routes.