FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

IPv6 routes not cleaned up properly on BGP session shutdown #12723

Closed jhaprins closed 1 year ago

jhaprins commented 1 year ago

During some maintenance windows I noticed packet loss on IPv6 data traffic. While investigating this issue I noticed that IPv6 routes in the kernel, injected by FRR from a BGP session to a transit partner, were not properly removed from the routing table when the BGP session was shut down.

Example:

`
Neighbor            V  AS     MsgRcvd  MsgSent  TblVer  InQ  OutQ  Up/Down   State/PfxRcd  PfxSnt  Desc
2a02:10:0:1::181:1  4  24785  5977015  9805     0       0    0     00:31:50  Idle (Admin)  0       Joint Transit NIKHEF
2a02:10:0:1::181:2  4  24785  5713580  9835     0       0    0     6d19h47m  162181        3       Joint Transit Equini

bgp01.as48972.net# sh bgp ipv6 2c0f:fc89:8089::/48
BGP routing table entry for 2c0f:fc89:8089::/48, version 27902185
Paths: (4 available, best #4, table default)
  Advertised to non peer-group peers:
  2a02:b70:0:503::3 2a02:b70:0:503::4 2a02:b70:0:503::6 2a02:b70:0:503::7
  24785 6830 6453 9498 8452 36992
    2a02:10:0:1::181:1 from 2a02:10:0:1::181:1 (213.207.0.225)
    (fe80::8aa2:5e04:2e18:3c00) (used)
      Origin IGP, metric 0, localpref 90, valid, external, multipath
      Community: 24785:6830
      Last update: Wed Feb 1 11:07:36 2023
  20847 3356 6453 9498 8452 36992
    2001:1690:6::2:d (metric 20) from 2a02:b70:0:503::3 (185.100.140.2)
      Origin IGP, localpref 90, valid, internal
      Last update: Sat Jan 28 21:00:56 2023
  20847 286 6453 9498 8452 36992
    2001:1690:6::2:11 (metric 20) from 2a02:b70:0:503::6 (185.100.140.3)
      Origin IGP, localpref 90, valid, internal
      Last update: Sat Jan 28 21:00:36 2023
  24785 6830 6453 9498 8452 36992
    2a02:10:0:1::181:2 from 2a02:10:0:1::181:2 (213.207.0.226)
    (fe80::8aa2:5e04:2e18:e400) (used)
      Origin IGP, metric 0, localpref 90, valid, external, multipath, best (Peer Type)
      Community: 24785:6830
      Last update: Sat Jan 28 21:00:17 2023

bgp01.as48972.net# sh ipv6 route 2c0f:fc89:8089::/48
Routing entry for 2c0f:fc89:8089::/48
  Known via "bgp", distance 20, metric 0, best
  Last update 00:00:24 ago

[root@bgp01 frr]# ip -6 ro show 2c0f:fc89:8089::/48
2c0f:fc89:8089::/48 via fe80::8aa2:5e04:2e18:3c00 dev vlan1070 proto 186 metric 20 pref medium
2c0f:fc89:8089::/48 via fe80::8aa2:5e04:2e18:e400 dev vlan1070 proto 186 metric 20 pref medium

`

When I shut the BGP session to 2a02:10:0:1::181:1, I would expect the relevant paths to be removed from the routing table. In practice this means I should no longer have any routes via fe80::8aa2:5e04:2e18:3c00, which is the link-local address of that router.

[root@bgp01 frr]# ip -6 ro show via fe80::8aa2:5e04:2e18:3c00 |wc -l
5806

This is clearly not the case.
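For anyone who wants to repeat this check against a saved copy of the `ip -6 ro show` output, here is a minimal Python sketch. It assumes the output format shown above, where each route has a `via <nexthop> dev <ifname>` part on its own line. The function names and the `expected` set are hypothetical helpers made up for illustration, not part of FRR or iproute2.

```python
import re
from collections import Counter

# Matches the "via <nexthop> dev <ifname>" part of an `ip -6 route show` line.
VIA_RE = re.compile(r"\bvia (\S+) dev (\S+)")


def count_routes_by_nexthop(ip_route_output: str) -> Counter:
    """Count kernel route entries per next hop in `ip -6 route show` text."""
    counts = Counter()
    for line in ip_route_output.splitlines():
        match = VIA_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def stale_nexthops(ip_route_output: str, expected: set) -> dict:
    """Return next hops present in the kernel output but not in `expected`."""
    counts = count_routes_by_nexthop(ip_route_output)
    return {nh: n for nh, n in counts.items() if nh not in expected}


# Sample taken from the kernel output quoted in this report.
sample = """\
2c0f:fc89:8089::/48 via fe80::8aa2:5e04:2e18:3c00 dev vlan1070 proto 186 metric 20 pref medium
2c0f:fc89:8089::/48 via fe80::8aa2:5e04:2e18:e400 dev vlan1070 proto 186 metric 20 pref medium
"""

# After the shutdown only the e400 next hop should still be in use.
stale = stale_nexthops(sample, expected={"fe80::8aa2:5e04:2e18:e400"})
print(stale)
```

Run against the full table, the per-next-hop count for fe80::8aa2:5e04:2e18:3c00 should correspond to the `wc -l` figure above.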

The debug logging that I captured when I shut the BGP session, filtered for this specific prefix, looks like this:

2023/02/01 10:35:28 ZEBRA: [MFYWV-KH3MC] rib_add_multipath_nhe: (0:254):2c0f:fc89:8089::/48: Inserting route rn 0xa9db390, re 0x24c9c9a0 (bgp) existing 0x13839270, same_count 1
2023/02/01 10:35:28 ZEBRA: [Q4T2G-E2SQF] rib_add_multipath_nhe: dumping RE entry 0x24c9c9a0 for 2c0f:fc89:8089::/48 vrf default(0)
2023/02/01 10:35:28 ZEBRA: [M5M58-9PD2R] 2c0f:fc89:8089::/48: uptime == 2025402, type == 9, instance == 0, table == 254
2023/02/01 10:35:28 ZEBRA: [RVZMM-N7DME] 2c0f:fc89:8089::/48: metric == 0, mtu == 0, distance == 20, flags == None status == None
2023/02/01 10:35:28 ZEBRA: [Q1NW5-NWY7P] 2c0f:fc89:8089::/48: nexthop_num == 1, nexthop_active_num == 0
2023/02/01 10:35:28 ZEBRA: [TFHQ8-TC30H] 2c0f:fc89:8089::/48: NH fe80::8aa2:5e04:2e18:e400[15] vrf default(0) wgt 1, with flags
2023/02/01 10:35:28 ZEBRA: [SCETK-GQ9E4] 2c0f:fc89:8089::/48: dump complete
2023/02/01 10:35:28 ZEBRA: [QEVVE-G3FQQ] rib_meta_queue_add: (0:254):2c0f:fc89:8089::/48: queued rn 0xa9db390 into sub-queue 6
2023/02/01 10:35:28 ZEBRA: [QE6V0-J8BG5] rib_delnode: (0:254):2c0f:fc89:8089::/48: rn 0xa9db390, re 0x13839270, removing
2023/02/01 10:35:28 ZEBRA: [Q7ZRR-C2A44] rib_meta_queue_add: (0:254):2c0f:fc89:8089::/48: rn 0xa9db390 is already queued in sub-queue 6
2023/02/01 10:35:30 ZEBRA: [NZNZ4-7P54Y] default(0:254):2c0f:fc89:8089::/48: Processing rn 0xa9db390
2023/02/01 10:35:30 ZEBRA: [ZJVZ4-XEGPF] default(0:254):2c0f:fc89:8089::/48: Examine re 0x24c9c9a0 (bgp) status: Changed flags: None dist 20 metric 0
2023/02/01 10:35:30 ZEBRA: [ZJVZ4-XEGPF] default(0:254):2c0f:fc89:8089::/48: Examine re 0x13839270 (bgp) status: Removed Installed flags: Selected dist 20 metric 0
2023/02/01 10:35:30 ZEBRA: [JPJF4-TGCY5] default(0:254):2c0f:fc89:8089::/48: After processing: old_selected 0x13839270 new_selected 0x24c9c9a0 old_fib 0x13839270 new_fib 0x24c9c9a0
2023/02/01 10:35:30 ZEBRA: [S31W0-H281H] 0:2542c0f:fc89:8089::/48: Redist del: re 0x13839270 (0:bgp), new re 0x24c9c9a0 (0:bgp)
2023/02/01 10:35:30 ZEBRA: [XEW4Y-SPDDE] default(0:254):2c0f:fc89:8089::/48: Updating route rn 0xa9db390, re 0x24c9c9a0 (bgp) old 0x13839270 (bgp)
2023/02/01 10:35:30 ZEBRA: [NM15X-X83N9] rib_process: (0:254):2c0f:fc89:8089::/48: rn 0xa9db390, removing re 0x13839270
2023/02/01 10:35:30 ZEBRA: [Y53JX-CBC5H] rib_unlink: (0:254):2c0f:fc89:8089::/48: rn 0xa9db390, re 0x13839270
2023/02/01 10:35:30 ZEBRA: [QZ1V6-CRT8D] default(0:254):2c0f:fc89:8089::/48 rn 0xa9db390 dequeued from sub-queue 6
2023/02/01 10:35:30 ZEBRA: [YXPF5-B2CE0] netlink_route_multipath_msg_encode: RTM_DELROUTE 2c0f:fc89:8089::/48 vrf 0(254)
2023/02/01 10:35:30 ZEBRA: [YXPF5-B2CE0] netlink_route_multipath_msg_encode: RTM_NEWROUTE 2c0f:fc89:8089::/48 vrf 0(254)
2023/02/01 10:35:30 ZEBRA: [TVM3E-A8ZAG] _netlink_route_build_singlepath: (single-path): 2c0f:fc89:8089::/48 nexthop via fe80::8aa2:5e04:2e18:e400 if 15 vrf default(0)
2023/02/01 10:35:30 ZEBRA: [GHWHS-ZKQM5] update_from_ctx: default(0:254):2c0f:fc89:8089::/48: SELECTED, re 0x24c9c9a0
2023/02/01 10:35:30 ZEBRA: [TS3SH-1276M] default(0:254):2c0f:fc89:8089::/48 update_from_ctx(): no fib nhg
2023/02/01 10:35:30 ZEBRA: [HKQXC-4STSK] default(0:254):2c0f:fc89:8089::/48 update_from_ctx(): rib nhg matched, changed 'false'
2023/02/01 10:35:30 ZEBRA: [Z1MP1-RFGJA] (0:254):2c0f:fc89:8089::/48(0): Redist update re 0x24c9c9a0 (bgp), old 0x24c9c9a0 (bgp)

The second time I reproduced this issue, the route above was, of course, cleaned up just fine. During the initial debugging, however, I collected the following output after the session to the transit router was shut:

`
[root@bgp01 frr]# ip -6 ro show 2c0f:fc89:8089::/48
2c0f:fc89:8089::/48 via fe80::225:90ff:fee5:666e dev 10ge2 proto 186 metric 20 pref medium
2c0f:fc89:8089::/48 via fe80::8aa2:5e04:2e18:3c00 dev vlan1070 proto 186 metric 20 pref medium
2c0f:fc89:8089::/48 via fe80::8aa2:5e04:2e18:e400 dev vlan1070 proto 186 metric 20 pref medium
[root@bgp01 frr]#

bgp01.as48972.net# sh bgp ipv6 2c0f:fc89:8089::/48
BGP routing table entry for 2c0f:fc89:8089::/48, version 27834269
Paths: (3 available, best #3, table default)
  Advertised to non peer-group peers:
  2a02:b70:0:503::3 2a02:b70:0:503::4 2a02:b70:0:503::6 2a02:b70:0:503::7
  20847 3356 6453 9498 8452 36992
    2001:1690:6::2:d (metric 20) from 2a02:b70:0:503::3 (185.100.140.2)
      Origin IGP, localpref 90, valid, internal
      Last update: Sat Jan 28 21:00:56 2023
  20847 286 6453 9498 8452 36992
    2001:1690:6::2:11 (metric 20) from 2a02:b70:0:503::6 (185.100.140.3)
      Origin IGP, localpref 90, valid, internal
      Last update: Sat Jan 28 21:00:36 2023
  24785 6830 6453 9498 8452 36992
    2a02:10:0:1::181:2 from 2a02:10:0:1::181:2 (213.207.0.226)
    (fe80::8aa2:5e04:2e18:e400) (used)
      Origin IGP, metric 0, localpref 90, valid, external, best (Peer Type)
      Community: 24785:6830
      Last update: Sat Jan 28 21:00:17 2023

bgp01.as48972.net# sh ipv6 route 2c0f:fc89:8089::/48
Routing entry for 2c0f:fc89:8089::/48
  Known via "bgp", distance 20, metric 0, best
  Last update 00:11:56 ago
`

jhaprins commented 1 year ago

Below are 3 pieces of debug logging for this prefix:

[root@bgp01 frr]# grep "2c0f:fc89:8089::/48" bgpd.log.3
2023/02/01 10:35:28 ZEBRA: [MFYWV-KH3MC] rib_add_multipath_nhe: (0:254):2c0f:fc89:8089::/48: Inserting route rn 0xa9db390, re 0x24c9c9a0 (bgp) existing 0x13839270, same_count 1
2023/02/01 10:35:28 ZEBRA: [Q4T2G-E2SQF] rib_add_multipath_nhe: dumping RE entry 0x24c9c9a0 for 2c0f:fc89:8089::/48 vrf default(0)
2023/02/01 10:35:28 ZEBRA: [M5M58-9PD2R] 2c0f:fc89:8089::/48: uptime == 2025402, type == 9, instance == 0, table == 254
2023/02/01 10:35:28 ZEBRA: [RVZMM-N7DME] 2c0f:fc89:8089::/48: metric == 0, mtu == 0, distance == 20, flags == None status == None
2023/02/01 10:35:28 ZEBRA: [Q1NW5-NWY7P] 2c0f:fc89:8089::/48: nexthop_num == 1, nexthop_active_num == 0
2023/02/01 10:35:28 ZEBRA: [TFHQ8-TC30H] 2c0f:fc89:8089::/48: NH fe80::8aa2:5e04:2e18:e400[15] vrf default(0) wgt 1, with flags
2023/02/01 10:35:28 ZEBRA: [SCETK-GQ9E4] 2c0f:fc89:8089::/48: dump complete
2023/02/01 10:35:28 ZEBRA: [QEVVE-G3FQQ] rib_meta_queue_add: (0:254):2c0f:fc89:8089::/48: queued rn 0xa9db390 into sub-queue 6
2023/02/01 10:35:28 ZEBRA: [QE6V0-J8BG5] rib_delnode: (0:254):2c0f:fc89:8089::/48: rn 0xa9db390, re 0x13839270, removing
2023/02/01 10:35:28 ZEBRA: [Q7ZRR-C2A44] rib_meta_queue_add: (0:254):2c0f:fc89:8089::/48: rn 0xa9db390 is already queued in sub-queue 6
2023/02/01 10:35:30 ZEBRA: [NZNZ4-7P54Y] default(0:254):2c0f:fc89:8089::/48: Processing rn 0xa9db390
2023/02/01 10:35:30 ZEBRA: [ZJVZ4-XEGPF] default(0:254):2c0f:fc89:8089::/48: Examine re 0x24c9c9a0 (bgp) status: Changed flags: None dist 20 metric 0
2023/02/01 10:35:30 ZEBRA: [ZJVZ4-XEGPF] default(0:254):2c0f:fc89:8089::/48: Examine re 0x13839270 (bgp) status: Removed Installed flags: Selected dist 20 metric 0
2023/02/01 10:35:30 ZEBRA: [JPJF4-TGCY5] default(0:254):2c0f:fc89:8089::/48: After processing: old_selected 0x13839270 new_selected 0x24c9c9a0 old_fib 0x13839270 new_fib 0x24c9c9a0
2023/02/01 10:35:30 ZEBRA: [S31W0-H281H] 0:2542c0f:fc89:8089::/48: Redist del: re 0x13839270 (0:bgp), new re 0x24c9c9a0 (0:bgp)
2023/02/01 10:35:30 ZEBRA: [XEW4Y-SPDDE] default(0:254):2c0f:fc89:8089::/48: Updating route rn 0xa9db390, re 0x24c9c9a0 (bgp) old 0x13839270 (bgp)
2023/02/01 10:35:30 ZEBRA: [NM15X-X83N9] rib_process: (0:254):2c0f:fc89:8089::/48: rn 0xa9db390, removing re 0x13839270
2023/02/01 10:35:30 ZEBRA: [Y53JX-CBC5H] rib_unlink: (0:254):2c0f:fc89:8089::/48: rn 0xa9db390, re 0x13839270
2023/02/01 10:35:30 ZEBRA: [QZ1V6-CRT8D] default(0:254):2c0f:fc89:8089::/48 rn 0xa9db390 dequeued from sub-queue 6
2023/02/01 10:35:30 ZEBRA: [YXPF5-B2CE0] netlink_route_multipath_msg_encode: RTM_DELROUTE 2c0f:fc89:8089::/48 vrf 0(254)
2023/02/01 10:35:30 ZEBRA: [YXPF5-B2CE0] netlink_route_multipath_msg_encode: RTM_NEWROUTE 2c0f:fc89:8089::/48 vrf 0(254)
2023/02/01 10:35:30 ZEBRA: [TVM3E-A8ZAG] _netlink_route_build_singlepath: (single-path): 2c0f:fc89:8089::/48 nexthop via fe80::8aa2:5e04:2e18:e400 if 15 vrf default(0)
2023/02/01 10:35:30 ZEBRA: [GHWHS-ZKQM5] update_from_ctx: default(0:254):2c0f:fc89:8089::/48: SELECTED, re 0x24c9c9a0
2023/02/01 10:35:30 ZEBRA: [TS3SH-1276M] default(0:254):2c0f:fc89:8089::/48 update_from_ctx(): no fib nhg
2023/02/01 10:35:30 ZEBRA: [HKQXC-4STSK] default(0:254):2c0f:fc89:8089::/48 update_from_ctx(): rib nhg matched, changed 'false'
2023/02/01 10:35:30 ZEBRA: [Z1MP1-RFGJA] (0:254):2c0f:fc89:8089::/48(0): Redist update re 0x24c9c9a0 (bgp), old 0x24c9c9a0 (bgp)

2023/02/01 11:07:37 ZEBRA: [MFYWV-KH3MC] rib_add_multipath_nhe: (0:254):2c0f:fc89:8089::/48: Inserting route rn 0xa9db390, re 0x21cb0900 (bgp) existing 0x24c9c9a0, same_count 1
2023/02/01 11:07:37 ZEBRA: [Q4T2G-E2SQF] rib_add_multipath_nhe: dumping RE entry 0x21cb0900 for 2c0f:fc89:8089::/48 vrf default(0)
2023/02/01 11:07:37 ZEBRA: [M5M58-9PD2R] 2c0f:fc89:8089::/48: uptime == 2027330, type == 9, instance == 0, table == 254
2023/02/01 11:07:37 ZEBRA: [RVZMM-N7DME] 2c0f:fc89:8089::/48: metric == 0, mtu == 0, distance == 20, flags == None status == None
2023/02/01 11:07:37 ZEBRA: [Q1NW5-NWY7P] 2c0f:fc89:8089::/48: nexthop_num == 2, nexthop_active_num == 0
2023/02/01 11:07:37 ZEBRA: [TFHQ8-TC30H] 2c0f:fc89:8089::/48: NH fe80::8aa2:5e04:2e18:3c00[15] vrf default(0) wgt 1, with flags
2023/02/01 11:07:37 ZEBRA: [TFHQ8-TC30H] 2c0f:fc89:8089::/48: NH fe80::8aa2:5e04:2e18:e400[15] vrf default(0) wgt 1, with flags
2023/02/01 11:07:37 ZEBRA: [SCETK-GQ9E4] 2c0f:fc89:8089::/48: dump complete
2023/02/01 11:07:37 ZEBRA: [QEVVE-G3FQQ] rib_meta_queue_add: (0:254):2c0f:fc89:8089::/48: queued rn 0xa9db390 into sub-queue 6
2023/02/01 11:07:37 ZEBRA: [QE6V0-J8BG5] rib_delnode: (0:254):2c0f:fc89:8089::/48: rn 0xa9db390, re 0x24c9c9a0, removing
2023/02/01 11:07:37 ZEBRA: [Q7ZRR-C2A44] rib_meta_queue_add: (0:254):2c0f:fc89:8089::/48: rn 0xa9db390 is already queued in sub-queue 6
2023/02/01 11:07:38 ZEBRA: [NZNZ4-7P54Y] default(0:254):2c0f:fc89:8089::/48: Processing rn 0xa9db390
2023/02/01 11:07:38 ZEBRA: [ZJVZ4-XEGPF] default(0:254):2c0f:fc89:8089::/48: Examine re 0x21cb0900 (bgp) status: Changed flags: None dist 20 metric 0
2023/02/01 11:07:38 ZEBRA: [ZJVZ4-XEGPF] default(0:254):2c0f:fc89:8089::/48: Examine re 0x24c9c9a0 (bgp) status: Removed Installed flags: Selected dist 20 metric 0
2023/02/01 11:07:38 ZEBRA: [JPJF4-TGCY5] default(0:254):2c0f:fc89:8089::/48: After processing: old_selected 0x24c9c9a0 new_selected 0x21cb0900 old_fib 0x24c9c9a0 new_fib 0x21cb0900
2023/02/01 11:07:38 ZEBRA: [S31W0-H281H] 0:2542c0f:fc89:8089::/48: Redist del: re 0x24c9c9a0 (0:bgp), new re 0x21cb0900 (0:bgp)
2023/02/01 11:07:38 ZEBRA: [XEW4Y-SPDDE] default(0:254):2c0f:fc89:8089::/48: Updating route rn 0xa9db390, re 0x21cb0900 (bgp) old 0x24c9c9a0 (bgp)
2023/02/01 11:07:38 ZEBRA: [NM15X-X83N9] rib_process: (0:254):2c0f:fc89:8089::/48: rn 0xa9db390, removing re 0x24c9c9a0
2023/02/01 11:07:38 ZEBRA: [Y53JX-CBC5H] rib_unlink: (0:254):2c0f:fc89:8089::/48: rn 0xa9db390, re 0x24c9c9a0
2023/02/01 11:07:38 ZEBRA: [QZ1V6-CRT8D] default(0:254):2c0f:fc89:8089::/48 rn 0xa9db390 dequeued from sub-queue 6
2023/02/01 11:07:38 ZEBRA: [YXPF5-B2CE0] netlink_route_multipath_msg_encode: RTM_DELROUTE 2c0f:fc89:8089::/48 vrf 0(254)
2023/02/01 11:07:38 ZEBRA: [YXPF5-B2CE0] netlink_route_multipath_msg_encode: RTM_NEWROUTE 2c0f:fc89:8089::/48 vrf 0(254)
2023/02/01 11:07:38 ZEBRA: [WY6GM-8HKW9] _netlink_route_build_multipath: (multipath): 2c0f:fc89:8089::/48 nexthop via fe80::8aa2:5e04:2e18:3c00 if 15 vrf default(0)
2023/02/01 11:07:38 ZEBRA: [WY6GM-8HKW9] _netlink_route_build_multipath: (multipath): 2c0f:fc89:8089::/48 nexthop via fe80::8aa2:5e04:2e18:e400 if 15 vrf default(0)
2023/02/01 11:07:38 ZEBRA: [GHWHS-ZKQM5] update_from_ctx: default(0:254):2c0f:fc89:8089::/48: SELECTED, re 0x21cb0900
2023/02/01 11:07:38 ZEBRA: [TS3SH-1276M] default(0:254):2c0f:fc89:8089::/48 update_from_ctx(): no fib nhg
2023/02/01 11:07:38 ZEBRA: [HKQXC-4STSK] default(0:254):2c0f:fc89:8089::/48 update_from_ctx(): rib nhg matched, changed 'false'
2023/02/01 11:07:38 ZEBRA: [Z1MP1-RFGJA] (0:254):2c0f:fc89:8089::/48(0): Redist update re 0x21cb0900 (bgp), old 0x21cb0900 (bgp)

2023/02/01 11:17:48 ZEBRA: [MFYWV-KH3MC] rib_add_multipath_nhe: (0:254):2c0f:fc89:8089::/48: Inserting route rn 0xa9db390, re 0x236391e0 (bgp) existing 0x21cb0900, same_count 1
2023/02/01 11:17:48 ZEBRA: [Q4T2G-E2SQF] rib_add_multipath_nhe: dumping RE entry 0x236391e0 for 2c0f:fc89:8089::/48 vrf default(0)
2023/02/01 11:17:48 ZEBRA: [M5M58-9PD2R] 2c0f:fc89:8089::/48: uptime == 2027941, type == 9, instance == 0, table == 254
2023/02/01 11:17:48 ZEBRA: [RVZMM-N7DME] 2c0f:fc89:8089::/48: metric == 0, mtu == 0, distance == 20, flags == None status == None
2023/02/01 11:17:48 ZEBRA: [Q1NW5-NWY7P] 2c0f:fc89:8089::/48: nexthop_num == 1, nexthop_active_num == 0
2023/02/01 11:17:48 ZEBRA: [TFHQ8-TC30H] 2c0f:fc89:8089::/48: NH fe80::8aa2:5e04:2e18:e400[15] vrf default(0) wgt 1, with flags
2023/02/01 11:17:48 ZEBRA: [SCETK-GQ9E4] 2c0f:fc89:8089::/48: dump complete
2023/02/01 11:17:48 ZEBRA: [QEVVE-G3FQQ] rib_meta_queue_add: (0:254):2c0f:fc89:8089::/48: queued rn 0xa9db390 into sub-queue 6
2023/02/01 11:17:48 ZEBRA: [QE6V0-J8BG5] rib_delnode: (0:254):2c0f:fc89:8089::/48: rn 0xa9db390, re 0x21cb0900, removing
2023/02/01 11:17:48 ZEBRA: [Q7ZRR-C2A44] rib_meta_queue_add: (0:254):2c0f:fc89:8089::/48: rn 0xa9db390 is already queued in sub-queue 6
2023/02/01 11:17:50 ZEBRA: [NZNZ4-7P54Y] default(0:254):2c0f:fc89:8089::/48: Processing rn 0xa9db390
2023/02/01 11:17:50 ZEBRA: [ZJVZ4-XEGPF] default(0:254):2c0f:fc89:8089::/48: Examine re 0x236391e0 (bgp) status: Changed flags: None dist 20 metric 0
2023/02/01 11:17:50 ZEBRA: [ZJVZ4-XEGPF] default(0:254):2c0f:fc89:8089::/48: Examine re 0x21cb0900 (bgp) status: Removed Installed flags: Selected dist 20 metric 0
2023/02/01 11:17:50 ZEBRA: [JPJF4-TGCY5] default(0:254):2c0f:fc89:8089::/48: After processing: old_selected 0x21cb0900 new_selected 0x236391e0 old_fib 0x21cb0900 new_fib 0x236391e0
2023/02/01 11:17:50 ZEBRA: [S31W0-H281H] 0:2542c0f:fc89:8089::/48: Redist del: re 0x21cb0900 (0:bgp), new re 0x236391e0 (0:bgp)
2023/02/01 11:17:50 ZEBRA: [XEW4Y-SPDDE] default(0:254):2c0f:fc89:8089::/48: Updating route rn 0xa9db390, re 0x236391e0 (bgp) old 0x21cb0900 (bgp)
2023/02/01 11:17:50 ZEBRA: [NM15X-X83N9] rib_process: (0:254):2c0f:fc89:8089::/48: rn 0xa9db390, removing re 0x21cb0900
2023/02/01 11:17:50 ZEBRA: [Y53JX-CBC5H] rib_unlink: (0:254):2c0f:fc89:8089::/48: rn 0xa9db390, re 0x21cb0900
2023/02/01 11:17:50 ZEBRA: [QZ1V6-CRT8D] default(0:254):2c0f:fc89:8089::/48 rn 0xa9db390 dequeued from sub-queue 6
2023/02/01 11:17:50 ZEBRA: [YXPF5-B2CE0] netlink_route_multipath_msg_encode: RTM_DELROUTE 2c0f:fc89:8089::/48 vrf 0(254)
2023/02/01 11:17:50 ZEBRA: [YXPF5-B2CE0] netlink_route_multipath_msg_encode: RTM_NEWROUTE 2c0f:fc89:8089::/48 vrf 0(254)
2023/02/01 11:17:50 ZEBRA: [TVM3E-A8ZAG] _netlink_route_build_singlepath: (single-path): 2c0f:fc89:8089::/48 nexthop via fe80::8aa2:5e04:2e18:e400 if 15 vrf default(0)
2023/02/01 11:17:50 ZEBRA: [GHWHS-ZKQM5] update_from_ctx: default(0:254):2c0f:fc89:8089::/48: SELECTED, re 0x236391e0
2023/02/01 11:17:50 ZEBRA: [TS3SH-1276M] default(0:254):2c0f:fc89:8089::/48 update_from_ctx(): no fib nhg
2023/02/01 11:17:50 ZEBRA: [HKQXC-4STSK] default(0:254):2c0f:fc89:8089::/48 update_from_ctx(): rib nhg matched, changed 'false'
2023/02/01 11:17:50 ZEBRA: [Z1MP1-RFGJA] (0:254):2c0f:fc89:8089::/48(0): Redist update re 0x236391e0 (bgp), old 0x236391e0 (bgp)
[root@bgp01 frr]#
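In each of the three excerpts the part that matters for the kernel is short: zebra encodes an RTM_DELROUTE, then an RTM_NEWROUTE, followed by the next hops it builds into the message. The Python sketch below pulls just those entries out of a log like the ones above. It assumes the timestamp and message layout seen in this thread. `netlink_events` is a hypothetical helper name, not an FRR tool.

```python
import re

TS = r"\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}"

# One entry runs from "<timestamp> ZEBRA:" up to the next such marker, so the
# parser works whether entries sit on separate lines or are run together.
ENTRY_RE = re.compile(rf"({TS})\s+ZEBRA: (.*?)(?={TS}\s+ZEBRA:|\Z)", re.S)
OP_RE = re.compile(r"(RTM_(?:NEW|DEL)ROUTE) (\S+)")
NH_RE = re.compile(r"_netlink_route_build_\w+: .*nexthop via (\S+) if \d+", re.S)


def netlink_events(log_text: str) -> list:
    """Return (timestamp, kind, detail) for netlink-related zebra entries."""
    events = []
    for timestamp, body in ENTRY_RE.findall(log_text):
        op = OP_RE.search(body)
        nexthop = NH_RE.search(body)
        if op:
            events.append((timestamp, op.group(1), op.group(2)))
        elif nexthop:
            events.append((timestamp, "nexthop", nexthop.group(1)))
    return events


# Three entries copied from the 10:35:30 excerpt, run together on purpose.
sample = (
    "2023/02/01 10:35:30 ZEBRA: [YXPF5-B2CE0] netlink_route_multipath_msg_encode: "
    "RTM_DELROUTE 2c0f:fc89:8089::/48 vrf 0(254) "
    "2023/02/01 10:35:30 ZEBRA: [YXPF5-B2CE0] netlink_route_multipath_msg_encode: "
    "RTM_NEWROUTE 2c0f:fc89:8089::/48 vrf 0(254) "
    "2023/02/01 10:35:30 ZEBRA: [TVM3E-A8ZAG] _netlink_route_build_singlepath: "
    "(single-path): 2c0f:fc89:8089::/48 nexthop via fe80::8aa2:5e04:2e18:e400 "
    "if 15 vrf default(0)"
)

for event in netlink_events(sample):
    print(event)
```

Comparing the next hops listed after each RTM_NEWROUTE with what `ip -6 ro show` reports afterwards shows which entries the kernel kept that zebra no longer knows about.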

jhaprins commented 1 year ago

The kernel config is attached.

[root@bgp01 boot]# uname -a
Linux bgp01.as48972.net 3.10.0-1160.42.2.el7.x86_64 #1 SMP Tue Sep 7 14:49:57 UTC 2021 x86_64 x86_64 x86_64 GNU/Linux

kernel-config.txt

donaldsharp commented 1 year ago

From the discussion in Slack: this is a bug in the 3.10 Linux kernel in use. FRR's ECMP route add is misinterpreted by the kernel as two separate routes. As a result, when FRR sends a delete, one route is matched and the other is left behind. Upgrading the kernel or running the daemons with -e 1 will solve the problem.
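To make the failure mode concrete, here is a toy Python model of the behaviour described above. It is only a sketch of the explanation, not kernel or FRR code. `ToyFib`, `add_then_withdraw`, and `ecmp_limit` are hypothetical names, and which of the entries the real kernel matches on delete is not stated in this thread (the toy simply removes the first).

```python
from collections import defaultdict


class ToyFib:
    """Toy stand-in for the kernel route table; an assumption-laden model."""

    def __init__(self):
        self.entries = defaultdict(list)  # prefix -> list of next hops

    def add(self, prefix, nexthops):
        # Modelled bug: each next hop of an ECMP add becomes its own entry.
        self.entries[prefix].extend(nexthops)

    def delete(self, prefix):
        # Modelled bug: a delete matches and removes only one entry.
        if self.entries[prefix]:
            self.entries[prefix].pop(0)

    def remaining(self, prefix):
        return list(self.entries[prefix])


def add_then_withdraw(nexthops, ecmp_limit):
    """Install a prefix with at most ecmp_limit next hops, then delete it."""
    fib = ToyFib()
    prefix = "2c0f:fc89:8089::/48"
    fib.add(prefix, nexthops[:ecmp_limit])
    fib.delete(prefix)
    return fib.remaining(prefix)


NEXTHOPS = ["fe80::8aa2:5e04:2e18:3c00", "fe80::8aa2:5e04:2e18:e400"]

# With two next hops, one entry survives the delete.
leftover_ecmp = add_then_withdraw(NEXTHOPS, ecmp_limit=2)
# With a limit of one next hop (the effect of -e 1), nothing is left behind.
leftover_single = add_then_withdraw(NEXTHOPS, ecmp_limit=1)
print(len(leftover_ecmp), len(leftover_single))
```

In this model the `-e 1` workaround helps because only a single entry is ever installed per prefix, so a single delete is enough to remove it.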