Open alisenkov opened 1 year ago
> BUT, when we shut down/recable interface eno1, it seems that FRR can't push the same (default) routes to the kernel; the kernel loses the default route and the server becomes inaccessible for production traffic.
You mean that when the OOB link is disabled, the default routes disappear from the kernel and are not reinstalled until you restart FRR?
Yes, the default ECMP route from BGP with an IPv6 nexthop is reinstalled only when we restart FRR.
We found an interesting, possibly related issue: https://www.spinics.net/lists/netdev/msg863121.html
> > > > > when an IPv4 route gets removed because its nexthop was deleted, the
> > > > > kernel does not send a RTM_DELROUTE netlink notifications anymore in
> > > > > 6.1. A bisect lead me to 61b91eb33a69 ("ipv4: Handle attempt to delete
> > > > > multipath route when fib_info contains an nh reference"), and
> > > > > reverting it makes it work again.
Can you show the full ip route output, and also show ip route and show ip bgp, because I don't see any interfaces or routes related to the eno1 interface.
s1# sh ip route
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

B>* 0.0.0.0/0 [20/0] via fe80::c2d6:82ff:fef4:3cc4, ens1f1np1, weight 1, 01w5d10h
                     via fe80::c2d6:82ff:fef4:428c, ens1f0np0, weight 1, 01w5d10h
K> 10.4.0.0/16 [0/100] via 10.177.1.254, eno1, src 10.177.1.31, 05w0d16h
C> 10.177.1.0/24 is directly connected, eno1, 05w0d16h <<<<<<<<<
C> 192.168.100.34/32 is directly connected, lo, 05w5d18h
C> 192.168.100.51/32 is directly connected, lo, 05w5d18h
C> 192.168.100.54/32 is directly connected, lo, 05w5d18h
K> 198.18.0.0/15 [0/100] via 10.177.1.254, eno1, src 10.177.1.31, 05w0d16h

s1# sh ip bgp
BGP table version is 10, local router ID is 192.168.100.34, vrf id 0
Default local pref 100, local AS 65119
Status codes: s suppressed, d damped, h history, * valid, > best, = multipath,
              i internal, r RIB-failure, S Stale, R Removed
Nexthop codes: @NNN nexthop's vrf id, < announce-nh-self
Origin codes: i - IGP, e - EGP, ? - incomplete
RPKI validation codes: V valid, I invalid, N Not found

Network Next Hop Metric LocPrf Weight Path
= 0.0.0.0/0 ens1f0np0 0 65110 ?
> ens1f1np1 0 65110 ?
> 192.168.100.34/32 0.0.0.0 0 32768 i
> 192.168.100.51/32 0.0.0.0 0 32768 i
*> 192.168.100.54/32 0.0.0.0 0 32768 i

Displayed 4 routes and 5 total paths
s1# exit

root@s1:/home/alisenkov# ip ro
default nhid 68 via inet6 fe80::c2d6:82ff:fef4:3cc4 dev ens1f1np1 proto bgp metric 20 <<<<< it's also strange because FRR has 2 ECMP routes to default, but the kernel has only one
10.4.0.0/16 via 10.177.1.254 dev eno1 proto dhcp src 10.177.1.31 metric 100
10.177.1.0/24 dev eno1 proto kernel scope link src 10.177.1.31
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
198.18.0.0/15 via 10.177.1.254 dev eno1 proto dhcp src 10.177.1.31 metric 100

root@s1:/home/alisenkov# ip nexthop
id 7 dev lo scope host proto zebra
id 10 dev ens1f0np0 scope link proto zebra
id 27 dev eno1 scope host proto zebra
id 28 via 10.177.1.254 dev eno1 scope link proto zebra
id 35 via fe80::c2d6:82ff:fef4:3cc4 dev ens1f1np1 scope link proto zebra
id 51 via fe80::c2d6:82ff:fef4:428c dev ens1f0np0 scope link proto zebra
id 68 group 35 proto zebra
root@s1:/home/alisenkov#
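The mismatch described in this output (FRR shows two ECMP paths, but kernel nexthop group 68 contains only member 35) can be checked mechanically. The following is only a rough sketch, not part of FRR or iproute2: the function names `parse_nexthops` and `missing_group_members` are hypothetical, the sample text is copied from the `ip nexthop` output in this thread, and the handling of weighted members (`id,weight`) is an assumption about the output format.

```python
import re

# Sample copied from the `ip nexthop` output posted in this thread.
SAMPLE = """\
id 7 dev lo scope host proto zebra
id 10 dev ens1f0np0 scope link proto zebra
id 27 dev eno1 scope host proto zebra
id 28 via 10.177.1.254 dev eno1 scope link proto zebra
id 35 via fe80::c2d6:82ff:fef4:3cc4 dev ens1f1np1 scope link proto zebra
id 51 via fe80::c2d6:82ff:fef4:428c dev ens1f0np0 scope link proto zebra
id 68 group 35 proto zebra
"""


def parse_nexthops(text):
    """Parse `ip nexthop` text into {id: {"group": [...]}} or {id: {"dev": name}}."""
    entries = {}
    for line in text.strip().splitlines():
        m = re.match(r"id (\d+) (.*)", line.strip())
        if not m:
            continue  # skip shell prompts or blank lines
        nh_id, rest = int(m.group(1)), m.group(2)
        g = re.search(r"group (\S+)", rest)
        if g:
            # Assumption: members are separated by "/" and may carry ",weight".
            members = [int(part.split(",")[0]) for part in g.group(1).split("/")]
            entries[nh_id] = {"group": members}
        else:
            d = re.search(r"dev (\S+)", rest)
            entries[nh_id] = {"dev": d.group(1) if d else None}
    return entries


def missing_group_members(entries, group_id, expected_devs):
    """Return the expected interfaces that have no member in the given group."""
    have = {
        entries[m].get("dev")
        for m in entries[group_id]["group"]
        if m in entries
    }
    return sorted(set(expected_devs) - have)


if __name__ == "__main__":
    nh = parse_nexthops(SAMPLE)
    print("group 68 members:", nh[68]["group"])
    print("missing uplinks:", missing_group_members(nh, 68, ["ens1f0np0", "ens1f1np1"]))
```

On a live host the same functions could be fed the text of `ip nexthop` captured by any means; the list of expected uplink interfaces has to be supplied by the operator.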
I'm seeing the same issue. To me it looks like zebra is reinstalling the nexthop groups, but the routes that were using the nhgs are not getting restored.
Some more possibly interesting notes:
It is a systemd-networkd issue in the first place. It removes the nexthop groups unconditionally. IDK which version introduced this bug, but it is present in v252 and (very likely) newer versions. Please check: https://github.com/systemd/systemd/issues/29034
May have fixed it: https://github.com/FRRouting/frr/pull/14080
@alisenkov The problem seems to be related to systemd-networkd. What is the normal output of your networkctl? Please check how your case relates to this scenario: https://github.com/systemd/systemd/issues/29034#issuecomment-1834155593
@huangfeilong1 Seems completely unrelated.
> May have fixed it: #14080
Thanks, this fixed it for us (FRR 8.5 on Ubuntu 20.04)!
Interestingly, we don't face that problem with the system packages of systemd and frr on 22.04.
This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.
Hello everyone, we have a problem with ECMP routes on Ubuntu 22.04. We tried FRR versions 8.5, 7.1, and 9.1; the problem reproduces on all of them.
config FRR:
The server has two 10GE interfaces (enp59s0, enp59s0d1) for production traffic and one (eno1) for OOB (management). We use IPv6 ND on the links between the server and the L3 switch, and establish eBGP peering over these links.
In the normal situation, FRR and the kernel have two ECMP default routes to the eBGP neighbors:
And this multipath default route is correctly installed into the kernel:
BUT, when we shut down/recable interface eno1, it seems that FRR can't push the same (default) routes to the kernel; the kernel loses the default route and the server becomes inaccessible for production traffic.
Important point: if we do not use multipath (only one interface), the route is correctly installed into the kernel after a flap. THE PROBLEM OCCURS ONLY WHEN MULTIPATH IS USED.
"service frr restart" also helps: after restarting, FRR correctly installs the ECMP default route into the kernel:
logs:
Aug 08 13:00:14 front3 zebra[5697]: [X5XE1-RS0SW][EC 4043309074] Failed to install Nexthop (21[if 2 vrfid 0]) into the kernel
2023/08/08 12:12:45 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (25[if 6 vrfid 0]) that we are still using for a route, sending it back down
2023/08/08 12:12:45 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (25[if 6 vrfid 0]) that we are still using for a route, sending it back down
2023/08/08 12:12:45 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (33[if 2 vrfid 0]) that we are still using for a route, sending it back down
2023/08/08 12:12:45 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (38[if 2 vrfid 0]) that we are still using for a route, sending it back down
2023/08/08 12:12:45 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (37[if 2 vrfid 0]) that we are still using for a route, sending it back down
2023/08/08 12:12:45 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (25[if 6 vrfid 0]) that we are still using for a route, sending it back down
2023/08/08 12:12:45 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (33[if 2 vrfid 0]) that we are still using for a route, sending it back down
2023/08/08 12:12:45 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (38[if 2 vrfid 0]) that we are still using for a route, sending it back down
2023/08/08 12:12:45 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (37[if 2 vrfid 0]) that we are still using for a route, sending it back down
2023/08/08 12:15:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (36[37/38]) that we are still using for a route, sending it back down
2023/08/08 13:00:14 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (22[23/24]) that we are still using for a route, sending it back down
2023/08/08 13:07:07 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (16[17/18]) that we are still using for a route, sending it back down
2023/08/08 13:50:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (49[50/51]) that we are still using for a route, sending it back down
2023/08/08 13:50:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (51[fe80::c2d6:82ff:fef4:428c if 7 vrfid 0]) that we are still using for a route, sending it back down
2023/08/08 13:50:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel updated a nexthop group with ID (49[50/51]) that we are still using for a route, sending it back down
2023/08/08 13:50:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (49[50/51]) that we are still using for a route, sending it back down
2023/08/08 13:50:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (51[fe80::c2d6:82ff:fef4:428c if 7 vrfid 0]) that we are still using for a route, sending it back down
2023/08/08 13:50:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel updated a nexthop group with ID (49[50/51]) that we are still using for a route, sending it back down
2023/08/08 13:50:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (49[50/51]) that we are still using for a route, sending it back down
2023/08/08 13:50:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (49[50/51]) that we are still using for a route, sending it back down
2023/08/08 13:50:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (50[fe80::c2d6:82ff:fef4:3cc4 if 6 vrfid 0]) that we are still using for a route, sending it back down
2023/08/08 13:50:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel updated a nexthop group with ID (49[50/51]) that we are still using for a route, sending it back down
2023/08/08 13:50:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (48[if 6 vrfid 0]) that we are still using for a route, sending it back down
2023/08/08 13:50:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (49[50/51]) that we are still using for a route, sending it back down
2023/08/08 13:50:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (50[fe80::c2d6:82ff:fef4:3cc4 if 6 vrfid 0]) that we are still using for a route, sending it back down
2023/08/08 13:50:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel updated a nexthop group with ID (49[50/51]) that we are still using for a route, sending it back down
2023/08/08 13:50:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (48[if 6 vrfid 0]) that we are still using for a route, sending it back down
2023/08/08 13:50:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (49[50/51]) that we are still using for a route, sending it back down
2023/08/08 13:50:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (43[if 1 vrfid 0]) that we are still using for a route, sending it back down
2023/08/08 13:50:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (49[50/51]) that we are still using for a route, sending it back down
2023/08/08 13:50:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (49[50/51]) that we are still using for a route, sending it back down
2023/08/08 13:50:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (49[50/51]) that we are still using for a route, sending it back down
2023/08/08 13:50:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (49[50/51]) that we are still using for a route, sending it back down
2023/08/08 13:50:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (51[fe80::c2d6:82ff:fef4:428c if 7 vrfid 0]) that we are still using for a route, sending it back down
2023/08/08 13:50:02 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel updated a nexthop group with ID (49[50/51]) that we are still using for a route, sending it back down
2023/08/08 13:50:03 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (50[fe80::c2d6:82ff:fef4:3cc4 if 6 vrfid 0]) that we are still using for a route, sending it back down
2023/08/08 13:50:03 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel updated a nexthop group with ID (49[50/51]) that we are still using for a route, sending it back down
2023/08/08 13:50:03 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (48[if 6 vrfid 0]) that we are still using for a route, sending it back down
2023/08/08 13:50:03 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (49[50/51]) that we are still using for a route, sending it back down

For some reason FRR is unable to set the route to the kernel, please help me to understand the reason...
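To see which nexthop IDs are being removed or changed underneath zebra, the repeated "Kernel deleted/updated a nexthop group" messages can be tallied per ID. This is a minimal sketch under stated assumptions: the regular expression is written against the message wording seen in this log only, the name `summarize` is hypothetical, and the sample holds just the four 13:50:03 entries from above.

```python
import re
from collections import Counter

# Matches the zebra message wording seen in this thread's log (an assumption,
# not a stable format guaranteed by FRR).
LOG_RE = re.compile(r"Kernel (deleted|updated) a nexthop group with ID \((\d+)\[")

# The four entries logged at 13:50:03, copied from the log above.
SAMPLE = """\
2023/08/08 13:50:03 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (50[fe80::c2d6:82ff:fef4:3cc4 if 6 vrfid 0]) that we are still using for a route, sending it back down
2023/08/08 13:50:03 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel updated a nexthop group with ID (49[50/51]) that we are still using for a route, sending it back down
2023/08/08 13:50:03 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (48[if 6 vrfid 0]) that we are still using for a route, sending it back down
2023/08/08 13:50:03 ZEBRA: [RG2NH-FTSDH][EC 4043309102] Kernel deleted a nexthop group with ID (49[50/51]) that we are still using for a route, sending it back down
"""


def summarize(log_text):
    """Count (nexthop_id, action) pairs found anywhere in the log text."""
    counts = Counter()
    for action, nh_id in LOG_RE.findall(log_text):
        counts[(int(nh_id), action)] += 1
    return counts


if __name__ == "__main__":
    for (nh_id, action), n in sorted(summarize(SAMPLE).items()):
        print(f"nexthop {nh_id}: {action} x{n}")
```

Searching the whole text rather than line by line means the tally still works when log entries are wrapped across lines, as they were in the pasted log.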