Open zdc opened 1 year ago
Hi. We see the same issue.
FRRouting 8.5.2, on Linux 6.1.35, Debian 12
We add/delete routes with "ip route". Sometimes FRR continues to advertise one of these kernel routes after it's been deleted, indefinitely. Nothing in FRR logs.
Example:
cpt-ter-rs2# ip route del 102.216.77.157/32

Route is gone from kernel:
cpt-ter-rs2# ip route sh | grep '102.216.77'
102.216.77.0/25 dev br0.777 proto kernel scope link src 102.216.77.1 rt_offload
blackhole 102.216.77.0/24 proto static metric 20 rt_offload
102.216.77.135 dev br0.300 scope link src 102.216.77.129 rt_offload

cpt-ter-rs2# ip route sh match 102.216.77.157/32
default via 196.250.236.145 dev br0.1576 proto kernel onlink offload rt_offload
blackhole 102.216.77.0/24 proto static metric 20 rt_offload

But FRR is still seeing it:
cpt-ter-rs2# show ip route 102.216.77.157
Routing entry for 102.216.77.157/32
  Known via "kernel", distance 0, metric 0, best
  Last update 03:23:18 ago

cpt-ter-rs2# show ip bgp 102.216.77.157/32
BGP routing table entry for 102.216.77.157/32, version 22225
Paths: (1 available, best #1, table default)
  Advertised to non peer-group peers:
  cpt-ter-rs1(102.216.76.3) cpt-ter-r2(102.216.76.2)
  Local
    0.0.0.0(cpt-ter-rs2) from 0.0.0.0 (102.216.76.4)
      Origin incomplete, metric 0, weight 32768, valid, sourced, best (First path received)
      Last update: Tue Jul 11 15:30:27 2023
These commits seem to explain that exception (from oldest to newest):
This behavior looks intentional, as a workaround for interfaces going down and taking other routes with them. Maybe we are missing a trigger somewhere else to remove these stale routes?
This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.
Similar to #9185 ?
The bug is still present. Are there any updates?
As per @zdc's comment, when we remove this check:

if (ifp && (if_is_operative(ifp) || if_is_up(ifp))) {
        SET_FLAG(nexthop->flags, NEXTHOP_FLAG_ACTIVE);
        goto skip_check;
}

it works fine and FRR removes this default route:

K>* 0.0.0.0/0 [0/0] via 192.168.0.1, eth0, 00:00:07

The reason is simple: the interface is up, so "goto skip_check;" skips the intermediate checks even though the interface has no connected IP addresses left (after the flush). As far as I can tell, that is why FRR does not remove the default route. We need to add a check for whether the interface still has connected routes.
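To make the suggestion concrete, here is a minimal, self-contained sketch of the shortcut as quoted above next to one possible stricter variant. It is only a model: `struct model_interface`, `struct model_nexthop`, `connected_count` and both function names are hypothetical stand-ins, not zebra's real types or API, and the extra condition is an assumption about what "has connected routes" could mean, not a reviewed patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are NOT zebra's real structures. */
struct model_interface {
	bool up;                /* models if_is_operative(ifp) || if_is_up(ifp) */
	size_t connected_count; /* assumed: number of addresses left on the interface */
};

struct model_nexthop {
	bool active; /* models NEXTHOP_FLAG_ACTIVE */
};

/*
 * Shortcut as quoted in the thread: an up interface alone marks the
 * nexthop active. Returns true when the remaining checks would be
 * skipped (the "goto skip_check" path).
 */
static bool model_shortcut_current(const struct model_interface *ifp,
				   struct model_nexthop *nh)
{
	assert(nh != NULL);
	if (ifp && ifp->up) {
		nh->active = true;
		return true;
	}
	return false;
}

/*
 * One possible modification (an assumption): take the shortcut only if
 * the interface also still has at least one connected address.
 * Returning false means "fall through to the regular checks".
 */
static bool model_shortcut_with_connected(const struct model_interface *ifp,
					  struct model_nexthop *nh)
{
	assert(nh != NULL);
	if (ifp && ifp->up && ifp->connected_count > 0) {
		nh->active = true;
		return true;
	}
	return false;
}
```

With an interface that is up but has had its addresses flushed, the first function still takes the shortcut, while the second falls through to the regular checks. Whether the address count is the right signal for the real code is still an open question in this thread.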
Describe the bug
In certain conditions, kernel routes are not properly updated in zebra, for example when an IP address is flushed from an interface.
To Reproduce
Reproducing is really simple:
After this, the kernel will have no routes, but FRR still has an active default route:
Logs after ip -4 addr flush dev eth0:
Expected behavior
FRR should remove a route if it no longer exists in the kernel.
I traced the problem to nexthop_active_check() in https://github.com/FRRouting/frr/blob/32b20e1ad65e8db2ef80dd39b255f34de2802cd2/zebra/zebra_nhg.c#L2565.
I think the problem is that kernel routes are trusted too much. This check does not seem to be right: https://github.com/FRRouting/frr/blob/32b20e1ad65e8db2ef80dd39b255f34de2802cd2/zebra/zebra_nhg.c#L2589-L2602
For example, in this case the interface status is up, so goto skip_check is taken, but there are no IP addresses on the interface.
Without this check, everything looks better:
I think the check should be removed or modified to take into account more info than only interface status.
Screenshots
Versions
32b20e1ad65e8db2ef80dd39b255f34de2802cd2
Additional context
The problem described here may also be the cause of:
https://github.com/FRRouting/frr/issues/11592 https://github.com/FRRouting/frr/issues/12197