Open KanjiMonster opened 1 year ago
What Ubuntu/FRR version did you try with?
It’d be great if you could test newer FRR version. Merge/pull request https://github.com/dentproject/dentOS/pull/216 updates to Debian 10.
Ubuntu 20.04 (with kernel 5.11 due to issues with the ICE driver in newer kernels) with FRR 8.5 (same as on the switch).
Installed 5.15 via hwe on ubuntu 20.04, still works there.
I have good and bad news:
The good news is that with debian 10, the test succeeds (no route installation failures).
The bad news is that the FRR version in debian 10 is a very ancient one, 6.0.2. much older than the FRR version in the debian 9 image (8.5).
In an unsurprising turn of events, using the debian 10 based image with latest FRR (8.5.2) triggers the error again:
Jul 20 11:18:29 localhost systemd[1]: Starting FRRouting...
Jul 20 11:18:29 localhost frrinit.sh[1118]: Starting watchfrr with command: ' /usr/lib/frr/watchfrr -d -F traditional zebra bgpd staticd'.
Jul 20 11:18:29 localhost watchfrr[1154]: [T83RR-8SM5G] watchfrr 8.5.2 starting: vty@0
Jul 20 11:18:29 localhost watchfrr[1154]: [ZCJ3S-SPH5S] zebra state -> down : initial connection attempt failed
Jul 20 11:18:29 localhost watchfrr[1154]: [ZCJ3S-SPH5S] bgpd state -> down : initial connection attempt failed
Jul 20 11:18:29 localhost watchfrr[1154]: [ZCJ3S-SPH5S] staticd state -> down : initial connection attempt failed
Jul 20 11:18:29 localhost watchfrr[1154]: [YFT0P-5Q5YX] Forked background command [pid 1155]: /usr/lib/frr/watchfrr.sh restart all
Jul 20 11:18:30 localhost zebra[1168]: [V98V0-MTWPF] client 17 says hello and bids fair to announce only bgp routes vrf=0
Jul 20 11:18:30 localhost zebra[1168]: [V98V0-MTWPF] client 31 says hello and bids fair to announce only vnc routes vrf=0
Jul 20 11:18:30 localhost zebra[1168]: [V98V0-MTWPF] client 38 says hello and bids fair to announce only static routes vrf=0
Jul 20 11:18:30 localhost watchfrr[1154]: [QDG3Y-BY5TN] zebra state -> up : connect succeeded
Jul 20 11:18:30 localhost watchfrr[1154]: [QDG3Y-BY5TN] bgpd state -> up : connect succeeded
Jul 20 11:18:30 localhost watchfrr[1154]: [QDG3Y-BY5TN] staticd state -> up : connect succeeded
Jul 20 11:18:30 localhost watchfrr[1154]: [KWE5Q-QNGFC] all daemons up, doing startup-complete notify
Jul 20 11:18:30 localhost frrinit.sh[1118]: Started watchfrr.
Jul 20 11:18:30 localhost systemd[1]: Started FRRouting.
Jul 20 11:18:32 localhost bgpd[1174]: [M59KS-A3ZXZ] bgp_update_receive: rcvd End-of-RIB for IPv4 Unicast from 10.0.0.2 in vrf default
Jul 20 11:18:34 localhost zebra[1168]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Invalid argument, type=RTM_NEWROUTE(24), seq=18, pid=2573606066
Jul 20 11:18:34 localhost zebra[1168]: [VYKYC-709DP] default(0:254):10.0.2.0/24: Route install failed
Jul 20 11:18:35 localhost bgpd[1174]: [M59KS-A3ZXZ] bgp_update_receive: rcvd End-of-RIB for IPv4 Unicast from 10.0.1.2 in vrf default
Jul 20 11:18:35 localhost zebra[1168]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Invalid argument, type=RTM_NEWROUTE(24), seq=21, pid=2573606066
Jul 20 11:18:35 localhost zebra[1168]: [VYKYC-709DP] default(0:254):10.0.100.2/32: Route install failed
Jul 20 11:18:36 localhost zebra[1168]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Invalid argument, type=RTM_NEWROUTE(24), seq=22, pid=2573606066
Jul 20 11:18:36 localhost zebra[1168]: [VYKYC-709DP] default(0:254):10.0.101.2/32: Route install failed
Jul 20 11:18:44 localhost zebra[1168]: [WPPMZ-G9797] if_zebra_speed_update: swp5 old speed: 4294967295 new speed: 1000
Jul 20 11:18:44 localhost zebra[1168]: [WPPMZ-G9797] if_zebra_speed_update: swp52 old speed: 4294967295 new speed: 10000
I think I found the place where the route is getting rejected: https://github.com/dentproject/linux/blob/dent-linux-5.15.y/drivers/net/ethernet/marvell/prestera/prestera_router.c#L2699
static int
__prestera_router_fib4_event(struct prestera_switch *sw, unsigned long event,
struct fib_entry_notifier_info *fen_info)
{
struct prestera_fib_event_work *fib_work;
if (!fen_info->fi)
return NOTIFY_DONE;
/* Sanity */
if (event == FIB_EVENT_ENTRY_REPLACE) {
if (fen_info->fi->nh) <- this is true for the routes installed by FRR
return notifier_from_errno(-EINVAL);
....
There is no reason it couldn't handle it, it just refuses IPv4 routes with a nexthop via id instead of a direct gw being set.
Confusingly there is no check for IPv6 routes, which is probably why they succeed.
I also see a code for handling nexthops, and checks for the amount of nexthops, so it might not be that much work fixing this, or rather implement the handling of this type of route info.
you can workaround this issue by setting
no zebra nexthop kernel enable
you can workaround this issue by setting
no zebra nexthop kernel enable
Eh .... I decided to check if I can fix it, and the following change works for me:
diff --git a/drivers/net/ethernet/marvell/prestera/prestera_router.c b/drivers/net/ethernet/marvell/prestera/prestera_router.c
index 6b293b039da6..50670321f0e6 100644
--- a/drivers/net/ethernet/marvell/prestera/prestera_router.c
+++ b/drivers/net/ethernet/marvell/prestera/prestera_router.c
@@ -250,7 +250,7 @@ prestera_kern_fib_info_nhc(struct fib_notifier_info *info, int n)
if (info->family == AF_INET) {
fen4_info = container_of(info, struct fib_entry_notifier_info,
info);
- return &fib_info_nh(fen4_info->fi, n)->nh_common;
+ return fib_info_nhc(fen4_info->fi, n);
} else if (info->family == AF_INET6) {
fen6_info = container_of(info, struct fib6_entry_notifier_info,
info);
@@ -526,7 +526,7 @@ int prestera_util_kern_dip2nh_grp_key(struct prestera_switch *sw,
struct prestera_nexthop_group_key *res)
{
struct fib_result fib_res;
- struct fib_nh *fib_nh;
+ struct fib_nh_common *fib_nhc;
int err;
/* TODO: support IPv6 */
@@ -538,10 +538,10 @@ int prestera_util_kern_dip2nh_grp_key(struct prestera_switch *sw,
return 0;
if (prestera_fi_is_direct(fib_res.fi)) {
- fib_nh = fib_info_nh(fib_res.fi, 0);
+ fib_nhc = fib_info_nhc(fib_res.fi, 0);
memset(res, 0, sizeof(*res));
res->neigh[0].addr = *addr;
- res->neigh[0].rif = fib_nh->fib_nh_dev;
+ res->neigh[0].rif = fib_nhc->nhc_dev;
return 1;
}
@@ -562,7 +562,7 @@ static bool
__prestera_util_kern_n_is_reachable_v4(u32 tb_id, __be32 *addr,
struct net_device *dev)
{
- struct fib_nh *fib_nh;
+ struct fib_nh_common *fib_nhc;
struct fib_result res;
bool reachable;
@@ -570,8 +570,8 @@ __prestera_util_kern_n_is_reachable_v4(u32 tb_id, __be32 *addr,
if (!prestera_util_kern_get_route(&res, tb_id, addr))
if (prestera_fi_is_direct(res.fi)) {
- fib_nh = fib_info_nh(res.fi, 0);
- if (dev == fib_nh->fib_nh_dev)
+ fib_nhc = fib_info_nhc(res.fi, 0);
+ if (dev == fib_nhc->nhc_dev)
reachable = true;
}
@@ -2472,11 +2472,11 @@ static int prestera_inet6addr_valid_event(struct notifier_block *nb,
static bool __prestera_fi_is_direct(struct fib_info *fi)
{
- struct fib_nh *fib_nh;
+ struct fib_nh_common *fib_nhc;
if (fib_info_num_path(fi) == 1) {
- fib_nh = fib_info_nh(fi, 0);
- if (fib_nh->fib_nh_gw_family == AF_UNSPEC)
+ fib_nhc = fib_info_nhc(fi, 0);
+ if (fib_nhc->nhc_gw_family == AF_UNSPEC)
return true;
}
@@ -2697,9 +2697,6 @@ __prestera_router_fib4_event(struct prestera_switch *sw, unsigned long event,
/* Sanity */
if (event == FIB_EVENT_ENTRY_REPLACE) {
- if (fen_info->fi->nh)
- return notifier_from_errno(-EINVAL);
-
if (fen_info->fi->fib_nh_is_v6)
return notifier_from_errno(-EINVAL);
}
Now pings go through and routes get offloaded:
root@localhost:~# ip r
default via 172.16.111.1 dev ma1 proto dhcp src 172.16.111.132 metric 1024 trap rt_offload
10.0.0.0/24 dev swp52 proto kernel scope link src 10.0.0.1 rt_trap
10.0.1.0/24 dev swp5 proto kernel scope link src 10.0.1.1 rt_trap
10.0.2.0/24 nhid 18 via 10.0.0.2 dev swp52 proto bgp metric 20 offload rt_offload
10.0.100.2 nhid 22 via 10.0.1.2 dev swp5 proto bgp metric 20 offload rt_offload
10.0.101.2 nhid 18 via 10.0.0.2 dev swp52 proto bgp metric 20 offload rt_offload
172.16.111.0/24 dev ma1 proto kernel scope link src 172.16.111.132 rt_trap
172.16.111.1 dev ma1 proto dhcp scope link src 172.16.111.132 metric 1024 rt_offload
listening on swp5 (or swp52) only shows the bgp traffic, and none of the routed traffic.
of course un-tested outside of FRR / bgp with IPv4.
I can make this a PR for the kernel, at least I think the changes I did are correct (though no idea if they are complete).
Found another place where fib_nh
can be replaced with fib_nh_common
(even simplifying the code), and while test compiling the changes locally I was greeted with this warning:
drivers/net/ethernet/marvell/prestera/prestera_router.c: In function ‘prestera_kern_fib_info_nhs’:
drivers/net/ethernet/marvell/prestera/prestera_router.c:288:47: warning: the comparison will always evaluate as ‘true’ for the address of ‘fib6_nh’ will never be NULL [-Waddress]
288 | return fen6_info->rt->fib6_nh ?
| ^
In file included from ./include/net/nexthop.h:17,
from drivers/net/ethernet/marvell/prestera/prestera_router.c:17:
./include/net/ip6_fib.h:203:41: note: ‘fib6_nh’ declared here
203 | struct fib6_nh fib6_nh[];
which is a valid bug. AFAIU, this means that all IPv6 routes have been treated as mpath routes, which might have hidden the issue for IPv6 routes.
Looking at the code:
} else if (info->family == AF_INET6) {
fen6_info = container_of(info, struct fib6_entry_notifier_info,
info);
return fen6_info->rt->fib6_nh ?
(fen6_info->rt->fib6_nsiblings + 1) : 0;
}
It will always return at least 2 for all IPv6 routes. Even if the check would be working, returning either >=2 or 0 also sounds wrong, surely there must be routes with just one nexthop? Though I'm unsure what the correct replacement would be. Maybe something like:
if (fen6_info->rt->fib6_nsiblings > 0)
return fen6_info->rt->fib6_nsiblings + 1;
if (fen6_info->rt->nh)
return nexthop_num_path(fen6_info->rt->nh);
return 0;
Following the setup at https://docs.bisdn.de/network_configuration/bgp.html#bgp-for-ipv4-networks (with the addition of
bgp ebgp-requires-policy
so the peers talk to each other) route information gets exchanged and distributed, but FRR fails to install any route on the switch:The consequence is that while routes are distributed, they don't actually work, as the learned routes are not installed in the routing table (and also not offloaded).
The same thing happens with other protocols (RIP, OSPF, ...), but does not happen for BGP with IPv6.
Turning on debug for zebra, it shows the following:
so the issue seems to be installing routes pointing to a previously created nexthop.
This is using DentOS main, on TN48M-DN.
minimal reproducer via ip commands:
The above fails on DentOS, but e.g. succeeds in ubuntu.