dentproject / dentOS

dentOS SwitchDev based NOS
Other
203 stars 59 forks source link

Installing IPv4 routes setting nexthop via NHID fails and breaks routing with FRR (e.g. BGP) #232

Open KanjiMonster opened 1 year ago

KanjiMonster commented 1 year ago

Following the setup at https://docs.bisdn.de/network_configuration/bgp.html#bgp-for-ipv4-networks (with the addition of bgp ebgp-requires-policy so the peers talk to each other) route information gets exchanged and distributed, but FRR fails to install any route on the switch:

Jul 19 11:19:20 localhost systemd[1]: Started FRRouting.
Jul 19 11:19:22 localhost bgpd[998]: [M59KS-A3ZXZ] bgp_update_receive: rcvd End-of-RIB for IPv4 Unicast from 10.0.0.2 in vrf default
Jul 19 11:19:24 localhost zebra[986]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Invalid argument, type=RTM_NEWROUTE(24), seq=16, pid=2180819141
Jul 19 11:19:24 localhost zebra[986]: [VYKYC-709DP] default(0:254):10.0.2.0/24: Route install failed
Jul 19 11:19:25 localhost zebra[986]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Invalid argument, type=RTM_NEWROUTE(24), seq=19, pid=2180819141
Jul 19 11:19:25 localhost zebra[986]: [VYKYC-709DP] default(0:254):10.0.101.2/32: Route install failed
Jul 19 11:19:26 localhost bgpd[998]: [M59KS-A3ZXZ] bgp_update_receive: rcvd End-of-RIB for IPv4 Unicast from 10.0.1.2 in vrf default
Jul 19 11:19:26 localhost zebra[986]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Invalid argument, type=RTM_NEWROUTE(24), seq=22, pid=2180819141
Jul 19 11:19:26 localhost zebra[986]: [VYKYC-709DP] default(0:254):10.0.100.2/32: Route install failed
Jul 19 11:19:34 localhost zebra[986]: [WPPMZ-G9797] if_zebra_speed_update: swp5 old speed: 4294967295 new speed: 1000
Jul 19 11:19:35 localhost zebra[986]: [WPPMZ-G9797] if_zebra_speed_update: swp52 old speed: 4294967295 new speed: 10000

The consequence is that while routes are distributed, they don't actually work, as the learned routes are not installed in the routing table (and also not offloaded).

The same thing happens with other protocols (RIP, OSPF, ...), but does not happen for BGP with IPv6.

Turning on debug for zebra, it shows the following:

Jul 19 11:35:10 localhost zebra[959]: [YXCJP-0WZWV] netlink_nexthop_msg_encode: ID (17): directly connected, swp5(8) vrf default(0)
Jul 19 11:35:10 localhost zebra[959]: [R43C6-KYHWT] netlink_nexthop_msg_encode: RTM_NEWNEXTHOP, id=17
Jul 19 11:35:10 localhost zebra[959]: [HYEHE-CQZ9G] nl_batch_send: netlink-dp (NS 0), batch size=40, msg cnt=1
Jul 19 11:35:10 localhost zebra[959]: [TJ327-ET8HE] netlink_send_msg: >> netlink message dump [sent]
Jul 19 11:35:10 localhost zebra[959]: [JAS4D-NCWGP] nlmsghdr [len=40 type=(104) NEWNEXTHOP flags=(0x0501) {REQUEST,DUMP,(ROOT|REPLACE|CAPPED),(ATOMIC|CREATE)} seq=19 pid=2182728561]
Jul 19 11:35:10 localhost zebra[959]: [WCX94-SW894]   nhm [family=(2) AF_INET scope=(0) UNIVERSE protocol=(11) ZEBRA flags=0x00000000 {}]
Jul 19 11:35:10 localhost zebra[959]: [KFBSR-XYJV1]     rta [len=8 (payload=4) type=(1) ID]
Jul 19 11:35:10 localhost zebra[959]: [Z4E9C-GD9EP]       17
Jul 19 11:35:10 localhost zebra[959]: [KFBSR-XYJV1]     rta [len=8 (payload=4) type=(5) OIF]
Jul 19 11:35:10 localhost zebra[959]: [JR4EA-BKPTA]       8
Jul 19 11:35:10 localhost zebra[959]: [YXCJP-0WZWV] netlink_nexthop_msg_encode: ID (19): 10.0.0.2, via swp52(55) vrf default(0)
Jul 19 11:35:10 localhost zebra[959]: [R43C6-KYHWT] netlink_nexthop_msg_encode: RTM_NEWNEXTHOP, id=19
Jul 19 11:35:10 localhost zebra[959]: [HYEHE-CQZ9G] nl_batch_send: netlink-dp (NS 0), batch size=48, msg cnt=1
Jul 19 11:35:10 localhost zebra[959]: [TJ327-ET8HE] netlink_send_msg: >> netlink message dump [sent]
Jul 19 11:35:10 localhost zebra[959]: [JAS4D-NCWGP] nlmsghdr [len=48 type=(104) NEWNEXTHOP flags=(0x0501) {REQUEST,DUMP,(ROOT|REPLACE|CAPPED),(ATOMIC|CREATE)} seq=21 pid=2182728561]
Jul 19 11:35:10 localhost zebra[959]: [WCX94-SW894]   nhm [family=(2) AF_INET scope=(0) UNIVERSE protocol=(11) ZEBRA flags=0x00000000 {}]
Jul 19 11:35:10 localhost zebra[959]: [KFBSR-XYJV1]     rta [len=8 (payload=4) type=(1) ID]
Jul 19 11:35:10 localhost zebra[959]: [Z4E9C-GD9EP]       19
Jul 19 11:35:10 localhost zebra[959]: [KFBSR-XYJV1]     rta [len=8 (payload=4) type=(6) GATEWAY]
Jul 19 11:35:10 localhost zebra[959]: [M8QV4-KY9C0]       10.0.0.2
Jul 19 11:35:10 localhost zebra[959]: [KFBSR-XYJV1]     rta [len=8 (payload=4) type=(5) OIF]
Jul 19 11:35:10 localhost zebra[959]: [JR4EA-BKPTA]       55
Jul 19 11:35:10 localhost zebra[959]: [YXPF5-B2CE0] netlink_route_multipath_msg_encode: RTM_NEWROUTE 10.0.2.0/24 vrf 0(254)
Jul 19 11:35:10 localhost zebra[959]: [J87BH-XW5PP] netlink_route_multipath_msg_encode: 10.0.2.0/24 nhg_id is 19
Jul 19 11:35:10 localhost zebra[959]: [HYEHE-CQZ9G] nl_batch_send: netlink-dp (NS 0), batch size=52, msg cnt=1
Jul 19 11:35:10 localhost zebra[959]: [TJ327-ET8HE] netlink_send_msg: >> netlink message dump [sent]
Jul 19 11:35:10 localhost zebra[959]: [JAS4D-NCWGP] nlmsghdr [len=52 type=(24) NEWROUTE flags=(0x0501) {REQUEST,DUMP,(ROOT|REPLACE|CAPPED),(ATOMIC|CREATE)} seq=22 pid=2182728561]
Jul 19 11:35:10 localhost zebra[959]: [GCEGC-W8YBF]   rtmsg [family=(2) AF_INET dstlen=24 srclen=0 tos=0 table=254 protocol=(186) BGP scope=(0) UNIVERSE type=(1) UNICAST flags=0x0000 {}]
Jul 19 11:35:10 localhost zebra[959]: [KFBSR-XYJV1]     rta [len=8 (payload=4) type=(1) DST]
Jul 19 11:35:10 localhost zebra[959]: [M8QV4-KY9C0]       10.0.2.0
Jul 19 11:35:10 localhost zebra[959]: [KFBSR-XYJV1]     rta [len=8 (payload=4) type=(6) PRIORITY]
Jul 19 11:35:10 localhost zebra[959]: [Z4E9C-GD9EP]       20
Jul 19 11:35:10 localhost zebra[959]: [KFBSR-XYJV1]     rta [len=8 (payload=4) type=(30) NH_ID]
Jul 19 11:35:10 localhost zebra[959]: [Z4E9C-GD9EP]       19
Jul 19 11:35:10 localhost zebra[959]: [V8KNF-8EXH8] netlink_recv_msg: << netlink message dump [recv]
Jul 19 11:35:10 localhost zebra[959]: [JAS4D-NCWGP] nlmsghdr [len=36 type=(2) ERROR flags=(0x0100) {DUMP,(ROOT|REPLACE|CAPPED)} seq=22 pid=2182728561]
Jul 19 11:35:10 localhost zebra[959]: [KWP1C-6CSXF]   nlmsgerr [error=(-22) Invalid argument]
Jul 19 11:35:10 localhost zebra[959]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Invalid argument, type=RTM_NEWROUTE(24), seq=22, pid=2182728561
Jul 19 11:35:10 localhost zebra[959]: [QTT8V-3ZQ34] nl_batch_read_resp: netlink error message seq=22
Jul 19 11:35:10 localhost zebra[959]: [VYKYC-709DP] default(0:254):10.0.2.0/24: Route install failed
...
Jul 19 11:35:12 localhost zebra[959]: [YXCJP-0WZWV] netlink_nexthop_msg_encode: ID (24): 10.0.1.2, via swp5(8) vrf default(0)
Jul 19 11:35:12 localhost zebra[959]: [R43C6-KYHWT] netlink_nexthop_msg_encode: RTM_NEWNEXTHOP, id=24
Jul 19 11:35:12 localhost zebra[959]: [YXPF5-B2CE0] netlink_route_multipath_msg_encode: RTM_NEWROUTE 10.0.100.2/32 vrf 0(254)
Jul 19 11:35:12 localhost zebra[959]: [J87BH-XW5PP] netlink_route_multipath_msg_encode: 10.0.100.2/32 nhg_id is 24
Jul 19 11:35:12 localhost zebra[959]: [YXPF5-B2CE0] netlink_route_multipath_msg_encode: RTM_NEWROUTE 10.0.101.2/32 vrf 0(254)
Jul 19 11:35:12 localhost zebra[959]: [J87BH-XW5PP] netlink_route_multipath_msg_encode: 10.0.101.2/32 nhg_id is 19
Jul 19 11:35:12 localhost zebra[959]: [HYEHE-CQZ9G] nl_batch_send: netlink-dp (NS 0), batch size=152, msg cnt=3
Jul 19 11:35:12 localhost zebra[959]: [TJ327-ET8HE] netlink_send_msg: >> netlink message dump [sent]
Jul 19 11:35:12 localhost zebra[959]: [JAS4D-NCWGP] nlmsghdr [len=48 type=(104) NEWNEXTHOP flags=(0x0501) {REQUEST,DUMP,(ROOT|REPLACE|CAPPED),(ATOMIC|CREATE)} seq=24 pid=2182728561]
Jul 19 11:35:12 localhost zebra[959]: [WCX94-SW894]   nhm [family=(2) AF_INET scope=(0) UNIVERSE protocol=(11) ZEBRA flags=0x00000000 {}]
Jul 19 11:35:12 localhost zebra[959]: [KFBSR-XYJV1]     rta [len=8 (payload=4) type=(1) ID]
Jul 19 11:35:12 localhost zebra[959]: [Z4E9C-GD9EP]       24
Jul 19 11:35:12 localhost zebra[959]: [KFBSR-XYJV1]     rta [len=8 (payload=4) type=(6) GATEWAY]
Jul 19 11:35:12 localhost zebra[959]: [M8QV4-KY9C0]       10.0.1.2
Jul 19 11:35:12 localhost zebra[959]: [KFBSR-XYJV1]     rta [len=8 (payload=4) type=(5) OIF]
Jul 19 11:35:12 localhost zebra[959]: [JR4EA-BKPTA]       8
Jul 19 11:35:12 localhost zebra[959]: [JAS4D-NCWGP] nlmsghdr [len=52 type=(24) NEWROUTE flags=(0x0501) {REQUEST,DUMP,(ROOT|REPLACE|CAPPED),(ATOMIC|CREATE)} seq=25 pid=2182728561]
Jul 19 11:35:12 localhost zebra[959]: [GCEGC-W8YBF]   rtmsg [family=(2) AF_INET dstlen=32 srclen=0 tos=0 table=254 protocol=(186) BGP scope=(0) UNIVERSE type=(1) UNICAST flags=0x0000 {}]
Jul 19 11:35:12 localhost zebra[959]: [KFBSR-XYJV1]     rta [len=8 (payload=4) type=(1) DST]
Jul 19 11:35:12 localhost zebra[959]: [M8QV4-KY9C0]       10.0.100.2
Jul 19 11:35:12 localhost zebra[959]: [KFBSR-XYJV1]     rta [len=8 (payload=4) type=(6) PRIORITY]
Jul 19 11:35:12 localhost zebra[959]: [Z4E9C-GD9EP]       20
Jul 19 11:35:12 localhost zebra[959]: [KFBSR-XYJV1]     rta [len=8 (payload=4) type=(30) NH_ID]
Jul 19 11:35:12 localhost zebra[959]: [Z4E9C-GD9EP]       24
Jul 19 11:35:12 localhost zebra[959]: [JAS4D-NCWGP] nlmsghdr [len=52 type=(24) NEWROUTE flags=(0x0501) {REQUEST,DUMP,(ROOT|REPLACE|CAPPED),(ATOMIC|CREATE)} seq=26 pid=2182728561]
Jul 19 11:35:12 localhost zebra[959]: [GCEGC-W8YBF]   rtmsg [family=(2) AF_INET dstlen=32 srclen=0 tos=0 table=254 protocol=(186) BGP scope=(0) UNIVERSE type=(1) UNICAST flags=0x0000 {}]
Jul 19 11:35:12 localhost zebra[959]: [KFBSR-XYJV1]     rta [len=8 (payload=4) type=(1) DST]
Jul 19 11:35:12 localhost zebra[959]: [M8QV4-KY9C0]       10.0.101.2
Jul 19 11:35:12 localhost zebra[959]: [KFBSR-XYJV1]     rta [len=8 (payload=4) type=(6) PRIORITY]
Jul 19 11:35:12 localhost zebra[959]: [Z4E9C-GD9EP]       20
Jul 19 11:35:12 localhost zebra[959]: [KFBSR-XYJV1]     rta [len=8 (payload=4) type=(30) NH_ID]
Jul 19 11:35:12 localhost zebra[959]: [Z4E9C-GD9EP]       19
Jul 19 11:35:12 localhost zebra[959]: [V8KNF-8EXH8] netlink_recv_msg: << netlink message dump [recv]
Jul 19 11:35:12 localhost zebra[959]: [JAS4D-NCWGP] nlmsghdr [len=36 type=(2) ERROR flags=(0x0100) {DUMP,(ROOT|REPLACE|CAPPED)} seq=25 pid=2182728561]
Jul 19 11:35:12 localhost zebra[959]: [KWP1C-6CSXF]   nlmsgerr [error=(-22) Invalid argument]
Jul 19 11:35:12 localhost zebra[959]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Invalid argument, type=RTM_NEWROUTE(24), seq=25, pid=2182728561
Jul 19 11:35:12 localhost zebra[959]: [QTT8V-3ZQ34] nl_batch_read_resp: netlink error message seq=25
Jul 19 11:35:12 localhost zebra[959]: [V8KNF-8EXH8] netlink_recv_msg: << netlink message dump [recv]
Jul 19 11:35:12 localhost zebra[959]: [JAS4D-NCWGP] nlmsghdr [len=36 type=(2) ERROR flags=(0x0100) {DUMP,(ROOT|REPLACE|CAPPED)} seq=26 pid=2182728561]
Jul 19 11:35:12 localhost zebra[959]: [KWP1C-6CSXF]   nlmsgerr [error=(-22) Invalid argument]
Jul 19 11:35:12 localhost zebra[959]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Invalid argument, type=RTM_NEWROUTE(24), seq=26, pid=2182728561
Jul 19 11:35:12 localhost zebra[959]: [QTT8V-3ZQ34] nl_batch_read_resp: netlink error message seq=26
Jul 19 11:35:12 localhost zebra[959]: [VYKYC-709DP] default(0:254):10.0.100.2/32: Route install failed
Jul 19 11:35:12 localhost zebra[959]: [VYKYC-709DP] default(0:254):10.0.101.2/32: Route install failed

so the issue seems to be installing routes pointing to a previously created nexthop.

This is using DentOS main, on TN48M-DN.

minimal reproducer via ip commands:

ip link set up swp5
ip a a 10.0.0.1/24 dev swp5
ip nexthop add dev eno5 id 20 via 10.0.0.2
ip route add 10.0.1.0/24 nhid 20
RTNETLINK answers: Invalid argument

The above fails on DentOS, but e.g. succeeds in ubuntu.

paulmenzel commented 1 year ago

What Ubuntu/FRR version did you try with?

It’d be great if you could test newer FRR version. Merge/pull request https://github.com/dentproject/dentOS/pull/216 updates to Debian 10.

KanjiMonster commented 1 year ago

Ubuntu 20.04 (with kernel 5.11 due to issues with the ICE driver in newer kernels) with FRR 8.5 (same as on the switch).

KanjiMonster commented 1 year ago

Installed 5.15 via hwe on ubuntu 20.04, still works there.

KanjiMonster commented 1 year ago

I have good and bad news:

The good news is that with debian 10, the test succeeds (no route installation failures).

The bad news is that the FRR version in debian 10 is a very ancient one, 6.0.2. much older than the FRR version in the debian 9 image (8.5).

KanjiMonster commented 1 year ago

In an unsurprising turn of events, using the debian 10 based image with latest FRR (8.5.2) triggers the error again:

Jul 20 11:18:29 localhost systemd[1]: Starting FRRouting...
Jul 20 11:18:29 localhost frrinit.sh[1118]: Starting watchfrr with command: ' /usr/lib/frr/watchfrr -d -F traditional zebra bgpd staticd'.
Jul 20 11:18:29 localhost watchfrr[1154]: [T83RR-8SM5G] watchfrr 8.5.2 starting: vty@0
Jul 20 11:18:29 localhost watchfrr[1154]: [ZCJ3S-SPH5S] zebra state -> down : initial connection attempt failed
Jul 20 11:18:29 localhost watchfrr[1154]: [ZCJ3S-SPH5S] bgpd state -> down : initial connection attempt failed
Jul 20 11:18:29 localhost watchfrr[1154]: [ZCJ3S-SPH5S] staticd state -> down : initial connection attempt failed
Jul 20 11:18:29 localhost watchfrr[1154]: [YFT0P-5Q5YX] Forked background command [pid 1155]: /usr/lib/frr/watchfrr.sh restart all
Jul 20 11:18:30 localhost zebra[1168]: [V98V0-MTWPF] client 17 says hello and bids fair to announce only bgp routes vrf=0
Jul 20 11:18:30 localhost zebra[1168]: [V98V0-MTWPF] client 31 says hello and bids fair to announce only vnc routes vrf=0
Jul 20 11:18:30 localhost zebra[1168]: [V98V0-MTWPF] client 38 says hello and bids fair to announce only static routes vrf=0
Jul 20 11:18:30 localhost watchfrr[1154]: [QDG3Y-BY5TN] zebra state -> up : connect succeeded
Jul 20 11:18:30 localhost watchfrr[1154]: [QDG3Y-BY5TN] bgpd state -> up : connect succeeded
Jul 20 11:18:30 localhost watchfrr[1154]: [QDG3Y-BY5TN] staticd state -> up : connect succeeded
Jul 20 11:18:30 localhost watchfrr[1154]: [KWE5Q-QNGFC] all daemons up, doing startup-complete notify
Jul 20 11:18:30 localhost frrinit.sh[1118]: Started watchfrr.
Jul 20 11:18:30 localhost systemd[1]: Started FRRouting.
Jul 20 11:18:32 localhost bgpd[1174]: [M59KS-A3ZXZ] bgp_update_receive: rcvd End-of-RIB for IPv4 Unicast from 10.0.0.2 in vrf default
Jul 20 11:18:34 localhost zebra[1168]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Invalid argument, type=RTM_NEWROUTE(24), seq=18, pid=2573606066
Jul 20 11:18:34 localhost zebra[1168]: [VYKYC-709DP] default(0:254):10.0.2.0/24: Route install failed
Jul 20 11:18:35 localhost bgpd[1174]: [M59KS-A3ZXZ] bgp_update_receive: rcvd End-of-RIB for IPv4 Unicast from 10.0.1.2 in vrf default
Jul 20 11:18:35 localhost zebra[1168]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Invalid argument, type=RTM_NEWROUTE(24), seq=21, pid=2573606066
Jul 20 11:18:35 localhost zebra[1168]: [VYKYC-709DP] default(0:254):10.0.100.2/32: Route install failed
Jul 20 11:18:36 localhost zebra[1168]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Invalid argument, type=RTM_NEWROUTE(24), seq=22, pid=2573606066
Jul 20 11:18:36 localhost zebra[1168]: [VYKYC-709DP] default(0:254):10.0.101.2/32: Route install failed
Jul 20 11:18:44 localhost zebra[1168]: [WPPMZ-G9797] if_zebra_speed_update: swp5 old speed: 4294967295 new speed: 1000
Jul 20 11:18:44 localhost zebra[1168]: [WPPMZ-G9797] if_zebra_speed_update: swp52 old speed: 4294967295 new speed: 10000
KanjiMonster commented 1 year ago

I think I found the place where the route is getting rejected: https://github.com/dentproject/linux/blob/dent-linux-5.15.y/drivers/net/ethernet/marvell/prestera/prestera_router.c#L2699

static int
__prestera_router_fib4_event(struct prestera_switch *sw, unsigned long event,
                 struct fib_entry_notifier_info *fen_info)
{
    struct prestera_fib_event_work *fib_work;

    if (!fen_info->fi)
        return NOTIFY_DONE;

    /* Sanity */
    if (event == FIB_EVENT_ENTRY_REPLACE) {
        if (fen_info->fi->nh) <- this is true for the routes installed by FRR
            return notifier_from_errno(-EINVAL);
       ....

There is no reason it couldn't handle it, it just refuses IPv4 routes with a nexthop via id instead of a direct gw being set.

Confusingly there is no check for IPv6 routes, which is probably why they succeed.

I also see a code for handling nexthops, and checks for the amount of nexthops, so it might not be that much work fixing this, or rather implement the handling of this type of route info.

taraschornyiplv commented 1 year ago

you can workaround this issue by setting no zebra nexthop kernel enable

KanjiMonster commented 1 year ago

you can workaround this issue by setting no zebra nexthop kernel enable

Eh .... I decided to check if I can fix it, and the following change works for me:

diff --git a/drivers/net/ethernet/marvell/prestera/prestera_router.c b/drivers/net/ethernet/marvell/prestera/prestera_router.c
index 6b293b039da6..50670321f0e6 100644
--- a/drivers/net/ethernet/marvell/prestera/prestera_router.c
+++ b/drivers/net/ethernet/marvell/prestera/prestera_router.c
@@ -250,7 +250,7 @@ prestera_kern_fib_info_nhc(struct fib_notifier_info *info, int n)
    if (info->family == AF_INET) {
        fen4_info = container_of(info, struct fib_entry_notifier_info,
                     info);
-       return &fib_info_nh(fen4_info->fi, n)->nh_common;
+       return fib_info_nhc(fen4_info->fi, n);
    } else if (info->family == AF_INET6) {
        fen6_info = container_of(info, struct fib6_entry_notifier_info,
                     info);
@@ -526,7 +526,7 @@ int prestera_util_kern_dip2nh_grp_key(struct prestera_switch *sw,
                      struct prestera_nexthop_group_key *res)
 {
    struct fib_result fib_res;
-   struct fib_nh *fib_nh;
+   struct fib_nh_common *fib_nhc;
    int err;

    /* TODO: support IPv6 */
@@ -538,10 +538,10 @@ int prestera_util_kern_dip2nh_grp_key(struct prestera_switch *sw,
        return 0;

    if (prestera_fi_is_direct(fib_res.fi)) {
-       fib_nh = fib_info_nh(fib_res.fi, 0);
+       fib_nhc = fib_info_nhc(fib_res.fi, 0);
        memset(res, 0, sizeof(*res));
        res->neigh[0].addr = *addr;
-       res->neigh[0].rif = fib_nh->fib_nh_dev;
+       res->neigh[0].rif = fib_nhc->nhc_dev;
        return 1;
    }

@@ -562,7 +562,7 @@ static bool
 __prestera_util_kern_n_is_reachable_v4(u32 tb_id, __be32 *addr,
                       struct net_device *dev)
 {
-   struct fib_nh *fib_nh;
+   struct fib_nh_common *fib_nhc;
    struct fib_result res;
    bool reachable;

@@ -570,8 +570,8 @@ __prestera_util_kern_n_is_reachable_v4(u32 tb_id, __be32 *addr,

    if (!prestera_util_kern_get_route(&res, tb_id, addr))
        if (prestera_fi_is_direct(res.fi)) {
-           fib_nh = fib_info_nh(res.fi, 0);
-           if (dev == fib_nh->fib_nh_dev)
+           fib_nhc = fib_info_nhc(res.fi, 0);
+           if (dev == fib_nhc->nhc_dev)
                reachable = true;
        }

@@ -2472,11 +2472,11 @@ static int prestera_inet6addr_valid_event(struct notifier_block *nb,

 static bool __prestera_fi_is_direct(struct fib_info *fi)
 {
-   struct fib_nh *fib_nh;
+   struct fib_nh_common *fib_nhc;

    if (fib_info_num_path(fi) == 1) {
-       fib_nh = fib_info_nh(fi, 0);
-       if (fib_nh->fib_nh_gw_family == AF_UNSPEC)
+       fib_nhc = fib_info_nhc(fi, 0);
+       if (fib_nhc->nhc_gw_family == AF_UNSPEC)
            return true;
    }

@@ -2697,9 +2697,6 @@ __prestera_router_fib4_event(struct prestera_switch *sw, unsigned long event,

    /* Sanity */
    if (event == FIB_EVENT_ENTRY_REPLACE) {
-       if (fen_info->fi->nh)
-           return notifier_from_errno(-EINVAL);
-
        if (fen_info->fi->fib_nh_is_v6)
            return notifier_from_errno(-EINVAL);
    }

Now pings go through and routes get offloaded:

root@localhost:~# ip r
default via 172.16.111.1 dev ma1 proto dhcp src 172.16.111.132 metric 1024 trap rt_offload 
10.0.0.0/24 dev swp52 proto kernel scope link src 10.0.0.1 rt_trap 
10.0.1.0/24 dev swp5 proto kernel scope link src 10.0.1.1 rt_trap 
10.0.2.0/24 nhid 18 via 10.0.0.2 dev swp52 proto bgp metric 20 offload rt_offload 
10.0.100.2 nhid 22 via 10.0.1.2 dev swp5 proto bgp metric 20 offload rt_offload 
10.0.101.2 nhid 18 via 10.0.0.2 dev swp52 proto bgp metric 20 offload rt_offload 
172.16.111.0/24 dev ma1 proto kernel scope link src 172.16.111.132 rt_trap 
172.16.111.1 dev ma1 proto dhcp scope link src 172.16.111.132 metric 1024 rt_offload

listening on swp5 (or swp52) only shows the bgp traffic, and none of the routed traffic.

of course un-tested outside of FRR / bgp with IPv4.

I can make this a PR for the kernel, at least I think the changes I did are correct (though no idea if they are complete).

KanjiMonster commented 1 year ago

Found another place where fib_nh can be replaced with fib_nh_common (even simplifying the code), and while test compiling the changes locally I was greeted with this warning:

drivers/net/ethernet/marvell/prestera/prestera_router.c: In function ‘prestera_kern_fib_info_nhs’:
drivers/net/ethernet/marvell/prestera/prestera_router.c:288:47: warning: the comparison will always evaluate as ‘true’ for the address of ‘fib6_nh’ will never be NULL [-Waddress]
  288 |                 return fen6_info->rt->fib6_nh ?
      |                                               ^
In file included from ./include/net/nexthop.h:17,
                 from drivers/net/ethernet/marvell/prestera/prestera_router.c:17:
./include/net/ip6_fib.h:203:41: note: ‘fib6_nh’ declared here
  203 |         struct fib6_nh                  fib6_nh[];

which is a valid bug. AFAIU, this means that all IPv6 routes have been treated as mpath routes, which might have hidden the issue for IPv6 routes.

Looking at the code:

        } else if (info->family == AF_INET6) {
                fen6_info = container_of(info, struct fib6_entry_notifier_info,
                                         info);
                return fen6_info->rt->fib6_nh ?
                        (fen6_info->rt->fib6_nsiblings + 1) : 0;
        }

It will always return at least 2 for all IPv6 routes. Even if the check would be working, returning either >=2 or 0 also sounds wrong, surely there must be routes with just one nexthop? Though I'm unsure what the correct replacement would be. Maybe something like:

if (fen6_info->rt->fib6_nsiblings > 0)
    return fen6_info->rt->fib6_nsiblings + 1;
if (fen6_info->rt->nh)
   return nexthop_num_path(fen6_info->rt->nh);
return 0;