FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/
Other
3.19k stars 1.23k forks source link

NHG : proto NHG modification or Zebra NHG modification issues with protocol NHG routes #15169

Closed pguibert6WIND closed 1 month ago

pguibert6WIND commented 7 months ago

Describe the bug

This is a three-step scenario, where multiple problems are identified. step1 : I create a nexthop-group interface with its route in sharpd, The expectation is that the created protocol NHG is created with the route. step 2 : if I add a route-map in zebra to change the src attribute, the route is not refreshed. The expectation is that a new NHG is duped from the protocol NHG with a nexthop that contains the src attribute This step fails today. step 3 : If I modify the original nexthop group, if a new NHG had been allocated in the previous step, then the modification fails to update the changes from the protocol level.

To Reproduce step 1:

interface loop1
ip address 192.168.0.1/24
exit
interface loop2
ip address 192.168.2.1/26
exit
ip route 1.1.1.1/32 loop1
debu zebra events
debu zebra kernel msgdump send
log stdout debugging 
nexthop-group test
nexthop 1.1.1.1 loop1 onlink
nexthop 1.1.1.2 loop2 onlink
exit
exit
sharp install routes 5.5.5.5 nexthop-group test 1

ubuntu2204# show ip route 5.5.5.5 nexthop-group 
Routing entry for 5.5.5.5/32
  Known via "sharp", distance 150, metric 0, best
  Last update 00:00:21 ago
  Nexthop Group ID: 181818168
  * 1.1.1.1, via loop1 onlink, weight 1
  * 1.1.1.2, via loop2 onlink, weight 1
linux# ip ro show
5.5.5.5 nhid 181818168 proto 194 metric 20 
    nexthop via 1.1.1.1 dev loop1 weight 1 onlink 
    nexthop via 1.1.1.2 dev loop2 weight 1 onlink 

step 2:

route-map NH-SRC permit 111
set src 192.168.0.1
exit
ip protocol sharp route-map NH-SRC

ubuntu2204# show ip route 5.5.5.5 nexthop-group 
Routing entry for 5.5.5.5/32
  Known via "sharp", distance 150, metric 0, best
  Last update 00:01:44 ago
  Nexthop Group ID: 181818168
  * 1.1.1.1, via loop1 onlink, weight 1
  * 1.1.1.2, via loop2 onlink, weight 1

linux# ip ro show
5.5.5.5 nhid 181818168 proto 194 metric 20 
    nexthop via 1.1.1.1 dev loop1 weight 1 onlink 
    nexthop via 1.1.1.2 dev loop2 weight 1 onlink 

step 3:

nexthop-group test
no nexthop 1.1.1.1 loop1 onlink

Expected behavior A new NHG id is allocated in step 2

route-map NH-SRC permit 111
set src 192.168.0.1
exit
ip protocol sharp route-map NH-SRC

ubuntu2204# show ip route 5.5.5.5 nexthop-group 
Routing entry for 5.5.5.5/32
  Known via "sharp", distance 150, metric 0, best
  Last update 00:01:44 ago
  Nexthop Group ID: 532
  * 1.1.1.1, via loop1 onlink, weight 1
  * 1.1.1.2, via loop2 onlink, weight 1

linux# ip ro show
5.5.5.5 nhid 532 proto 194 src 192.168.0.1 metric 20 
    nexthop via 1.1.1.1 dev loop1 weight 1 onlink 
    nexthop via 1.1.1.2 dev loop2 weight 1 onlink 

The change is propagated in the new NHG in step 2. A new NHG is allocated for step 3:

ubuntu2204# show ip route 5.5.5.5 nexthop-group 
Routing entry for 5.5.5.5/32
  Known via "sharp", distance 150, metric 0, best
  Last update 00:01:44 ago
  Nexthop Group ID: 533
  * 1.1.1.2, via loop2 onlink, weight 1

linux# ip ro show
5.5.5.5 nhid 533 proto 194 src 192.168.0.1 metric 20 
    nexthop via 1.1.1.2 dev loop2 weight 1 onlink 

Screenshots

Versions recent version ubuntu-22-04. Additional context

pguibert6WIND commented 7 months ago

The issue is a limitation related protocol NHGs. Meaning that when bgp PIC will be made available, it is highly likely that the 'ip protocol bgp route-map' will have some severe flaws in relation with bgp nexthops updates.

An attempt has been tried to gather all NHGs that depend from the same NHG protocol in that branch:

https://github.com/pguibert6WIND/frr/tree/nhg_dependency_issue_15169

Unfortunately, there are some side effect with all_protocol_startup... So set it to on hold.

pguibert6WIND commented 7 months ago

with pull request, the route-map test fails:

when applying follow patch over https://github.com/FRRouting/frr/pull/14973

@@ -1453,10 +1453,16 @@ def test_nexthop_groups_with_route_maps():
     net["r1"].cmd(
         'vtysh -c "c t" -c "route-map NH-SRC permit 111" -c "set src %s"' % src_str
     )
-    net["r1"].cmd('vtysh -c "c t" -c "ip protocol sharp route-map NH-SRC"')
-
     net["r1"].cmd('vtysh -c "sharp install routes %s nexthop-group test 1"' % route_str)

+    nhg_id = route_get_nhg_id("%s/32" % route_str)
+    print(f"XXXXXXXXXX NH ID is {nhg_id} before route-map")
+    verify_route_nexthop_group("%s/32" % route_str)
+
+    net["r1"].cmd('vtysh -c "c t" -c "ip protocol sharp route-map NH-SRC"')
+
+    nhg_id = route_get_nhg_id("%s/32" % route_str)
+    print(f"XXXXXXXXXX NH ID is {nhg_id} after route-map")
     verify_route_nexthop_group("%s/32" % route_str)

     # Only a valid test on linux using nexthop objects
-- 
2.34.1

I obtain the same NHID before and after applying the route-map in recursive mode, whereas the NHID is different when not in recursive mode. non recursive mode:

XXXXXXXXXX NH ID is 103 before route-map
XXXXXXXXXX NH ID is 280 after route-map
OK

recursive mode:

XXXXXXXXXX NH ID is 181818177 before route-map
XXXXXXXXXX NH ID is 181818177 after route-map
>           assert match is not None, "Route %s/32 not installed with src %s" % (
                route_str,
                src_str,
            )
E           AssertionError: Route 2.2.2.1/32 not installed with src 192.168.0.1
E           assert None is not None

=> without recursive mode, how come a protocol NHG can refresh the two IDs 103 and 280 => with recursive mode, how come can we allocate a new NHID which is dependent from the original one.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.

frrbot[bot] commented 1 month ago

This issue will be automatically closed in the specified period unless there is further activity.