FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

Incorrect nexthop when importing a route with a different VNI (L2VNI or L3VNI) #14259

Closed vincentbernat closed 1 year ago

vincentbernat commented 1 year ago

Describe the bug

When importing several RTs (for example, for different L3VNIs) into a VRF, the corresponding routes are installed with the nexthop of the L3VNI associated with the VRF instead of the nexthop of the VNI associated with the route.

On master this does not reproduce in exactly the same way: since b991a37262539cda53b6828f1ce993b74f1f9817 the route is considered inactive, although it still carries the wrong nexthop. With 8.5.2 the problem is fully reproducible.

To Reproduce

Two hosts are directly connected to each other and establish a BGP session with L2VPN EVPN family only.

Host A:

vrf VRFA
 vni 80001
exit-vrf
!
vrf VRFB
 vni 80002
exit-vrf
!
router bgp 64600
 bgp router-id 1.1.1.1
 no bgp default ipv4-unicast
 bgp disable-ebgp-connected-route-check
 neighbor BGP peer-group
 neighbor BGP remote-as 64600
 neighbor 203.0.113.2 peer-group BGP
 !
 address-family ipv4 unicast
  neighbor BGP activate
  neighbor BGP soft-reconfiguration inbound
 exit-address-family
 !
 address-family l2vpn evpn
  neighbor BGP activate
  neighbor BGP soft-reconfiguration inbound
  advertise-all-vni
 exit-address-family
exit
!
router bgp 64600 vrf VRFA
 bgp router-id 1.1.1.1
 !
 address-family ipv4 unicast
  redistribute connected
 exit-address-family
 !
 address-family l2vpn evpn
  advertise ipv4 unicast
  route-target import 64600:80001
  route-target import 64600:80002
  route-target export 64600:80001
 exit-address-family
exit
!
router bgp 64600 vrf VRFB
 bgp router-id 1.1.1.1
 !
 address-family ipv4 unicast
  redistribute connected
 exit-address-family
 !
 address-family l2vpn evpn
  advertise ipv4 unicast
  route-target import 64600:80001
  route-target import 64600:80002
  route-target export 64600:80002
 exit-address-family
exit

Host B has the exact same configuration except the router ID and the neighbor address.

On host A, we have 192.0.2.46/28 in VRFA (containing a bridge with the address and a VXLAN interface enslaved). On host B, we have 192.0.2.62/28 in VRFB (containing a bridge with the address and a VXLAN interface enslaved). You can see the full setup here: https://github.com/vincentbernat/network-lab/blob/master/lab-frr-evpn-vrf/setup.

On host A, I see:

R1# show ip route vrf VRFA 192.0.2.48/28
Routing entry for 192.0.2.48/28
  Known via "bgp", distance 200, metric 0, vrf VRFA, best
  Last update 00:06:08 ago
  * 203.0.113.2, via bridgeA onlink, weight 1

I would have expected the route to use bridgeB as the nexthop interface, because 192.0.2.48/28 carries VNI 80002:

R1# show ip bgp vrf VRFA 192.0.2.48/28
BGP routing table entry for 192.0.2.48/28, version 4
Paths: (1 available, best #1, vrf VRFA)
  Not advertised to any peer
  Imported from 1.1.1.2:3:[5]:[0]:[28]:[192.0.2.48], VNI 80002
  Local
    203.0.113.2(R2) from R2(203.0.113.2) (1.1.1.2) announce-nh-self
      Origin incomplete, metric 0, localpref 100, valid, internal, bestpath-from-AS Local, best (First path received)
      Extended Community: RT:64600:80002 ET:8 Rmac:50:54:33:00:00:04
      Last update: Tue Aug 22 18:16:10 2023

This also happens for L2VNI (but I don't have a minimal reproduction). I am not sure whether this is expected to work (but if it is not, I don't see a use for import/export RT).
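
To make the mismatch easier to spot, the VNI carried by the imported route can be pulled out of the `show ip bgp vrf` output and compared by hand with the L3VNI configured on the VRF (80001 for VRFA here). This is only a sketch: `route_vni` is a hypothetical helper name, and it assumes the "Imported from ..., VNI N" line keeps the layout shown above.

```shell
#!/bin/sh
# route_vni (hypothetical helper): read `show ip bgp vrf <vrf> <prefix>` output
# on stdin and print the VNI found on the "Imported from ..., VNI N" line.
route_vni() {
    sed -n 's/^ *Imported from .*, VNI \([0-9][0-9]*\)[[:space:]]*$/\1/p'
}

# Sample line copied from the output above.
sample='  Imported from 1.1.1.2:3:[5]:[0]:[28]:[192.0.2.48], VNI 80002'
printf '%s\n' "$sample" | route_vni   # prints 80002
```

On a live host the same helper could be fed with `vtysh -c 'show ip bgp vrf VRFA 192.0.2.48/28' | route_vni`.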

Versions

cc @ton31337 (this is not the bug I talked to you about; that one is gone in 8.5.1, so I didn't investigate it).

ton31337 commented 1 year ago

@chiragshah6 @sworleys @taspelund EVPN experts any idea here?

taspelund commented 1 year ago

This looks to me like a Downstream-allocated VNI (D-VNI) setup (trying to install a route via the VNI in the route rather than via the L3VNI locally configured for the VRF) but you're using Traditional VXLAN Devices rather than a Single VXLAN Device.

@sworleys do you have an example iproute2 config they can try? i.e. the proper combo of ip link add vx0 type vxlan external vnifilter ... and bridge vni add ... (and possibly bridge vlan tunnel ...) commands

vincentbernat commented 1 year ago

Not a lot of documentation about single VXLAN devices, except Cumulus and ifupdown2, but I only see this used in conjunction with VLAN-aware bridges. No word on VRF, but maybe I need one VXLAN device per-VRF?

taspelund commented 1 year ago

Aside from Cumulus and ifupdown2 docs, I don't really have any places to point you to for the linux interfaces needed to make EVPN go... I wrote the FRR doc for traditional bridges and traditional vxlan devices, but I haven't expanded them to cover vlan-aware bridges and single vxlan devices yet.

At a high level, to set up an SVD (single VXLAN device) for EVPN you need to:

  1. create/configure your vlan-aware bridge (example below shows a vlan-aware bridge plus an SVI for vlan 10 and a bridge-port carrying vlan 10)
  2. create a vxlan device with "external" and "vnifilter" attributes (making it a "single vxlan device" (for multiple VNIs))
  3. add/allow local VNIs (L2 and L3) to the vxlan device
  4. put the SVD in the bridge
  5. enable vlan_tunnel on the SVD
  6. add vlan_filtering db entries to permit your bridge VLANs through the SVD
  7. add vlan tunnel_info entries to map VLAN:VNI

Example commands:

  1. ip link add br0 type bridge vlan_filtering 1; ip link add name br0.10 link br0 type vlan id 10 protocol 802.1q; bridge vlan add vid 10 dev br0 self; ip link set eth0 master br0; bridge vlan add dev eth0 vid 10 master
  2. ip link add vx0 type vxlan external vnifilter local <source-ip> dstport 4789 nolearning
  3. bridge vni add dev vx0 vni 200010
  4. ip link set vx0 master br0
  5. ip link set vx0 type bridge_slave vlan_tunnel on
  6. bridge vlan add dev vx0 vid 10 master
  7. bridge vlan add dev vx0 vid 10 tunnel_info id 200010 master
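
The seven steps above can be collected into one script. This is a sketch rather than a tested deployment recipe: `DRY_RUN`, `run` and `setup_l2vni_svd` are names made up for illustration, and 198.51.100.1 is only a placeholder for `<source-ip>`. By default it prints the commands instead of executing them, so it can be reviewed without root.

```shell
#!/bin/sh
# Sketch of the seven SVD steps for one L2VNI.
# DRY_RUN, run and setup_l2vni_svd are illustrative names, not part of iproute2.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them (needs root)

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_l2vni_svd() {
    src_ip="$1" port="$2" vid="$3" vni="$4"
    # 1. VLAN-aware bridge, SVI for the VLAN, and a host-facing bridge port
    run ip link add br0 type bridge vlan_filtering 1
    run ip link add name "br0.$vid" link br0 type vlan id "$vid" protocol 802.1q
    run bridge vlan add vid "$vid" dev br0 self
    run ip link set "$port" master br0
    run bridge vlan add dev "$port" vid "$vid" master
    # 2. single VXLAN device ("external" + "vnifilter")
    run ip link add vx0 type vxlan external vnifilter local "$src_ip" dstport 4789 nolearning
    # 3. allow the local VNI on the VXLAN device
    run bridge vni add dev vx0 vni "$vni"
    # 4. put the SVD in the bridge
    run ip link set vx0 master br0
    # 5. enable vlan_tunnel on the SVD
    run ip link set vx0 type bridge_slave vlan_tunnel on
    # 6. permit the VLAN through the SVD
    run bridge vlan add dev vx0 vid "$vid" master
    # 7. map VLAN to VNI
    run bridge vlan add dev vx0 vid "$vid" tunnel_info id "$vni" master
}

# 198.51.100.1 is a placeholder source IP; replace it with the local VTEP address.
setup_l2vni_svd 198.51.100.1 eth0 10 200010
```

Running it with `DRY_RUN=0` as root would execute the commands; bringing the links up is left out, as in the list above.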

That would be a good example for an L2VNI, since the VLAN is being extended to a physical/host-facing port (eth0). For an L3VNI you'd want to create a VRF device, set that VRF as the SVI's master device, and follow the same process as above minus adding physical ports to the corresponding VLAN (the vxlan device should be the only bridge-port carrying the L3VNI VLAN).

e.g.

  1. ip link add TENANT1 type vrf table 2222
  2. ip link add name br0.4000 link br0 type vlan id 4000 protocol 802.1q; bridge vlan add vid 4000 dev br0 self; ip link set br0.4000 master TENANT1
  3. bridge vni add dev vx0 vni 204000
  4. bridge vlan add dev vx0 vid 4000 master
  5. bridge vlan add dev vx0 vid 4000 tunnel_info id 204000 master
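
The L3VNI variant can be sketched the same way. It assumes that `br0` and `vx0` already exist, as in the L2VNI example. `DRY_RUN`, `run` and `setup_l3vni` are again made-up names. The VRF is created with the `table` keyword, which is the option the iproute2 `vrf` link type takes for its routing table ID.

```shell
#!/bin/sh
# Sketch of the L3VNI steps; assumes br0 (VLAN-aware bridge) and vx0 (SVD) exist.
# DRY_RUN, run and setup_l3vni are illustrative names, not part of iproute2.
DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them (needs root)

run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_l3vni() {
    vrf="$1" table="$2" vid="$3" vni="$4"
    # VRF device bound to its own routing table
    run ip link add "$vrf" type vrf table "$table"
    # SVI for the L3VNI VLAN, enslaved to the VRF
    run ip link add name "br0.$vid" link br0 type vlan id "$vid" protocol 802.1q
    run bridge vlan add vid "$vid" dev br0 self
    run ip link set "br0.$vid" master "$vrf"
    # allow the L3VNI on the SVD and map VLAN to VNI;
    # no physical port is added to this VLAN
    run bridge vni add dev vx0 vni "$vni"
    run bridge vlan add dev vx0 vid "$vid" master
    run bridge vlan add dev vx0 vid "$vid" tunnel_info id "$vni" master
}

setup_l3vni TENANT1 2222 4000 204000
```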

taspelund commented 1 year ago

> No word on VRF, but maybe I need one VXLAN device per-VRF?

Currently FRR treats all VXLAN devices only as L2 interfaces, so they would decap and bridge traffic to a local bridge-port or SVI. Conceptually you could use a VXLAN device as an L3 interface, but FRR doesn't have code to support that yet. Today for an L3VNI you would need an SVI for the VRF that lives in the L3VNI VLAN (in the above example: VRF=TENANT1, L3VNI=204000, L3VNI-VLAN=4000, L3-SVI=br0.4000). Also remember to configure vrf TENANT1 + vni 204000 in FRR so it knows this VNI should be L3 instead of L2.
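
The FRR side of that reminder is a small stanza of the same shape as the `vrf VRFA` / `vni 80001` block in the original report. The helper below only prints it for a given VRF and VNI. `l3vni_frr_config` is a hypothetical name, and the sketch does not touch a running FRR instance.

```shell
#!/bin/sh
# l3vni_frr_config (hypothetical helper): print the FRR stanza that marks
# a VNI as the L3VNI of a VRF.
l3vni_frr_config() {
    printf 'vrf %s\n vni %s\nexit-vrf\n' "$1" "$2"
}

l3vni_frr_config TENANT1 204000
```

The printed lines can be entered in vtysh configuration mode or added to the FRR configuration file.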

vincentbernat commented 1 year ago

Thanks for the answer. I am now using this setup: https://github.com/vincentbernat/network-lab/blob/master/lab-frr-evpn-vrf/setup.

From FRR's point of view, everything looks OK:

R2# show ip route vrf vrf2
Codes: K - kernel route, C - connected, S - static, R - RIP,
       O - OSPF, I - IS-IS, B - BGP, E - EIGRP, N - NHRP,
       T - Table, v - VNC, V - VNC-Direct, A - Babel, F - PBR,
       f - OpenFabric,
       > - selected route, * - FIB route, q - queued, r - rejected, b - backup
       t - trapped, o - offload failure

VRF vrf2:
B>* 10.0.10.0/24 [200/0] via 100.64.0.1, vx0 (vrf default) onlink, label 100, weight 1, 00:03:08
B>* 10.0.10.1/32 [200/0] via 100.64.0.1, vx0 (vrf default) onlink, label 100, weight 1, 00:03:08
C>* 10.0.20.0/24 is directly connected, l2vni220, 00:03:10

10.0.10.0/24, which originated in vrf1 on the first host, is available in vrf2 on the second host with the right label (100). On the kernel side, it seems to be correctly translated:

10.0.10.0/24 nhid 29  encap ip id 100 src 0.0.0.0 dst 100.64.0.1 ttl 0 tos 0 via 100.64.0.1 dev vx0 proto bgp metric 20 onlink

And in practice, this works!
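
The kernel-side check can be scripted as well, by extracting the tunnel ID from the `encap ip id N` part of the route line and comparing it with the label FRR shows. This is a sketch, `route_encap_id` is a hypothetical helper name, and it assumes the `ip route` output format shown above.

```shell
#!/bin/sh
# route_encap_id (hypothetical helper): read `ip route` output on stdin and
# print the tunnel ID from "encap ip id N" for every line that has one.
route_encap_id() {
    sed -n 's/.* encap ip id \([0-9][0-9]*\) .*/\1/p'
}

# Route line copied from the kernel output above.
line='10.0.10.0/24 nhid 29  encap ip id 100 src 0.0.0.0 dst 100.64.0.1 ttl 0 tos 0 via 100.64.0.1 dev vx0 proto bgp metric 20 onlink'
printf '%s\n' "$line" | route_encap_id   # prints 100
```

On the host itself, the input could come from `ip route show vrf vrf2 10.0.10.0/24`.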