FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

No way to influence FRR's choice of multihomed gateway peer #14836

Open auranext opened 1 year ago

auranext commented 1 year ago

Hello FRR team and community,

I'm working on a project that will integrate 2 Juniper MX204 routers into our network. Currently, the network is based on VXLAN/EVPN, which only provides L2 bridging between different vendors.

• HP switches
• H3C switches
• Proxmox/FRR servers
• Etc.

The new MX204s will act as:

• gateway for the VMs hosted on bare-metal servers => bridged VXLAN via a VXLAN switch
• gateway for the VMs hosted on Proxmox/FRR servers => bridged VXLAN via the hypervisor

The MX204s implement a virtual gateway, so each MX advertises the same IP and MAC via BGP/EVPN, but each one advertises its own ESI.

Everything works except the virtual gateway, which does not behave as expected. FRR chooses the MH next hop based on a numeric comparison of the peer IPs, so only one gateway can be active at a time. We expected FRR to choose the closest peer, either automatically or manually by adjusting OSPF/BGP attributes. We performed a series of tests:

  1. set a degraded OSPF cost on the path to the selected GW => no change
  2. set a degraded local-pref on the EVPN type-2 routes received from the FRR-selected GW (route-map) => no change
  3. set a degraded weight on the EVPN type-2 routes received from the FRR-selected GW (route-map) => no change

The modified OSPF cost is applied, as verified with "show ip ospf route", but FRR seems to ignore it when selecting the gateway.

Sample output with modified cost:

ITXPVE09TEST# show ip ospf route
============ OSPF network routing table ============
N    10.255.1.51/32        [12011] area: 0.0.0.0
                           via 50.1.0.2, ens1f0np0
N    10.255.2.51/32        [11011] area: 0.0.0.0
                           via 50.1.0.2, ens1f0np0

The modified BGP local-pref and weight are applied as well, as verified with "show bgp l2vpn evpn route vni 29903 json", but FRR also ignores them.

Sample output with modified weight:

  "[2]:[29903]:[48]:[00:00:00:00:00:00]:[32]:[10.20.20.110]":{
    "prefix":"[2]:[29903]:[48]:[00:00:00:00:00:00]:[32]:[10.20.20.110]",
    "prefixLen":352,
    "paths":[
      [
        {
          "valid":true,
          "pathFrom":"internal",
          "routeType":2,
          "ethTag":29903,
          "macLen":48,
          "mac":"00:00:5e:29:99:03",
          "ipLen":32,
          "ip":"10.20.20.110",
          "locPrf":100,
          "weight":2048,
          "peerId":"51.2.0.2",
          "path":"",
          "origin":"IGP",
          "esi":"05:00:00:01:90:00:00:74:cf:00",
          "extendedCommunity":{
            "string":"RT:400:29903 ET:8 MM:0, sticky MAC"
          },
          "nexthops":[
            {
              "ip":"10.255.2.51",
              "afi":"ipv4",
              "used":true
            }
          ]
        }
      ],
      [
        {
          "valid":true,
          "bestpath":true,
          "selectionReason":"EVPN lower IP",
          "pathFrom":"internal",
          "routeType":2,
          "ethTag":29903,
          "macLen":48,
          "mac":"00:00:5e:29:99:03",
          "ipLen":32,
          "ip":"10.20.20.110",
          "locPrf":100,
          "weight":1024,
          "peerId":"51.2.0.2",
          "path":"",
          "origin":"IGP",
          "esi":"05:00:00:01:90:00:00:74:cf:00",
          "extendedCommunity":{
            "string":"RT:400:29903 ET:8 MM:0, sticky MAC"
          },
          "nexthops":[
            {
              "ip":"10.255.1.51",
              "afi":"ipv4",
              "used":true
            }
          ]
        }
      ]
    ]
  },
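To compare the two paths side by side, the JSON entry can be flattened into one row per path. The following is only a minimal sketch: `SAMPLE` is a trimmed copy of the entry above (only the fields used here are kept), and `summarize_paths` is a hypothetical helper name, not part of FRR.

```python
import json

# Trimmed copy of the "show bgp l2vpn evpn route vni 29903 json" entry above.
# Only the fields used by this sketch are kept.
SAMPLE = """
{
  "paths": [
    [{"valid": true, "locPrf": 100, "weight": 2048,
      "nexthops": [{"ip": "10.255.2.51", "afi": "ipv4", "used": true}]}],
    [{"valid": true, "bestpath": true, "selectionReason": "EVPN lower IP",
      "locPrf": 100, "weight": 1024,
      "nexthops": [{"ip": "10.255.1.51", "afi": "ipv4", "used": true}]}]
  ]
}
"""


def summarize_paths(entry):
    """Flatten the nested "paths" list into one small dict per path."""
    rows = []
    for group in entry["paths"]:
        for path in group:
            rows.append({
                "nexthop": path["nexthops"][0]["ip"],
                "weight": path.get("weight"),
                "locPrf": path.get("locPrf"),
                "best": path.get("bestpath", False),
                "reason": path.get("selectionReason", ""),
            })
    return rows


rows = summarize_paths(json.loads(SAMPLE))
for row in rows:
    print(row)
```

With the sample above, the path flagged as best is the one via 10.255.1.51, even though the path via 10.255.2.51 carries the higher weight.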

In every test, we notice that FRR always keeps the same selection reason:

"selectionReason":"EVPN lower IP",

FRR version: 9.0.1-0~deb11u1
Linux version: 5.15.116-1-pve

Can you help me with this FRR MH use case?

Thank you

Maxime

ton31337 commented 1 year ago

I suggest looking at https://github.com/FRRouting/frr/tree/master/tests/topotests/bgp_evpn_mh. It is an example of a working EVPN MH setup. Moreover, without the configuration and topology, nobody can really help you.

auranext commented 8 months ago

Hi, thank you for your reply

Here are the detailed topology and configuration of the setup.

1- detailed topology

The network is based on VXLAN/EVPN, which only provides L2 bridging between different vendors.

• HP/H3C switches act as VTEPs
• HP/H3C routers act as spines and route reflectors
• Proxmox/FRR servers act as VTEPs
• 2x Juniper MX204 act as VTEPs

EVPN interop has worked well for years. For this example I describe a single Proxmox node (they all follow the same scheme).

The recently introduced MXs provide the gateway IP for the VMs on Proxmox in vxlan29903:

• Juniper MX1 underlay IP: 10.255.1.51
• Juniper MX2 underlay IP: 10.255.2.51

For redundancy, the MXs bring a feature that is new to us: the anycast virtual gateway. This is how we discovered EVPN multihoming and type-1 messages. The virtual gateway has MAC 00:00:5e:29:99:03 and IP 10.20.20.110. As mentioned in my previous post, the ead-ESI is advertised by the MXs. FRR receives the routes correctly, adds the 2 next hops, and creates an NH group in the nexthop table:

root@ITXPVE09TEST:~# ip nexthop l
id 268435458 via 10.255.1.51 scope link fdb
id 268435459 via 10.255.2.51 scope link fdb
id 536870913 group 268435458/268435459 fdb

The VGA type-2 route references the ESI, and the iproute2 forwarding table contains the VGA MAC attached to the NH group:

root@ITXPVE09TEST:~# bridge fdb | grep 00:00:5e:29:99:03
00:00:5e:29:99:03 dev vxlan29903 sticky master vmbr29903 static
00:00:5e:29:99:03 dev vxlan29903 nhid 536870913 self sticky static
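The chain from the VGA MAC to its underlay gateways can be followed by joining the two outputs above: the `nhid` of the FDB entry points at a nexthop group, whose members point at the MX addresses. The sketch below only parses the pasted text; `parse_nexthops` and `vteps_for_mac` are hypothetical helper names, and the line formats are assumed to match exactly what is shown above.

```python
import re

# Outputs copied from "ip nexthop l" and "bridge fdb" above.
IP_NEXTHOP = """\
id 268435458 via 10.255.1.51 scope link fdb
id 268435459 via 10.255.2.51 scope link fdb
id 536870913 group 268435458/268435459 fdb
"""

BRIDGE_FDB = """\
00:00:5e:29:99:03 dev vxlan29903 sticky master vmbr29903 static
00:00:5e:29:99:03 dev vxlan29903 nhid 536870913 self sticky static
"""


def parse_nexthops(text):
    """Return ({nexthop_id: gateway_ip}, {group_id: [member_ids]})."""
    gateways, groups = {}, {}
    for line in text.splitlines():
        m = re.match(r"id (\d+) via (\S+)", line)
        if m:
            gateways[int(m.group(1))] = m.group(2)
            continue
        m = re.match(r"id (\d+) group (\S+)", line)
        if m:
            # Assumption: a member may carry a ",weight" suffix; only the id is kept.
            members = [int(part.split(",")[0]) for part in m.group(2).split("/")]
            groups[int(m.group(1))] = members
    return gateways, groups


def vteps_for_mac(mac, fdb_text, nexthop_text):
    """Resolve a MAC to the gateway IPs behind its nexthop group."""
    gateways, groups = parse_nexthops(nexthop_text)
    for line in fdb_text.splitlines():
        m = re.match(r"(\S+) dev \S+ nhid (\d+)", line)
        if m and m.group(1) == mac:
            return [gateways[i] for i in groups[int(m.group(2))]]
    return []


result = vteps_for_mac("00:00:5e:29:99:03", BRIDGE_FDB, IP_NEXTHOP)
print(result)
```

For the outputs above this resolves the VGA MAC to both MX underlay addresses, which confirms that both gateways are installed in the group.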

So the MH VGA is active and running, and reachable in VNI 29903. The point of the MX VGA is the ability to choose the nearest NH. FRR seems unable to do this, and it does not honor preferences configured manually with OSPF (cost) or BGP attributes (local-pref, weight).

What we expect: FRR prefers the next hop based on proximity (OSPF path cost) and honors BGP preferences.
What we observe: FRR prefers the next hop with the lower IP, as mentioned in my previous post ("selectionReason":"EVPN lower IP").
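The gap between the two behaviours can be illustrated with the values from this thread. This is a sketch of the two selection rules as described here, not FRR's actual code; `pick_lowest_ip` and `pick_lowest_cost` are hypothetical names, and the costs are the ones from the "show ip ospf route" sample in the first post.

```python
import ipaddress

# OSPF costs toward each MX underlay address, from "show ip ospf route".
OSPF_COST = {"10.255.1.51": 12011, "10.255.2.51": 11011}


def pick_lowest_ip(nexthops):
    """Observed behaviour: the numerically lowest address wins."""
    return min(nexthops, key=lambda ip: int(ipaddress.ip_address(ip)))


def pick_lowest_cost(nexthops, cost):
    """Expected behaviour: the next hop with the lowest IGP cost wins."""
    return min(nexthops, key=lambda ip: cost[ip])


nexthops = list(OSPF_COST)
observed = pick_lowest_ip(nexthops)
expected = pick_lowest_cost(nexthops, OSPF_COST)
print("observed (lower IP):", observed)
print("expected (lower OSPF cost):", expected)
```

With these values the two rules disagree: the lower IP is 10.255.1.51, while the lower OSPF cost belongs to 10.255.2.51.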

2- detailed PROXMOX configuration

#dummy0 ip-unnumbered 
auto ens1f0np0
iface ens1f0np0 inet static
        address 50.1.1.9
        netmask 255.255.255.255
        mtu 9000

auto dummy0
iface dummy0 inet static
        address 50.1.1.9
        netmask 255.255.255.255
        mtu 9000

auto vmbr29903
iface vmbr29903 inet manual
        # TESTING_vxlan29903
        bridge_waitport 0
        bridge_stp off
        bridge_fd 0
        bridge_ports none
        mtu 8950
        post-up /sbin/ip link set arp off dev $IFACE || true

auto vxlan29903
iface vxlan29903 inet manual
        mtu 8950
        pre-up /sbin/ip link add vxlan29903 type vxlan id 29903 dstport 4789 local 50.1.1.9 nolearning dev dummy0 || true
        post-up /sbin/brctl addif vmbr29903 $IFACE || true
        post-up /sbin/bridge link set dev $IFACE learning off || true

An OSPF link is up to each spine (there are 4 spines). Here is the FRR configuration. For the interop to work correctly we need the 'disable-ead-evi-rx' directive, because Junos sends ead-esi but not ead-evi.

frr version 9.0.1
frr defaults traditional
hostname ITXPVE09TEST
log syslog
no ip forwarding
no ipv6 forwarding
no zebra nexthop kernel enable
bgp no-rib
service integrated-vtysh-config
!
debug zebra events
debug zebra kernel
debug zebra vxlan
debug zebra evpn mh es
debug zebra evpn mh nh
debug zebra evpn mh mac
debug zebra evpn mh neigh
debug ospf zebra
debug ospf event
debug bgp neighbor-events
debug bgp updates in
debug bgp updates out
debug bgp zebra
!
debug route-map
!
interface ens1f0np0
 ip ospf mtu-ignore
 ip ospf network point-to-point
exit
!
interface ens1f1np1
 ip ospf mtu-ignore
 ip ospf network point-to-point
exit
!
router bgp 400
 bgp router-id 50.1.1.9
 bgp log-neighbor-changes
 no bgp default ipv4-unicast
 coalesce-time 1000
 bgp graceful-shutdown
 bgp graceful-restart
 neighbor fabric peer-group
 neighbor fabric remote-as 400
 neighbor fabric update-source 50.1.1.9
 neighbor 50.1.0.1 peer-group fabric
 neighbor 50.1.0.2 peer-group fabric
 neighbor 51.2.0.1 peer-group fabric
 neighbor 51.2.0.2 peer-group fabric
 !
 address-family l2vpn evpn
  neighbor fabric activate
  neighbor fabric route-map TEST2 in
  advertise-all-vni
  vni 29903
   rd 1:29903
  exit-vni
    no use-es-l3nhg
  disable-ead-evi-rx
 exit-address-family
exit
!
router ospf
 ospf router-id 50.1.1.9
 log-adjacency-changes detail
 network 50.1.1.9/32 area 0
exit
!
route-map TEST2 permit 11
 match ip next-hop address 10.255.1.51
 set local-preference 110
exit
!
route-map TEST2 permit 21
 match ip next-hop address 10.255.2.51
 set local-preference 210
exit

As I said in my previous post, I'm trying to understand how FRR chooses the next hop in this MH context, and why the OSPF cost or BGP local-pref settings have no effect on this choice.

thx all ! have a nice day ;)