FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

FRR doesn't associate ESI with remote VTEP #15094

Closed Sushi1324 closed 1 month ago

Sushi1324 commented 7 months ago

Describe the bug

This appears to specifically be an issue with compatibility between Juniper EVPN-VXLAN and FRR EVPN. I have Juniper QFX5120 switches in a leaf and spine design configured as the core for my VXLAN underlay to route between several Proxmox Hypervisors running FRR to serve as the EVPN control plane. In order to provide an uplink out of the Proxmox cloud, I have enabled my two Juniper Spine switches as VTEPs and configured ESI-LAG to an upstream router.

FRR is learning the MAC associations for the upstream device (20:ed:47:98:5a:40) correctly from the 2 Spines with the ESI I have configured on my Juniper switches:

# show evpn mac vni 101
Number of MACs (local and remote) known for this VNI: 3
Flags: N=sync-neighs, I=local-inactive, P=peer-active, X=peer-proxy
MAC               Type   Flags Intf/Remote ES/VTEP            VLAN  Seq #'s
bc:24:11:a5:00:d1 remote       100.120.1.51                         0/0   ## Guest on another Proxmox host
bc:24:11:df:f3:84 local        fwpr102p0                            0/0   ## Guest on local Proxmox host
20:ed:47:98:5a:40 remote       00:11:00:00:00:01:06:51:20:0a        0/0   ## Upstream Router connected to ESI-LAG at Spine

My local guest behind the VXLAN VTEP on my hypervisor can ping the upstream router, but gets DUP ICMP warnings. When I investigated further, it appeared that while FRR learns the correct ESI for the LAG configured on the Juniper side, it does not associate the ESI with any remote VTEPs and instead floods all VTEPs with any traffic destined to the router's MAC address. As a result, the host sends a VXLAN-encapsulated ICMP request to both spine switches, both copies are forwarded to the upstream router, and the router replies to both, hence the DUP ICMP warnings.
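
The behaviour described above can be sketched as a small Python model. This is only an illustration of the symptom as I understand it, not FRR's or the kernel's actual data path: the function name `targets_for_frame`, the table layout, and the CRC-based member selection are all assumptions made for the example, and the addresses are taken from the outputs in this report.

```python
import zlib

# Values copied from the outputs in this report.
SPINES = ["100.120.0.20", "100.120.0.21"]
FLOOD_VTEPS = SPINES + ["100.120.1.51"]
ESI = "00:11:00:00:00:01:06:51:20:0a"
ROUTER_MAC = "20:ed:47:98:5a:40"

# Hypothetical MAC table: each MAC points at a single VTEP or at an ES.
MAC_TABLE = {
    "bc:24:11:a5:00:d1": ("vtep", "100.120.1.51"),
    ROUTER_MAC: ("es", ESI),
}


def targets_for_frame(dst_mac, mac_table, es_vteps, flood_vteps):
    """Return the VTEPs that receive a copy of a frame sent to dst_mac."""
    kind, value = mac_table.get(dst_mac, ("unknown", None))
    if kind == "vtep":
        return [value]
    if kind == "es":
        members = sorted(es_vteps.get(value, []))
        if members:
            # Assumption: pick one ES member; real implementations hash per flow.
            index = zlib.crc32(dst_mac.encode()) % len(members)
            return [members[index]]
    # Unknown MAC, or an ES with no member VTEPs: fall back to flooding.
    return sorted(flood_vteps)


# ES known but without VTEPs (the state shown by "show evpn es detail").
print(targets_for_frame(ROUTER_MAC, MAC_TABLE, {ESI: []}, FLOOD_VTEPS))
# ES associated with both spines (the state I expected).
print(targets_for_frame(ROUTER_MAC, MAC_TABLE, {ESI: SPINES}, FLOOD_VTEPS))
```

In this model the first call returns every VTEP, including both spines, which matches the duplicated ICMP requests. The second call returns exactly one of the two spines.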

I can see that the Type 4 routes from Juniper advertise that this ESI should be associated with the VTEP IPs of both spine switches:

# show bgp l2vpn evpn route type 4
BGP table version is 10, local router ID is 100.120.1.10
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete
EVPN type-1 prefix: [1]:[EthTag]:[ESI]:[IPlen]:[VTEP-IP]:[Frag-id]
EVPN type-2 prefix: [2]:[EthTag]:[MAClen]:[MAC]:[IPlen]:[IP]
EVPN type-3 prefix: [3]:[EthTag]:[IPlen]:[OrigIP]
EVPN type-4 prefix: [4]:[ESI]:[IPlen]:[OrigIP]
EVPN type-5 prefix: [5]:[EthTag]:[IPlen]:[IP]

   Network          Next Hop            Metric LocPrf Weight Path
                    Extended Community
Route Distinguisher: 100.120.0.20:0
* i[4]:[00:11:00:00:00:01:06:51:20:0a]:[32]:[100.120.0.20]
                    100.120.0.20                  100      0 i
                    ET:8 ES-Import-Rt:11:00:00:00:01:06
*>i[4]:[00:11:00:00:00:01:06:51:20:0a]:[32]:[100.120.0.20]
                    100.120.0.20                  100      0 i
                    ET:8 ES-Import-Rt:11:00:00:00:01:06
Route Distinguisher: 100.120.0.21:0
* i[4]:[00:11:00:00:00:01:06:51:20:0a]:[32]:[100.120.0.21]
                    100.120.0.21                  100      0 i
                    ET:8 ES-Import-Rt:11:00:00:00:01:06
*>i[4]:[00:11:00:00:00:01:06:51:20:0a]:[32]:[100.120.0.21]
                    100.120.0.21                  100      0 i
                    ET:8 ES-Import-Rt:11:00:00:00:01:06

But when I view the ESI there are no associated VTEPs:

# show evpn es detail
ESI: 00:11:00:00:00:01:06:51:20:0a
 Type:
 Interface: -
 Ready for BGP: no
 VNI Count: 0
 MAC Count: 1
 DF preference: 0
 Nexthop group: 536870913
 VTEPs:

The nexthop group displayed doesn't seem to exist either, as far as I can find, but I'm likely not looking in the correct place.

Is there any reason that FRR would be ignoring the Type 4 routes that indicate which VTEPs are associated with a particular ESI? In the FRR ESI-LAG examples I've seen, the ES entry shows Type information and lists the remote VTEP IPs to use, so I believe that if I can fix the ESI-to-VTEP association this should work properly.
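
To make the expected association concrete, here is a minimal Python sketch that extracts the ESI-to-originator mapping from the Type 4 prefixes shown above. It assumes, as this report does, that Type 4 routes are what should drive the association; it does not model FRR's actual import logic. The function name `es_members` and the regular expression are made up for the example.

```python
import re
from collections import defaultdict

# Matches prefixes of the form [4]:[ESI]:[IPlen]:[OrigIP] with concrete values.
TYPE4 = re.compile(
    r"\[4\]:\[(?P<esi>[0-9a-fA-F:]+)\]:\[(?P<iplen>\d+)\]:\[(?P<orig>[^\]]+)\]"
)


def es_members(show_output):
    """Map each ESI to the sorted list of originator IPs advertising it."""
    members = defaultdict(set)
    for match in TYPE4.finditer(show_output):
        members[match["esi"]].add(match["orig"])
    return {esi: sorted(ips) for esi, ips in members.items()}


# Route lines copied from "show bgp l2vpn evpn route type 4" in this report.
SAMPLE = """\
* i[4]:[00:11:00:00:00:01:06:51:20:0a]:[32]:[100.120.0.20]
*>i[4]:[00:11:00:00:00:01:06:51:20:0a]:[32]:[100.120.0.20]
* i[4]:[00:11:00:00:00:01:06:51:20:0a]:[32]:[100.120.0.21]
*>i[4]:[00:11:00:00:00:01:06:51:20:0a]:[32]:[100.120.0.21]
"""

print(es_members(SAMPLE))
# {'00:11:00:00:00:01:06:51:20:0a': ['100.120.0.20', '100.120.0.21']}
```

Both spines appear as originators for the ESI, which is the VTEP list I expected to see under "VTEPs:" in `show evpn es detail`.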

To Reproduce

Configure a standard EVPN implementation on FRR. My configuration is generated via Proxmox tooling but should be fairly standard: an IS-IS core advertises and learns the loopback IPs used as VTEP endpoints, and my two Juniper spine switches (100.120.0.20 and 100.120.0.21) act as route reflectors:

vrf vrf_rwevpn
 vni 15999900
exit-vrf
!
interface dummy0
 ip router isis EVPN-CORE
exit
!
interface ens5f0np0
 ip router isis EVPN-CORE
exit
!
interface ens5f1np1
 ip router isis EVPN-CORE
exit
!
router bgp 65120
 bgp router-id 100.120.1.10
 no bgp hard-administrative-reset
 no bgp default ipv4-unicast
 coalesce-time 1000
 no bgp graceful-restart notification
 neighbor VTEP peer-group
 neighbor VTEP remote-as 65120
 neighbor VTEP bfd
 neighbor VTEP update-source dummy0
 neighbor 100.120.0.20 peer-group VTEP
 neighbor 100.120.0.21 peer-group VTEP
 !
 address-family ipv4 unicast
  import vrf vrf_rwevpn
 exit-address-family
 !
 address-family ipv6 unicast
  import vrf vrf_rwevpn
 exit-address-family
 !
 address-family l2vpn evpn
  neighbor VTEP activate
  neighbor VTEP route-map MAP_VTEP_IN in
  neighbor VTEP route-map MAP_VTEP_OUT out
  advertise-all-vni
 exit-address-family
exit
!
router bgp 65120 vrf vrf_rwevpn
 bgp router-id 100.120.1.10
 no bgp hard-administrative-reset
 no bgp graceful-restart notification
 !
 address-family ipv4 unicast
  redistribute connected
 exit-address-family
 !
 address-family ipv6 unicast
  redistribute connected
 exit-address-family
 !
 address-family l2vpn evpn
  default-originate ipv4
  default-originate ipv6
 exit-address-family
exit
!
router isis EVPN-CORE
 net 49.0001.1001.2000.1010.00
 redistribute ipv4 connected level-1
 redistribute ipv6 connected level-1
 log-adjacency-changes
exit
!
route-map MAP_VTEP_IN deny 1
 match evpn route-type prefix
 match evpn vni 15999900
exit
!
route-map MAP_VTEP_IN permit 2
exit
!
route-map MAP_VTEP_OUT permit 1
exit
!
end

Then configure two Juniper switches as remote VTEPs and add an LACP bond with the same ESI and system ID on both devices. In my case no additional BGP sessions are needed, as I am using the spine/route reflectors as the VTEPs/ESI-LAG switches. Here is the relevant configuration on one of the Juniper spines:

interfaces {
    et-0/0/0 {
        ether-options {
            802.3ad ae1;
        }
    }
    ae1 {
        mtu 9216;
        esi {
            00:11:00:00:00:01:06:51:20:0a;
            all-active;
        }
        aggregated-ether-options {
            minimum-links 1;
            link-speed 100g;
            lacp {
                active;
                system-id 00:00:06:51:20:0a;
                admin-key 1;
            }
        }
        unit 0 {
            family ethernet-switching {
                interface-mode trunk;
                vlan {
                    members 101;
                }
            }
        }
    }
    lo0 {
        description "Loopback (lo0)";
        unit 0 {
            family inet {
                no-redirects;
                address 100.120.0.20/32;
            }
            family iso {
                address 49.0001.1001.2000.0020.00;
            }
        }
    }
}

protocols {
    bgp {
        group ibgp_rw {
            type internal;
            local-address 100.120.0.20;
            family evpn {
                signaling;
            }
            cluster 100.120.0.254;
            multipath;
            bfd-liveness-detection {
                minimum-interval 350;
                multiplier 3;
                session-mode automatic;
            }
            neighbor 100.120.1.10 {
                description "Proxmox Host A"; 
            }
            neighbor 100.120.1.51 {
                description "Proxmox Host B";
            }
        }
        local-as 65120;
        graceful-restart;
    }
    evpn {
        encapsulation vxlan;
        vni-options {
            vni 101 {
                vrf-target target:65120:101;
            }
        }
        extended-vni-list all;
    }
}
switch-options {
    vtep-source-interface lo0.0;
    route-distinguisher 100.120.0.20:3;
    vrf-target target:65120:3;
}
vlans {
    v101 {
        vlan-id 101;
        vxlan {
            vni 101;
        }
    }
}

The second switch configured as part of the ESI-LAG looks nearly identical, apart from device-specific IPs.

Expected behavior

I expect the FRR control plane to associate the ESI learned through BGP with the two VTEP endpoints that are advertised via Type 4 routes. Traffic destined for the ESI should then be sent to one VTEP endpoint or the other, rather than being flooded to all VTEP endpoints on the VNI, which results in duplicated packets.

Versions

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.

frrbot[bot] commented 1 month ago

This issue will be automatically closed in the specified period unless there is further activity.