FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

multiple rmac for the same vtep when a remote host reboots #16837

Open zstas opened 1 week ago

zstas commented 1 week ago

Description

I run an EVPN/VXLAN network on FRR 8.5.4 that uses only RT5 (type-5) NLRIs. Each host has BGP sessions to 2 RRs. Hosts sometimes need to be restarted, and the MAC on the vxlan interface changes after a reboot. Quite often the old RMAC persists on the other hosts after the change.

I observe multiple RMACs both in vtysh:

# vtysh -c 'show evpn rmac vni 100' | grep 10.10.46.76
9a:e7:7f:48:a5:ac 10.10.46.76          
b6:24:b9:27:d3:9d 10.10.46.76          

and in Linux:

# ip -B -n dataplane neigh show dev vxlan100 | grep b6:24:b9:27:d3:9d
lladdr b6:24:b9:27:d3:9d extern_learn  REACHABLE
lladdr b6:24:b9:27:d3:9d extern_learn  REACHABLE
10.10.46.76 lladdr b6:24:b9:27:d3:9d extern_learn  REACHABLE
# ip -B -n dataplane neigh show dev vxlan100 | grep 9a:e7:7f:48:a5:ac
lladdr 9a:e7:7f:48:a5:ac extern_learn  REACHABLE
lladdr 9a:e7:7f:48:a5:ac extern_learn  REACHABLE
10.10.46.76 lladdr 9a:e7:7f:48:a5:ac extern_learn  REACHABLE
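
The duplicate shown above can be detected mechanically. The following is a minimal sketch, not part of FRR: it assumes the output keeps the layout seen in the excerpts (a MAC and an IPv4 VTEP address on the same line), and the function names `rmacs_by_vtep` and `vteps_with_multiple_rmacs` are hypothetical. It works on captured text from either `show evpn rmac vni 100` or `ip neigh show`, and skips lines that carry no IPv4 address.

```python
import re
from collections import defaultdict

# Patterns for a colon-separated MAC and a dotted IPv4 address.
MAC_RE = re.compile(r"\b(?:[0-9a-f]{2}:){5}[0-9a-f]{2}\b", re.IGNORECASE)
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def rmacs_by_vtep(text):
    """Map each VTEP IP to the set of MACs that appear on the same line."""
    result = defaultdict(set)
    for line in text.splitlines():
        mac = MAC_RE.search(line)
        ip = IPV4_RE.search(line)
        if mac and ip:  # lines without both fields are ignored
            result[ip.group()].add(mac.group().lower())
    return dict(result)


def vteps_with_multiple_rmacs(text):
    """Return only the VTEPs that have more than one RMAC."""
    return {
        ip: sorted(macs)
        for ip, macs in rmacs_by_vtep(text).items()
        if len(macs) > 1
    }


# Sample taken from the vtysh output in this report.
VTYSH_OUTPUT = """\
9a:e7:7f:48:a5:ac 10.10.46.76
b6:24:b9:27:d3:9d 10.10.46.76
"""

print(vteps_with_multiple_rmacs(VTYSH_OUTPUT))
```

For the sample, this reports 10.10.46.76 with both RMACs, which matches what is seen on the affected host.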

However, this command shows the correct output (only the new MAC address):

# show evpn next-hops vni 100 ip  10.10.46.76
Ip: 10.10.46.76
  RMAC: b6:24:b9:27:d3:9d
  Refcount: 8
  Prefixes:
    172.20.78.0/24
    172.29.3.32/28
    172.29.6.64/28
    172.29.8.112/28
    172.29.10.112/28
    172.29.2.144/28
    172.29.0.208/28
    172.29.2.240/28

No NLRI carries the old RMAC; only the new one is advertised:

# vtysh -c "show bgp l2vpn evpn" | grep 9a:e7:7f:48:a5:ac
# vtysh -c "show bgp l2vpn evpn" | grep b6:24:b9:27:d3:9d
                    RT:65000:100 ET:8 Rmac:b6:24:b9:27:d3:9d
                    RT:65000:100 ET:8 Rmac:b6:24:b9:27:d3:9d
                    RT:65000:100 ET:8 Rmac:b6:24:b9:27:d3:9d
                    RT:65000:100 ET:8 Rmac:b6:24:b9:27:d3:9d
                    RT:65000:100 ET:8 Rmac:b6:24:b9:27:d3:9d
                    RT:65000:100 ET:8 Rmac:b6:24:b9:27:d3:9d
                    RT:65000:100 ET:8 Rmac:b6:24:b9:27:d3:9d
                    RT:65000:100 ET:8 Rmac:b6:24:b9:27:d3:9d
                    RT:65000:100 ET:8 Rmac:b6:24:b9:27:d3:9d
                    RT:65000:100 ET:8 Rmac:b6:24:b9:27:d3:9d
                    RT:65000:100 ET:8 Rmac:b6:24:b9:27:d3:9d
                    RT:65000:100 ET:8 Rmac:b6:24:b9:27:d3:9d
                    RT:65000:100 ET:8 Rmac:b6:24:b9:27:d3:9d
                    RT:65000:100 ET:8 Rmac:b6:24:b9:27:d3:9d
                    RT:65000:100 ET:8 Rmac:b6:24:b9:27:d3:9d
                    RT:65000:100 ET:8 Rmac:b6:24:b9:27:d3:9d
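
The comparison above (RMACs known to zebra versus RMACs still advertised in BGP) can be expressed as a small check. This is a sketch under assumptions: it expects the `Rmac:<mac>` text format seen in the `show bgp l2vpn evpn` excerpt, and the helper names `advertised_rmacs` and `stale_rmacs` are hypothetical, not FRR APIs.

```python
import re

# Matches the Rmac extended-community text as printed in the excerpt above.
RMAC_ATTR_RE = re.compile(r"Rmac:((?:[0-9a-f]{2}:){5}[0-9a-f]{2})", re.IGNORECASE)


def advertised_rmacs(bgp_output):
    """Collect every RMAC that appears in the BGP EVPN output."""
    return {mac.lower() for mac in RMAC_ATTR_RE.findall(bgp_output)}


def stale_rmacs(zebra_rmacs, bgp_output):
    """Return zebra RMACs that no BGP route advertises any more."""
    advertised = advertised_rmacs(bgp_output)
    return sorted(
        mac.lower() for mac in zebra_rmacs if mac.lower() not in advertised
    )


# Sample data taken from this report: 16 routes, all with the new RMAC.
BGP_OUTPUT = "                    RT:65000:100 ET:8 Rmac:b6:24:b9:27:d3:9d\n" * 16
ZEBRA_RMACS = ["9a:e7:7f:48:a5:ac", "b6:24:b9:27:d3:9d"]

print(stale_rmacs(ZEBRA_RMACS, BGP_OUTPUT))
```

With the data from this report, the only RMAC flagged as stale is 9a:e7:7f:48:a5:ac, the one from before the reboot.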

Version

# show version 
FRRouting 8.5.4 (c395-r106-u2.da6.host.46labs) on Linux(5.15.0-89-generic).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-option-checking' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--localstatedir=/var/run/frr' '--sbindir=/usr/lib/frr' '--sysconfdir=/etc/frr' '--with-vtysh-pager=/usr/bin/pager' '--libdir=/usr/lib/x86_64-linux-gnu/frr' '--with-moduledir=/usr/lib/x86_64-linux-gnu/frr/modules' '--disable-dependency-tracking' '--enable-rpki' '--disable-scripting' '--enable-pim6d' '--with-libpam' '--enable-doc' '--enable-doc-html' '--enable-snmp' '--enable-fpm' '--disable-protobuf' '--disable-zeromq' '--enable-ospfapi' '--enable-bgp-vnc' '--enable-multipath=256' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' 'build_alias=x86_64-linux-gnu' 'PYTHON=python3'

How to reproduce

  1. Build a topology where hosts have BGP EVPN sessions to the RRs and reach those RRs over ECMP (possibly other hosts as well; I am not sure whether this is necessary)
  2. Reboot one host so that the MAC address on its vxlan interface in Linux changes
  3. On the other hosts, check whether the old RMAC was removed from both zebra and Linux

Expected behavior

The old RMAC entry should be gone from both vtysh and Linux after a remote host changes its MAC.

Actual behavior

Both the new and the old RMAC are present for the same VTEP.

Additional context

No response

Checklist

I've searched for an existing solution (commits, PRs, issues) but haven't found one. Please point me to it if it exists.