FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

STALE Arp entry is withdrawn from Type2 advertised routes when EVPN MH is enabled. #12574

Closed RupeshPatro closed 1 year ago

RupeshPatro commented 1 year ago

Hello - I notice that with EVPN and no multihoming, if an ARP entry is present in the 'ip neigh' table (even if it is STALE), it is advertised to the peer EVPN BGP neighbors. This behavior is expected. When I then add MH configs on top of it, I notice that as soon as an entry goes STALE, a withdrawal for the corresponding Type-2 route is advertised, which is not expected (please correct me if I am wrong here). Could you please evaluate whether this is a bug or a problem in my setup?

I have tried adjusting some knobs like mac-holdtime and neigh-holdtime, which seemed like they would make FRR try to re-establish connectivity before withdrawing, but that wasn't the case.

I am providing the running config and the IP configuration below. The topology is just 2 routers talking EVPN BGP.

(Host 172.16.0.10) --- [FRR1] <--evpn bgp> [FRR2] --- (Host 172.16.0.20)


FRR1 running config:

Current configuration:
!
frr version 8.4.1
frr defaults traditional
hostname esx-1-fdca
log syslog informational
no ipv6 forwarding
evpn mh mac-holdtime 3600
evpn mh neigh-holdtime 3600
evpn mh redirect-off
service integrated-vtysh-config
!
interface mhbond
 evpn mh es-id 10
 evpn mh es-sys-mac 00:0c:29:49:dc:91
 evpn mh uplink
exit
!
router bgp 4210000051
 bgp router-id 100.73.243.165
 no bgp ebgp-requires-policy
 bgp default show-hostname
 bgp default show-nexthop-hostname
 neighbor to_fdc peer-group
 neighbor to_fdc remote-as 4210000051
 neighbor to_fds peer-group
 neighbor to_fds remote-as 4210000051
 neighbor 100.73.243.166 peer-group to_fdc
 neighbor 100.73.243.177 peer-group to_fdc
 neighbor 100.73.243.196 peer-group to_fds
 !
 address-family ipv4 unicast
  no neighbor to_fdc activate
  no neighbor to_fds activate
 exit-address-family
 !
 address-family l2vpn evpn
  neighbor to_fdc activate
  advertise-all-vni
 exit-address-family
exit
!
ip nht resolve-via-default
!
end

Routes advertised by FRR1:

show bgp l2vpn evpn neighbors 100.73.243.177 advertised-routes
BGP table version is 0, local router ID is 100.73.243.165
Default local pref 100, local AS 4210000051
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network  Next Hop  Metric  LocPrf  Weight  Path
Route Distinguisher: 100.73.243.165:2
*> [1]:[0]:[03:00:0c:29:49:dc:91:00:00:0a]:[128]:[::]:[0]  100 32768 i
*> [2]:[0]:[48]:[00:0c:29:49:dc:92]  100 32768 i
*> [2]:[0]:[48]:[00:12:34:56:78:9a]  100 32768 i
*> [3]:[0]:[32]:[100.73.243.165]  100 32768 i
Route Distinguisher: 100.73.243.165:3
*> [1]:[4294967295]:[03:00:0c:29:49:dc:91:00:00:0a]:[128]:[::]:[0]  100 32768 i
*> [4]:[03:00:0c:29:49:dc:91:00:00:0a]:[32]:[100.73.243.165]  100 32768 i

FRR1 ubuntu network settings/output:

ip neigh
100.73.243.175 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT
172.16.0.10 dev br10 lladdr 00:0c:29:49:dc:92 STALE
100.73.243.161 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT
172.16.0.20 dev br10 lladdr 00:0c:29:3c:d3:bd extern_learn NOARP proto zebra
100.73.243.177 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT
100.73.243.166 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT
100.73.243.169 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT
100.73.243.178 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT
100.73.243.170 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT
100.73.243.173 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT
100.73.243.174 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT
100.73.243.167 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT
100.73.243.179 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT
100.73.243.171 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT

ip route
default via 100.73.243.161 dev ens160
100.73.243.160/27 dev ens160 proto kernel scope link src 100.73.243.165
172.16.0.0/24 dev br10 proto kernel scope link src 172.16.0.1

ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 00:22:48:02:17:15 brd ff:ff:ff:ff:ff:ff
    inet 100.73.243.165/27 scope global ens160
       valid_lft forever preferred_lft forever
    inet6 fe80::222:48ff:fe02:1715/64 scope link
       valid_lft forever preferred_lft forever
3: ens192: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc fq_codel master mhbond state UP group default qlen 1000
    link/ether 00:0c:29:fb:c8:e9 brd ff:ff:ff:ff:ff:ff
4: br10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:12:34:56:78:9a brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.1/24 scope global br10
       valid_lft forever preferred_lft forever
    inet6 fe80::212:34ff:fe56:789a/64 scope link
       valid_lft forever preferred_lft forever
5: vxlan10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br10 state UNKNOWN group default qlen 1000
    link/ether 1a:20:e9:a4:93:01 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::1820:e9ff:fea4:9301/64 scope link
       valid_lft forever preferred_lft forever
6: mhbond: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master br10 state UP group default qlen 1000
    link/ether 00:0c:29:fb:c8:e9 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::20c:29ff:fefb:c8e9/64 scope link
       valid_lft forever preferred_lft forever


FRR2 running config:

Building configuration...

Current configuration:
!
frr version 8.4.1
frr defaults traditional
hostname esx-4-fdca
log syslog informational
no ipv6 forwarding
evpn mh mac-holdtime 3600
evpn mh neigh-holdtime 3600
evpn mh redirect-off
service integrated-vtysh-config
!
router bgp 4210000051
 bgp router-id 100.73.243.177
 no bgp ebgp-requires-policy
 bgp default show-hostname
 bgp default show-nexthop-hostname
 neighbor to_fdc peer-group
 neighbor to_fdc remote-as 4210000051
 neighbor to_fds peer-group
 neighbor to_fds remote-as 4210000051
 neighbor 100.73.243.165 peer-group to_fdc
 neighbor 100.73.243.166 peer-group to_fdc
 neighbor 100.73.243.196 peer-group to_fds
 !
 address-family ipv4 unicast
  no neighbor to_fdc activate
  no neighbor to_fds activate
 exit-address-family
 !
 address-family l2vpn evpn
  neighbor to_fdc activate
  neighbor to_fds activate
  advertise-all-vni
 exit-address-family
exit
!
ip nht resolve-via-default
!
end

Routes advertised by FRR2:

show bgp l2vpn evpn neighbors 100.73.243.165 advertised-routes
BGP table version is 0, local router ID is 100.73.243.177
Default local pref 100, local AS 4210000051
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network  Next Hop  Metric  LocPrf  Weight  Path
Route Distinguisher: 100.73.243.177:2
*> [2]:[0]:[48]:[00:0c:29:3c:d3:bd]  100 32768 i
*> [2]:[0]:[48]:[00:0c:29:3c:d3:bd]:[32]:[172.16.0.20]  100 32768 i
*> [3]:[0]:[32]:[100.73.243.177]  100 32768 i

Total number of prefixes 3

FRR2 ubuntu network settings/output:

ip neigh
100.73.243.161 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT
100.73.243.173 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT
100.73.243.175 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT
100.73.243.179 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT
100.73.243.174 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT
100.73.243.178 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT
100.73.243.169 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT
172.16.0.20 dev br10 lladdr 00:0c:29:3c:d3:bd STALE
100.73.243.171 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT
100.73.243.170 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT
100.73.243.165 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT
100.73.243.167 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT
100.73.243.166 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT

ip route
default via 100.73.243.161 dev ens160
100.73.243.160/27 dev ens160 proto kernel scope link src 100.73.243.177
172.16.0.0/24 dev br10 proto kernel scope link src 172.16.0.1

ip addr
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens160: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 60:45:bd:77:17:24 brd ff:ff:ff:ff:ff:ff
    inet 100.73.243.177/27 scope global ens160
       valid_lft forever preferred_lft forever
    inet6 fe80::6245:bdff:fe77:1724/64 scope link
       valid_lft forever preferred_lft forever
3: ens192: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc fq_codel master mhbond state UP group default qlen 1000
    link/ether 00:0c:29:22:63:a3 brd ff:ff:ff:ff:ff:ff
4: br10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 00:12:34:56:78:9a brd ff:ff:ff:ff:ff:ff
    inet 172.16.0.1/24 scope global br10
       valid_lft forever preferred_lft forever
    inet6 fe80::212:34ff:fe56:789a/64 scope link
       valid_lft forever preferred_lft forever
6: mhbond: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master br10 state UP group default qlen 1000
    link/ether 00:0c:29:22:63:a3 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::20c:29ff:fe22:63a3/64 scope link
       valid_lft forever preferred_lft forever
7: vxlan10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br10 state UNKNOWN group default qlen 1000
    link/ether fa:08:4b:a2:9f:e6 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::f808:4bff:fea2:9fe6/64 scope link
       valid_lft forever preferred_lft forever

Lastly, both the FRR routers are running on top of Ubuntu:

lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.5 LTS
Release: 20.04
Codename: focal

taspelund commented 1 year ago

@RupeshPatro Can you provide the output of these two commands from both FRR1 and FRR2?

RupeshPatro commented 1 year ago

Here, thanks @taspelund

FRR1 (aka esx-1-fdca):

sudo vtysh -c 'show ver'
FRRouting 8.4.1 (esx-1-fdca) on Linux(5.4.0-125-generic).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--localstatedir=/var/run/frr' '--sbindir=/usr/lib/frr' '--sysconfdir=/etc/frr' '--with-vtysh-pager=/usr/bin/pager' '--libdir=/usr/lib/x86_64-linux-gnu/frr' '--with-moduledir=/usr/lib/x86_64-linux-gnu/frr/modules' '--disable-dependency-tracking' '--enable-rpki' '--disable-scripting' '--enable-pim6d' '--with-libpam' '--enable-doc' '--enable-doc-html' '--enable-snmp' '--enable-fpm' '--disable-protobuf' '--disable-zeromq' '--enable-ospfapi' '--enable-bgp-vnc' '--enable-multipath=256' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' 'build_alias=x86_64-linux-gnu' 'PYTHON=python3'

sudo vtysh -c 'show bgp l2vpn evpn route' | egrep -A3 '00:0c:29:49:dc:92|00:0c:29:3c:d3:bd'
*> [1]:[0]:[03:00:0c:29:49:dc:92:00:00:0a]:[128]:[::]:[0]
      100.73.243.165(esx-1-fdca)  32768 i
      ET:8 RT:32947:10
*> [2]:[0]:[48]:[00:0c:29:49:dc:92]
      100.73.243.165(esx-1-fdca)  32768 i
      ESI:03:00:0c:29:49:dc:92:00:00:0a ET:8 RT:32947:10
*> [3]:[0]:[32]:[100.73.243.165]
      100.73.243.165(esx-1-fdca)
--
*> [1]:[4294967295]:[03:00:0c:29:49:dc:92:00:00:0a]:[128]:[::]:[0]
      100.73.243.165(esx-1-fdca)  32768 i
      ET:8 ESI-label-Rt:AA RT:32947:10
*> [4]:[03:00:0c:29:49:dc:92:00:00:0a]:[32]:[100.73.243.165]
      100.73.243.165(esx-1-fdca)  32768 i
      ET:8 ES-Import-Rt:00:0c:29:49:dc:92 DF: (alg: 2, pref: 32767)
Route Distinguisher: 100.73.243.177:2
*>i[2]:[0]:[48]:[00:0c:29:3c:d3:bd]
      100.73.243.177(esx-4-fdca)  100 0 i
      RT:32947:10 ET:8
*>i[2]:[0]:[48]:[00:0c:29:3c:d3:bd]:[32]:[172.16.0.20]
      100.73.243.177(esx-4-fdca)  100 0 i
      RT:32947:10 ET:8

FRR2 (aka esx-4-fdca):

sudo vtysh -c 'show ver'
FRRouting 8.4.1 (esx-4-fdca) on Linux(5.4.0-125-generic).
Copyright 1996-2005 Kunihiro Ishiguro, et al.
configured with:
    '--build=x86_64-linux-gnu' '--prefix=/usr' '--includedir=${prefix}/include' '--mandir=${prefix}/share/man' '--infodir=${prefix}/share/info' '--sysconfdir=/etc' '--localstatedir=/var' '--disable-silent-rules' '--libdir=${prefix}/lib/x86_64-linux-gnu' '--libexecdir=${prefix}/lib/x86_64-linux-gnu' '--disable-maintainer-mode' '--localstatedir=/var/run/frr' '--sbindir=/usr/lib/frr' '--sysconfdir=/etc/frr' '--with-vtysh-pager=/usr/bin/pager' '--libdir=/usr/lib/x86_64-linux-gnu/frr' '--with-moduledir=/usr/lib/x86_64-linux-gnu/frr/modules' '--disable-dependency-tracking' '--enable-rpki' '--disable-scripting' '--enable-pim6d' '--with-libpam' '--enable-doc' '--enable-doc-html' '--enable-snmp' '--enable-fpm' '--disable-protobuf' '--disable-zeromq' '--enable-ospfapi' '--enable-bgp-vnc' '--enable-multipath=256' '--enable-user=frr' '--enable-group=frr' '--enable-vty-group=frrvty' '--enable-configfile-mask=0640' '--enable-logfile-mask=0640' 'build_alias=x86_64-linux-gnu' 'PYTHON=python3'

sudo vtysh -c 'show bgp l2vpn evpn route' | egrep -A3 '00:0c:29:49:dc:92|00:0c:29:3c:d3:bd'
*>i[1]:[0]:[03:00:0c:29:49:dc:92:00:00:0a]:[32]:[0.0.0.0]:[0]
      100.73.243.165(esx-1-fdca)  100 0 i
      RT:32947:10 ET:8
--
*>i[1]:[4294967295]:[03:00:0c:29:49:dc:92:00:00:0a]:[32]:[0.0.0.0]:[0]
      100.73.243.165(esx-1-fdca)  100 0 i
      RT:32947:10 ET:8 ESI-label-Rt:AA
*>i[4]:[03:00:0c:29:49:dc:92:00:00:0a]:[32]:[100.73.243.165]
      100.73.243.165(esx-1-fdca)  100 0 i
      ET:8 ES-Import-Rt:00:0c:29:49:dc:92 DF: (alg: 2, pref: 32767)
Route Distinguisher: 100.73.243.177:2
*> [2]:[0]:[48]:[00:0c:29:3c:d3:bd]
      100.73.243.177(esx-4-fdca)  32768 i
      ET:8 RT:32947:10
*> [2]:[0]:[48]:[00:0c:29:3c:d3:bd]:[32]:[172.16.0.20]
      100.73.243.177(esx-4-fdca)  32768 i
      ET:8 RT:32947:10

taspelund commented 1 year ago

Thanks. Can you also get bridge fdb show | egrep '00:0c:29:49:dc:92|00:0c:29:3c:d3:bd' ?

RupeshPatro commented 1 year ago

On FRR1

bridge fdb show | egrep '00:0c:29:49:dc:92|00:0c:29:3c:d3:bd'
00:0c:29:3c:d3:bd dev vxlan10 vlan 1 extern_learn master br10
00:0c:29:3c:d3:bd dev vxlan10 extern_learn master br10
00:0c:29:49:dc:92 dev vxlan10 vlan 1 extern_learn master br10
00:0c:29:49:dc:92 dev vxlan10 dst 100.73.243.166 self extern_learn
00:0c:29:3c:d3:bd dev vxlan10 dst 100.73.243.177 self extern_learn
00:0c:29:49:dc:92 dev mhbond master br10

On FRR2:

bridge fdb show | egrep '00:0c:29:49:dc:92|00:0c:29:3c:d3:bd'
00:0c:29:49:dc:92 dev vxlan10 vlan 1 extern_learn master br10
00:0c:29:49:dc:92 dev vxlan10 extern_learn master br10
00:0c:29:3c:d3:bd dev mhbond master br10

Please also note that after some amount of time, FRR1 stops advertising the Type-2 MAC route too. This is just FYI, as I am not sure about this behavior (it may be expected for the route to be withdrawn after the MAC age timeout).

Specifically, this entry is withdrawn: *> [2]:[0]:[48]:[00:0c:29:49:dc:92] 100 32768 i

Then, the advertisements from FRR1 are as follows:

show bgp l2vpn evpn neighbors 100.73.243.177 advertised-routes
BGP table version is 0, local router ID is 100.73.243.165
Default local pref 100, local AS 4210000051
Status codes: s suppressed, d damped, h history, * valid, > best, i - internal
Origin codes: i - IGP, e - EGP, ? - incomplete

   Network  Next Hop  Metric  LocPrf  Weight  Path
Route Distinguisher: 100.73.243.165:2
*> [1]:[0]:[03:00:0c:29:49:dc:92:00:00:0a]:[128]:[::]:[0]  100 32768 i
*> [3]:[0]:[32]:[100.73.243.165]  100 32768 i
Route Distinguisher: 100.73.243.165:3
*> [1]:[4294967295]:[03:00:0c:29:49:dc:92:00:00:0a]:[128]:[::]:[0]  100 32768 i
*> [4]:[03:00:0c:29:49:dc:92:00:00:0a]:[32]:[100.73.243.165]  100 32768 i

Total number of prefixes 4

And the 'bridge fdb show' outputs are as follows:

FRR1:
root@esx-1-fdca:/home/azureuser# bridge fdb show | egrep '00:0c:29:49:dc:92|00:0c:29:3c:d3:bd'
00:0c:29:3c:d3:bd dev vxlan10 vlan 1 extern_learn master br10
00:0c:29:3c:d3:bd dev vxlan10 extern_learn master br10
00:0c:29:49:dc:92 dev vxlan10 vlan 1 extern_learn master br10
00:0c:29:49:dc:92 dev vxlan10 dst 100.73.243.166 self extern_learn
00:0c:29:3c:d3:bd dev vxlan10 dst 100.73.243.177 self extern_learn

FRR2:
bridge fdb show | egrep '00:0c:29:49:dc:92|00:0c:29:3c:d3:bd'
00:0c:29:3c:d3:bd dev mhbond master br10

taspelund commented 1 year ago

There needs to be a local fdb entry for the MAC in the ARP entry for FRR to consider the host to be active and valid for advertisement in EVPN.

It sounds like in the state where the MH VTEP has a STALE ARP entry, the corresponding local fdb entry may be missing?

Can you check the fdb/neigh/type-2 status all at the same time to confirm?
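One way to correlate the neigh and fdb state is to capture `ip neigh` and `bridge fdb show` at the same moment and cross-check them. The following is a minimal Python sketch, assuming the text formats pasted earlier in this issue. The function names are hypothetical, and treating "local" as "no extern_learn flag and no remote dst" is a simplifying assumption, not FRR's own definition.

```python
NUD_STATES = {"PERMANENT", "NOARP", "REACHABLE", "STALE", "DELAY",
              "PROBE", "FAILED", "INCOMPLETE", "NONE"}

def parse_neigh(text):
    """Parse `ip neigh` output lines into dicts (ip, dev, mac, state, extern)."""
    entries = []
    for line in text.strip().splitlines():
        tok = line.split()
        if "lladdr" not in tok or "dev" not in tok:
            continue  # e.g. FAILED/INCOMPLETE entries carry no MAC
        state = next((t for t in tok if t in NUD_STATES), "UNKNOWN")
        entries.append({
            "ip": tok[0],
            "dev": tok[tok.index("dev") + 1],
            "mac": tok[tok.index("lladdr") + 1],
            "state": state,
            "extern": "extern_learn" in tok,
        })
    return entries

def local_fdb_macs(text):
    """Return MACs that have a locally learned fdb entry.

    Assumption: local = not flagged extern_learn and no remote VTEP `dst`.
    """
    macs = set()
    for line in text.strip().splitlines():
        tok = line.split()
        if tok and "extern_learn" not in tok and "dst" not in tok:
            macs.add(tok[0])
    return macs

def check(neigh_text, fdb_text):
    """Map each locally learned neighbor IP to (state, has_local_fdb)."""
    local = local_fdb_macs(fdb_text)
    return {n["ip"]: (n["state"], n["mac"] in local)
            for n in parse_neigh(neigh_text) if not n["extern"]}

# Sample lines copied from the FRR1 outputs in this issue.
NEIGH = """172.16.0.10 dev br10 lladdr 00:0c:29:49:dc:92 STALE
172.16.0.20 dev br10 lladdr 00:0c:29:3c:d3:bd extern_learn NOARP proto zebra"""
FDB_WITH_LOCAL = """00:0c:29:49:dc:92 dev vxlan10 dst 100.73.243.166 self extern_learn
00:0c:29:3c:d3:bd dev vxlan10 dst 100.73.243.177 self extern_learn
00:0c:29:49:dc:92 dev mhbond master br10"""
FDB_WITHOUT_LOCAL = "\n".join(FDB_WITH_LOCAL.splitlines()[:2])

print(check(NEIGH, FDB_WITH_LOCAL))
print(check(NEIGH, FDB_WITHOUT_LOCAL))
```

The Type-2 side would still need to be compared by hand against `show bgp l2vpn evpn route` taken at the same time.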

gord1306 commented 1 year ago

Are the ARP requests and ARP replies for 172.16.0.10 going through the same path on the mhbond netdev? I traced the current code (https://github.com/FRRouting/frr/commit/c7bfd085680bf94ea5dbdccc875f7e0257a9a9c8), and per the current design of EVPN-MH, once the first ES is created, only reachable neighbor entries are advertised.

Does anyone know the reason for this limitation?

taspelund commented 1 year ago

I wasn't part of any discussion when this was implemented, but I would guess it's in place for the purposes of advertising a host based on local activity.

If the ARP entry hasn't been validated recently (no ARP traffic to/from that IP, entry transitions to NUD_STALE) then we can't advertise that Type-2 as "active". If we advertise it at all then it would have to be proxy-advertised as "inactive" (ND:Proxy ext-comm would be set on the Type-2).

However I would only expect a Type-2 for an "inactive" host to be proxy-advertised if we know about an ES-Peer who is currently advertising it as "active" (no ND:Proxy ext-comm).

Judging from the diagram in this issue, it doesn't seem like there are any ES-Peers for this ESI so I'm guessing we just skip the proxy-advertisement since we don't know of the host being active from ES-Peers.
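The reasoning in this comment can be written down as a small decision function. This is only a sketch of the behaviour as described (and partly guessed at) in this thread, not a rendering of the FRR source. The names `Advert` and `expected_advert` are hypothetical, and only the REACHABLE and STALE states are covered.

```python
from enum import Enum

class Advert(Enum):
    ACTIVE = "Type-2 without ND:Proxy"
    PROXY = "Type-2 with ND:Proxy"
    NONE = "not advertised / withdrawn"

def expected_advert(neigh_state, mh_enabled, peer_advertises_active):
    """Expected Type-2 MAC-IP advertisement for a local neighbor entry.

    Assumptions taken from this thread:
      - without EVPN-MH, a STALE entry is still advertised;
      - with EVPN-MH, only REACHABLE entries are advertised as active;
      - an inactive entry is proxy-advertised only if an ES-Peer is
        advertising the same host as active.
    """
    if neigh_state not in ("REACHABLE", "STALE"):
        raise ValueError(f"state not covered by this sketch: {neigh_state}")
    if neigh_state == "REACHABLE" or not mh_enabled:
        return Advert.ACTIVE
    return Advert.PROXY if peer_advertises_active else Advert.NONE

# The single-VTEP topology from this issue: MH enabled, no ES-Peer.
print(expected_advert("STALE", mh_enabled=True, peer_advertises_active=False))
```

Under these assumptions the original report (MH enabled, STALE entry, no ES-Peer) lands in the `Advert.NONE` branch, which matches the withdrawal that was observed.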

RupeshPatro commented 1 year ago

That may be correct, @taspelund. I added an ES-peer, so the setup now looks like this: (Host 172.16.0.10) --- [FRR1-A & FRR1-B] <--evpn bgp> [FRR2] --- (Host 172.16.0.20)

And I now see that the STALE ARP entries are not getting withdrawn.

One open question about "However I would only expect a Type-2 for an "inactive" host to be proxy-advertised if we know about an ES-Peer who is currently advertising it as "active" (no ND:Proxy ext-comm)." In the setup I have, 172.16.0.10 is "inactive" on both peers (that is, the entry is STALE on both FRRs attached to 172.16.0.10).

So it does not appear that a Type-2 is advertised as inactive only on the condition that its peer is advertising it as active. Moreover, without multihoming, inactive entries are still advertised.

Happy to share any configuration details you would like to see, if needed.

taspelund commented 1 year ago

As I understand it, the condition you've described is where mac-holdtime/neigh-holdtime come into play, i.e. the evpn mh mac-holdtime and evpn mh neigh-holdtime values in your running config.

Essentially, those define the number of seconds FRR will wait before withdrawing an inactive/proxy advertisement if there are no ES-Peers advertising the Type-2 without the ND:Proxy extended community.
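As a worked illustration of the hold-timer idea, the arithmetic can be sketched in a few lines of Python. The function name is hypothetical, the default of 3600 is simply the value configured in this issue, and the point at which FRR actually starts the timer is an assumption here (taken as the last moment the host was seen active locally or via an ES-Peer).

```python
def seconds_until_withdraw(last_active_ts, now_ts, holdtime_s=3600):
    """Seconds left before an inactive/proxy advertisement would be withdrawn.

    Assumption: the hold timer starts at `last_active_ts`, the last time the
    host was active locally or advertised as active by an ES-Peer.
    Returns 0 once the hold time has fully elapsed.
    """
    elapsed = now_ts - last_active_ts
    return max(0, holdtime_s - elapsed)

# Ten minutes after the host went quiet, with the 3600 s hold time above:
print(seconds_until_withdraw(last_active_ts=0, now_ts=600))
```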

gord1306 commented 1 year ago

I'm not sure I understand your problem correctly, @RupeshPatro. Is host 172.16.0.10 not supposed to be in the STALE state? If so, try leaving only one host-to-FRR1 link and see whether the entry moves to the REACHABLE state.

taspelund commented 1 year ago

The kernel will transition neighbors from REACHABLE -> STALE after base_reachable_time_ms expires for that IP. So if there isn't ARP traffic to/from 172.16.0.10 triggering the timer to reset, it is expected for the ARP entry to move to STALE.
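To see what this timer is on a given box, the sysctl can be read directly. A minimal Python sketch follows, assuming the standard Linux procfs layout for the IPv4 neighbor table. The helper names are hypothetical, and `br10` is just the bridge name used in this issue. The kernel does not use the base value as-is: the effective reachable time is randomized between half and one and a half times the base.

```python
from pathlib import Path

def read_base_reachable_ms(dev, root="/proc/sys/net/ipv4/neigh"):
    """Return base_reachable_time_ms for `dev` (IPv4/ARP neighbor table)."""
    return int((Path(root) / dev / "base_reachable_time_ms").read_text())

def reachable_window_ms(base_ms):
    """Range the randomized reachable time falls in (0.5x to 1.5x of base)."""
    return base_ms // 2, (base_ms * 3) // 2

if __name__ == "__main__":
    dev = "br10"  # bridge/SVI name taken from this issue
    try:
        base = read_base_reachable_ms(dev)
    except OSError:
        base = 30000  # kernel default, used here when the device is absent
    print(dev, base, reachable_window_ms(base))
```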

In Cumulus Linux there is a python daemon called neighmgrd which will periodically send ARP Requests / IPv6 Neighbor Solicitations for IPs in the ARP/NDP cache, since replies processed by the kernel will trigger base_reachable_time_ms to reset. i.e. an ARP/NDP entry would only transition to STALE if the host isn't replying to those refresh attempts or the reply gets dropped/processed late.

There has been some talk within the FRR community around getting similar functionality added to FRR, but I don't know of anyone who has committed to doing the work for it yet.
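The refresh behaviour described above can be approximated outside FRR. The sketch below is not neighmgrd; it only picks the locally learned IPv4 entries on a bridge device from `ip neigh` output and builds one `arping` command per entry (iputils `arping`, with `-c` for count and `-I` for interface). The function name is hypothetical, and skipping PERMANENT, NOARP and extern_learn entries is an assumption about what is worth probing.

```python
def refresh_commands(neigh_text, bridge_dev):
    """Build arping argv lists for local neighbor entries on `bridge_dev`."""
    skip_states = {"PERMANENT", "NOARP", "FAILED", "INCOMPLETE"}
    cmds = []
    for line in neigh_text.strip().splitlines():
        tok = line.split()
        if "dev" not in tok or tok[tok.index("dev") + 1] != bridge_dev:
            continue  # entry belongs to another interface
        if "extern_learn" in tok or skip_states & set(tok):
            continue  # remote (installed by zebra) or not refreshable
        if ":" in tok[0]:
            continue  # IPv6 address; arping only handles IPv4
        cmds.append(["arping", "-c", "1", "-I", bridge_dev, tok[0]])
    return cmds

# Sample lines copied from the FRR1 `ip neigh` output in this issue.
SAMPLE = """100.73.243.175 dev ens160 lladdr 12:34:56:78:9a:bc PERMANENT
172.16.0.10 dev br10 lladdr 00:0c:29:49:dc:92 STALE
172.16.0.20 dev br10 lladdr 00:0c:29:3c:d3:bd extern_learn NOARP proto zebra"""

for argv in refresh_commands(SAMPLE, "br10"):
    print(" ".join(argv))
```

Each argv list could be passed to `subprocess.run` from a periodic job running with sufficient privileges, at an interval shorter than the reachable time, so that replies keep the entry from going STALE.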

RupeshPatro commented 1 year ago

@taspelund - The usage of the mac/neigh hold timers makes more sense to me now. I tested it and it works as expected. @gord1306 - The host 172.16.0.10 is inactive and is expected to be in the STALE state.

My focus at the moment is to learn how the EVPN MH implementation in FRR treats silent hosts, and it looks like neighmgrd or something similar is currently the way to go. Allow me a little more time; I am documenting my observations and will share them soon.

RupeshPatro commented 1 year ago

I think I have settled on how this works (almost). In summary, the settings I have are:

  1. Disable mac learning on vxlan interface
  2. Disable flooding on vxlan interface
  3. Enable neigh_suppress on vxlan interface
  4. In FRR config, enable flooding HER
  5. Have a daemon running on the FRRs that tests reachability of locally learnt MAC-IP entries before they go STALE.
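Items 1 to 4 of this list map onto a handful of commands, sketched below in Python as data rather than executed. Two assumptions are made: that "flooding" in item 2 means the bridge port `flood` flag (unknown-unicast flooding), and that the HER setting in item 4 is the `flooding head-end-replication` command under `address-family l2vpn evpn`. The function and constant names are hypothetical, and the AS number is the one from this issue.

```python
def vtep_port_commands(vxlan_dev):
    """iproute2 `bridge link set` commands for items 1-3 of the list above."""
    common = ["bridge", "link", "set", "dev", vxlan_dev]
    return [
        common + ["learning", "off"],       # 1. no MAC learning on the VXLAN port
        common + ["flood", "off"],          # 2. assumed to mean the `flood` flag
        common + ["neigh_suppress", "on"],  # 3. ARP/ND suppression
    ]

# 4. FRR side (assumed placement under the existing EVPN address family).
FRR_HER_SNIPPET = """\
router bgp 4210000051
 address-family l2vpn evpn
  flooding head-end-replication
 exit-address-family
"""

for argv in vtep_port_commands("vxlan10"):
    print(" ".join(argv))
print(FRR_HER_SNIPPET)
```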

If a host has already made its mark in the VXLAN fabric, then the VTEP endpoints are aware of it. If another host ARPs for that known host, neigh_suppress kicks in and sends the ARP reply (basically ARP suppression).

For silent hosts, discovery is an L2 broadcast enabled by HER. It does not look like the VTEP does anything with such ARP frames except allow them to transit.

However, the concept of ARP replication seems to be missing. That is, for the silent hosts mentioned above, the ARP exchanges happen purely at layer 2. With ARP replication, the VTEP endpoints would watch the ARP frames and, on finding an ARP reply in transit, make an entry in their local database. Certainly, one can write some sort of automation (maybe with ip monitor) to watch for such frames and perform ARP replication, but this would be a good feature to have in FRR as a CLI command.

All that said, I have hit another problem that is still related to EVPN MH; let me open another issue for it. Given that there are no real action items for the FRR team, this issue can be closed. I'll wait a little longer in case my understanding is incorrect or there is any more information you could share that I may find useful.

Thanks a lot for all your contributions towards this open source community!

taspelund commented 1 year ago

Your summary looks accurate to me. The "ARP replication" you are referring to is another function of the neighmgrd daemon in Cumulus Linux, which that implementation refers to as "snooping". In CL, neighmgrd installs BPF rules in the kernel to receive a copy of ARP/NDP packets received on host-facing bridge-ports (excludes VXLAN ports), then creates a neigh entry via the corresponding SVI interface (based on VLAN tag) if one is found. There has been discussion in the community around adding neighbor management functionality into FRR, but I don't think anyone has committed to implementing it yet.
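The snooping step described above boils down to: take a copy of an ARP reply, extract the sender's IP and MAC, and install a neighbor entry on the SVI. The Python sketch below covers only the parsing and the resulting `ip neigh replace` command. It is not how neighmgrd is implemented, it assumes untagged Ethernet frames, and the function names and the choice of `nud reachable` are assumptions made for illustration.

```python
import socket
import struct

ETH_P_ARP = 0x0806
ARP_REPLY = 2

def parse_arp_reply(frame):
    """Return (sender_ip, sender_mac) for an untagged Ethernet ARP reply, else None."""
    if len(frame) < 42:  # 14-byte Ethernet header + 28-byte ARP payload
        return None
    (ethertype,) = struct.unpack("!H", frame[12:14])
    if ethertype != ETH_P_ARP:
        return None
    htype, ptype, hlen, plen, oper, sha, spa, _tha, _tpa = struct.unpack(
        "!HHBBH6s4s6s4s", frame[14:42])
    if (htype, ptype, hlen, plen) != (1, 0x0800, 6, 4) or oper != ARP_REPLY:
        return None
    mac = ":".join(f"{b:02x}" for b in sha)
    return socket.inet_ntoa(spa), mac

def neigh_install_cmd(ip, mac, svi):
    """iproute2 command that would create/refresh the neighbor on the SVI."""
    return ["ip", "neigh", "replace", ip, "lladdr", mac,
            "dev", svi, "nud", "reachable"]

# Hand-built ARP reply from host 172.16.0.10 (MAC taken from this issue).
host_mac = bytes.fromhex("000c2949dc92")
asker_mac = bytes.fromhex("001234" "56789a")
frame = (asker_mac + host_mac + struct.pack("!H", ETH_P_ARP)
         + struct.pack("!HHBBH", 1, 0x0800, 6, 4, ARP_REPLY)
         + host_mac + socket.inet_aton("172.16.0.10")
         + asker_mac + socket.inet_aton("172.16.0.1"))

parsed = parse_arp_reply(frame)
if parsed:
    print(" ".join(neigh_install_cmd(*parsed, svi="br10")))
```

Capturing the frames in the first place (BPF filter, raw socket, or similar) is left out of this sketch.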