FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

EVPN traffic not re-routed when breaking the active link #14355

Closed: sbrs3 closed this issue 1 year ago.

sbrs3 commented 1 year ago

We are implementing L3 VPN instances with EVPN and VXLAN. eBGP sessions terminate on physical interfaces. VXLAN tunnels terminate on router loopbacks. Loopbacks are advertised through BGP.

See topology.jpg

In this scenario, a single link failure brings down EVPN traffic even though a path to the destination loopback remains in the topology. The destination loopback can still be pinged, but ping through the VPN fails.

Example: after breaking the link between site3-3 and site3-4, VRF traffic stops working, although the remote VTEP loopback is still reachable through the backup path (10.21.33.1 is a loopback in the yellow VRF on the site3-3 router):

root@site3-4:~# ip vrf exec yellow ping 10.21.33.1
PING 10.21.33.1 (10.21.33.1) 56(84) bytes of data.
From 10.21.34.1 icmp_seq=58 Destination Host Unreachable

root@site3-4:~# ping -I 10.0.3.4 10.0.3.3
PING 10.0.3.3 (10.0.3.3) from 10.0.3.4 : 56(84) bytes of data.
64 bytes from 10.0.3.3: icmp_seq=1 ttl=63 time=1.34 ms

VPN and GRD (global routing domain) routes look correct (see routes.txt).

It appears that FRR removes the neighbor entry for the remote loopback 10.0.3.3 when the link goes down, even though the loopback is still reachable via site3-5.

Neighbor table before the link goes down:

root@site3-4:~# ip neigh show
192.168.122.1 dev enp1s0 lladdr 52:54:00:05:6b:fa REACHABLE
10.0.3.3 dev br-red lladdr 02:77:bd:6a:03:a5 extern_learn  NOARP proto zebra
192.168.245.2 dev enp3s0 lladdr 52:54:00:d6:a6:96 REACHABLE
10.0.3.3 dev br-yellow lladdr 02:cf:ce:74:7b:73 extern_learn  NOARP proto zebra
192.168.234.1 dev enp2s0 lladdr 52:54:00:70:5a:18 REACHABLE

Neighbor table after the link goes down:

root@site3-4:~# ip neigh show
192.168.122.1 dev enp1s0 lladdr 52:54:00:05:6b:fa REACHABLE
192.168.245.2 dev enp3s0 lladdr 52:54:00:d6:a6:96 REACHABLE
10.0.3.3 dev br-yellow  INCOMPLETE
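To compare the two neighbor listings above more directly, the entries for the remote VTEP can be filtered per VRF bridge. This is a minimal sketch, not part of FRR or iproute2: the function name vtep_neigh_state is hypothetical, and it assumes the ip neigh show output format seen above, where the NUD state (REACHABLE, NOARP, INCOMPLETE, ...) is the only all-uppercase token on a line.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of FRR or iproute2): for the given IP, print
# "<device> <state>" for every neighbor entry found in `ip neigh show` output
# read from stdin. Assumes the NUD state is the only all-uppercase token.
vtep_neigh_state() {
  local ip="$1"
  awk -v ip="$ip" '
    $1 == ip {
      dev = "?"; state = "UNKNOWN"
      for (i = 2; i <= NF; i++) {
        # The token following "dev" is the interface (here: the VRF bridge).
        if ($i == "dev" && i < NF) dev = $(i + 1)
        # NUD states such as REACHABLE, NOARP or INCOMPLETE are all-uppercase.
        if ($i ~ /^[A-Z]+$/) state = $i
      }
      print dev, state
    }'
}
```

For example, piping ip neigh show into vtep_neigh_state 10.0.3.3 on site3-4 prints "br-red NOARP" and "br-yellow NOARP" for the first listing, and only "br-yellow INCOMPLETE" for the second, which makes the lost zebra-installed entries easy to spot.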

Running clear bgp * restores the VPN traffic, but that should not be necessary.

Versions

sbrs3 commented 1 year ago

Attachments: topology, site3-5-config.txt, logs.txt, routes.txt, site3-3-config.txt, site3-4-config.txt

sbrs3 commented 1 year ago

Why does FRR remove the Linux neighbor entry for the remote VTEP (loopback 10.0.3.3) when the direct link goes down, even though a second path to that VTEP remains in the topology?

This is what it looks like in the logs:

2023-09-05 13:14:03.479 [DEBG] zebra: [YK42S-VD2K1] Rx RTM_DELNEIGH family ipv4 IF br-yellow(11) vrf yellow(9) IP 10.0.3.3
2023-09-05 13:14:03.479 [DEBG] zebra: [NR6MZ-KY8YF] zebra neigh del if br-yellow/11 10.0.3.3
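When the zebra debug log is long, the netlink neighbor events for the remote VTEP can be pulled out with a small filter. This is a sketch under assumptions: the function name neigh_events is hypothetical, and the field layout (IF, vrf and IP tokens) is taken from the two log lines above and may differ in other FRR versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list netlink neighbor add/delete events for one IP from
# a zebra debug log read on stdin, printing "<date> <time> <event> <if> <vrf>".
# The field layout is assumed from the log excerpt in this issue and may
# differ between FRR versions.
neigh_events() {
  local ip="$1"
  awk -v ip="$ip" '
    /Rx RTM_(NEW|DEL)NEIGH/ {
      ev = "?"; ifname = "-"; vrf = "-"; hit = 0
      for (i = 1; i < NF; i++) {
        if ($i ~ /^RTM_(NEW|DEL)NEIGH$/) ev = $i
        if ($i == "IF") ifname = $(i + 1)
        if ($i == "vrf") vrf = $(i + 1)
        # Only keep events whose IP token matches the requested address.
        if ($i == "IP" && $(i + 1) == ip) hit = 1
      }
      if (hit) print $1, $2, ev, ifname, vrf
    }'
}
```

Run as neigh_events 10.0.3.3 < logs.txt, the excerpt above would yield the single line "2023-09-05 13:14:03.479 RTM_DELNEIGH br-yellow(11) yellow(9)", i.e. the kernel-side deletion that zebra then processes.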

The same happens in a topology with only two routers connected by two redundant links: when one of the links goes down, the neighbor entry for the remote VTEP (loopback) is removed. Shouldn't the entry remain as long as a path to it still exists?

Any comments?

sbrs3 commented 1 year ago

How can we proceed with this?

chdxD1 commented 1 year ago

This is probably a duplicate of https://github.com/FRRouting/frr/issues/12391. The MR was also backported to 8.5, so can you try 8.5.3 instead of 8.5.2 (or 9.0.1, which has the fix as well)?
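To check quickly whether a router runs a release that should carry this fix, a version comparison is enough. A minimal sketch: the thresholds 8.5.3 and 9.0.1 come from the comment above, while the assumption that later releases keep the fix, the function names version_ge and frr_has_neigh_fix, and the reliance on GNU sort -V are illustrative assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: decide whether an FRR version string is at or above
# the releases named in this thread as carrying the fix (8.5.3 on the 8.5
# branch, 9.0.1 otherwise). Assumes later releases keep the fix and that
# GNU sort with -V (version sort) is available.

# Succeeds if version $1 >= version $2.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

frr_has_neigh_fix() {
  local v="$1"
  case "$v" in
    8.5.*) version_ge "$v" 8.5.3 ;;   # fix backported to the 8.5 branch
    *)     version_ge "$v" 9.0.1 ;;   # 9.0.1 named as also having the fix
  esac
}
```

The version string could be taken from vtysh -c 'show version', assuming its first line has the form "FRRouting X.Y.Z ...": frr_has_neigh_fix "$(vtysh -c 'show version' | awk 'NR == 1 {print $2}')" would then fail for 8.5.2 and succeed for 8.5.3.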

sbrs3 commented 1 year ago

Re-tested with 8.5.3 and it works. Problem solved, apparently.