FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

eVPN and VRRP race condition #9327

Closed rnurgaliyev closed 1 year ago

rnurgaliyev commented 3 years ago

Describe the bug

In this scenario there are three hosts: gw1, gw2 and host1. There is a VXLAN eVPN between these three hosts, and both gateways are route reflectors. A VRRP session runs between the two gateways in the same eVPN. VRRP is implemented with a shared MAC address on macvlan interfaces.

Sometimes, when "clear bgp *" is issued on one or both gateways at the same time, host1 ends up with a wrong next-hop in the FIB entry (bridge fdb show) for the VRRP virtual MAC, pointing at the backup VRRP gateway, while FRR itself has the correct one in the RIB (show bgp), pointing at the current VRRP master. I assume there is a race condition that could be triggered by both gateways being VRRP masters for a short time while the eVPN peering is cleared. This does not happen often, but when it does, we need to restart FRR on host1 to install the correct entry in the FIB.

Example of the situation when next-hop does not match the "show bgp" output:

host1# bridge fdb show br br_prov_extnet | grep 00:01:02
00:00:5e:00:01:02 dev vxlan_extnet extern_learn master br_prov_extnet
00:00:5e:00:01:02 dev vxlan_extnet dst 10.33.0.12 self extern_learn

host1(frr)# show evpn vni 100
00:00:5e:00:01:02 remote       10.33.0.12                           0/6

host1(frr)# show bgp l2vpn evpn route detail
BGP routing table entry for 10.33.0.1:2:[2]:[0]:[48]:[00:00:5e:00:01:02]
Paths: (2 available, best #2)
  Not advertised to any peer
  Route [2]:[0]:[48]:[00:00:5e:00:01:02] VNI 100
  Local
    10.33.0.1 from 10.33.0.12 (10.33.0.1)
      Origin IGP, metric 0, localpref 100, valid, internal
      Extended Community: RT:65207:100 ET:8 MM:1
      Originator: 10.33.0.1, Cluster list: 10.33.0.12
      Last update: Sun Aug  8 09:54:07 2021
  Route [2]:[0]:[48]:[00:00:5e:00:01:02] VNI 100
  Local
    10.33.0.1 from 10.33.0.1 (10.33.0.1)
      Origin IGP, localpref 100, valid, internal, best (Cluster length)
      Extended Community: RT:65207:100 ET:8 MM:1
      Last update: Sun Aug  8 09:54:07 2021

I attach debug logs from host1. The first debug log was taken while "clear bgp *" was issued on both gateways at the same time, after which the next-hop ends up being wrong. The second debug log is from the next "clear bgp *" on the same hosts several seconds later, which fixes the issue introduced by the first "clear bgp *".

fail.txt fix.txt

gw1: 10.33.0.12 gw2: 10.33.0.1

Problematic MAC address: 00:00:5e:00:01:02. The log of the first attempt (fail.txt) shows that the FIB was initially updated with the correct next-hop, 10.33.0.1, but the entry was then replaced with the wrong one towards 10.33.0.12 and was never touched again:

--- "clear bgp *" is issued on both gateways ---
Aug 06 14:57:40 host1 zebra[31980]: [MMX22-H2MY5] Tx RTM_DELNEIGH family bridge IF vxlan_extnet(7) MAC 00:00:5e:00:01:02 dst 10.33.0.1 nhg 0 rem
Aug 06 14:57:40 host1 zebra[31980]: [QFR1P-4MVVD] Rx RTM_DELNEIGH AF_BRIDGE IF 7 st 0x2 fl 0x12 MAC 00:00:5e:00:01:02 dst 10.33.0.1 nhg 0

--- BGP sessions come back ---
Aug 06 14:57:43 host1 zebra[31980]: [MMX22-H2MY5] Tx RTM_NEWNEIGH family bridge IF vxlan_extnet(7) MAC 00:00:5e:00:01:02 dst 10.33.0.12 nhg 0 rem
Aug 06 14:57:43 host1 zebra[31980]: [QFR1P-4MVVD] Rx RTM_NEWNEIGH AF_BRIDGE IF 7 st 0x2 fl 0x12 MAC 00:00:5e:00:01:02 dst 10.33.0.12 nhg 0
Aug 06 14:57:43 host1 zebra[31980]: [MMX22-H2MY5] Tx RTM_NEWNEIGH family bridge IF vxlan_extnet(7) MAC 00:00:5e:00:01:02 dst 10.33.0.1 nhg 0 rem
Aug 06 14:57:43 host1 zebra[31980]: [QFR1P-4MVVD] Rx RTM_NEWNEIGH AF_BRIDGE IF 7 st 0x2 fl 0x12 MAC 00:00:5e:00:01:02 dst 10.33.0.1 nhg 0
Aug 06 14:57:44 host1 zebra[31980]: [MMX22-H2MY5] Tx RTM_DELNEIGH family bridge IF vxlan_extnet(7) MAC 00:00:5e:00:01:02 dst 10.33.0.1 nhg 0 rem
Aug 06 14:57:44 host1 zebra[31980]: [MMX22-H2MY5] Tx RTM_NEWNEIGH family bridge IF vxlan_extnet(7) MAC 00:00:5e:00:01:02 dst 10.33.0.12 nhg 0 rem
Aug 06 14:57:44 host1 zebra[31980]: [MMX22-H2MY5] Tx RTM_NEWNEIGH family bridge IF vxlan_extnet(7) MAC 00:00:5e:00:01:02 dst 10.33.0.12 nhg 0 rem
Aug 06 14:57:44 host1 zebra[31980]: [MMX22-H2MY5] Tx RTM_NEWNEIGH family bridge IF vxlan_extnet(7) MAC 00:00:5e:00:01:02 dst 10.33.0.12 nhg 0 rem
Aug 06 14:57:44 host1 zebra[31980]: [QFR1P-4MVVD] Rx RTM_DELNEIGH AF_BRIDGE IF 7 st 0x2 fl 0x12 MAC 00:00:5e:00:01:02 dst 10.33.0.1 nhg 0
Aug 06 14:57:44 host1 zebra[31980]: [QFR1P-4MVVD] Rx RTM_NEWNEIGH AF_BRIDGE IF 7 st 0x2 fl 0x12 MAC 00:00:5e:00:01:02 dst 10.33.0.12 nhg 0
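
The Tx lines above can be replayed by hand to see which destination the FDB entry ends up with. The following is a minimal sketch, assuming that each Tx RTM_NEWNEIGH replaces the remote destination for the MAC and that each Tx RTM_DELNEIGH removes the entry only when the destination matches. The types and function names (fdb_event, replay_fdb_sketch, fail_events) are hypothetical and are not zebra or kernel APIs; the event list is transcribed from the Tx lines logged after the BGP sessions come back.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the two netlink operations seen in the log. */
enum fdb_op { FDB_NEW, FDB_DEL };

struct fdb_event {
    enum fdb_op op;
    const char *dst; /* remote VTEP address carried in the message */
};

/*
 * Replay the events in order and return the destination the entry
 * points at afterwards, or NULL if no entry is left.
 * Assumption: NEW replaces the destination, DEL only removes a
 * matching destination.
 */
static const char *replay_fdb_sketch(const struct fdb_event *ev, size_t n)
{
    const char *cur = NULL;

    assert(ev != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (ev[i].op == FDB_NEW)
            cur = ev[i].dst;
        else if (cur != NULL && strcmp(cur, ev[i].dst) == 0)
            cur = NULL;
    }
    return cur;
}

/* Tx events for MAC 00:00:5e:00:01:02 after the sessions come back. */
static const struct fdb_event fail_events[] = {
    { FDB_NEW, "10.33.0.12" }, /* 14:57:43 */
    { FDB_NEW, "10.33.0.1" },  /* 14:57:43 */
    { FDB_DEL, "10.33.0.1" },  /* 14:57:44 */
    { FDB_NEW, "10.33.0.12" }, /* 14:57:44 */
    { FDB_NEW, "10.33.0.12" }, /* 14:57:44 */
    { FDB_NEW, "10.33.0.12" }, /* 14:57:44 */
};
```

Calling replay_fdb_sketch() on fail_events yields 10.33.0.12 under these assumptions, which matches the "bridge fdb show" output on host1 rather than the best path in "show bgp".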

[x] Did you check if this is a duplicate issue?
[ ] Did you test it on the latest FRRouting/frr master branch?

To Reproduce

Configure eVPN between three nodes, and VRRP with a shared MAC address in this eVPN between two of these nodes. Reset the BGP sessions on the VRRP nodes with "clear bgp *" several times in a row, until the kernel forwarding entry for the VRRP MAC address on the third node points at the backup VRRP node while "show bgp" in FRR shows the correct next-hop. Restarting FRR on host1 immediately fixes the issue.

Expected behavior

The forwarding entry in the kernel matches the next-hop in "show bgp" at all times.

Versions

FRR versions: tested on 8.0.0 and 7.5.1
Kernel: Ubuntu 5.4.0-80
OS: Ubuntu 18.04.5 LTS

taspelund commented 3 years ago

@vivek-cumulus this looks like the issue we worked on recently, where zebra ignores legit updates from BGP because the updated MM counter is lower than zebra's watermark.

Do you recall if that fix has been pushed into upstream yet? Or is that contingent upon the mac/neigh redesign?

ghost commented 3 years ago

Adjusting the VRRP timers removes the race on "clear bgp *", but a restart of FRR still causes issues. After a restart of the current VRRP master there is a race again, and almost all eVPN hosts have the wrong forwarding entry for the VRRP MAC. Is there any workaround that I could apply now? @taspelund mentioned something about MM counters; can someone point me at the code so I can try to work out a fix? Thanks!

ghost commented 3 years ago

The issue is reproducible on the latest master.

Aug 16 14:01:01 ybk140917 zebra[22319]: [XAYAY-GEJ4Q] Recv MACIP ADD VNI 100 MAC 00:00:5e:00:01:02 flags 0x0 seq 3 VTEP 10.33.0.1 ESI - from bgp
Aug 16 14:01:01 ybk140917 zebra[22319]: [VQ43C-9BB7Q] rem-macip ignore vni 100 remote-mac 00:00:5e:00:01:02 as existing has higher seq 7 f REM

The correct route is ignored by zebra because the zebra_evpn_mac_is_bgp_seq_ok() check fails.
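
To make the failure mode concrete, here is a minimal sketch of a sequence-number watermark check of the kind described in this thread. It is not the FRR implementation of zebra_evpn_mac_is_bgp_seq_ok(); the names mac_entry, seq_ok_sketch and apply_update_sketch are hypothetical, and the real function takes more state into account. The point is only that an update carrying a lower sequence number than the stored one is dropped, regardless of what BGP selected as the best path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of a remote MAC entry held by zebra. */
struct mac_entry {
    uint32_t rem_seq; /* highest MAC Mobility sequence accepted so far */
};

/*
 * Simplified watermark check: accept an update from BGP only if its
 * sequence number is not lower than the stored one.
 * Assumption: the real FRR logic considers more than this comparison.
 */
static bool seq_ok_sketch(const struct mac_entry *mac, uint32_t new_seq)
{
    assert(mac != NULL);
    return new_seq >= mac->rem_seq;
}

/* Apply an update; returns 0 if installed, -1 if ignored. */
static int apply_update_sketch(struct mac_entry *mac, uint32_t new_seq)
{
    if (!seq_ok_sketch(mac, new_seq))
        return -1; /* corresponds to the "rem-macip ignore" log line */
    mac->rem_seq = new_seq;
    return 0;
}
```

With the values from the log above (stored seq 7, incoming seq 3), apply_update_sketch() returns -1 and the stored entry is left unchanged, so the stale destination stays in the FIB until an update with a sequence number of at least 7 arrives.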

ghost commented 3 years ago

Disabling this check in zebra_evpn_mac.c helps, but I am still trying to implement a proper fix:

/* When host moves but changes its (MAC,IP)
 * binding, BGP may install a MACIP entry that
 * corresponds to "older" location of the host
 * in transient situations (because {IP1,M1}
 * is a different route from {IP1,M2}). Check
 * the sequence number and ignore this update
 * if appropriate.
 */
if (!zebra_evpn_mac_is_bgp_seq_ok(
        zevpn, mac, seq, ipa_len, ipaddr, false))
    return -1;

Any good ideas are much appreciated :)

taspelund commented 3 years ago

@sworleys and @vivek-cumulus have been working on this exact change internally (Cumulus/NVIDIA) and I think it's almost ready. Can one of you guys chime in here?

sworleys commented 3 years ago

It's not quite ready... basically we are reworking Mac Mobility Handling in EVPN altogether.

ghost commented 3 years ago

Is there any roadmap for this feature? Any target FRR version? Thanks!

sworleys commented 3 years ago

Hard to say. We are reworking MM handling altogether to fix a few bugs in that area, including this one. I would expect it to land in one of the next two 8.x releases. I will close this issue when it does get fixed upstream so you know.

github-actions[bot] commented 1 year ago

This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.

frrbot[bot] commented 1 year ago

This issue will be automatically closed in the specified period unless there is further activity.

aderumier commented 1 year ago

Seems related: https://github.com/FRRouting/frr/pull/12081

sworleys commented 1 year ago

Indeed it is. I did the "fixes" UI thing wrong in GitHub; it should be marked correctly now.