FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

EVPN routes are getting mangled after awhile #1565

Closed: devicenull closed this issue 3 years ago

devicenull commented 6 years ago

So, we have the following route:

BGP routing table entry for [2]:[0]:[0]:[48]:[5a:00:01:4e:d4:e5]
Paths: (2 available, best #2)
  Not advertised to any peer
  Route [2]:[0]:[0]:[48]:[5a:00:01:4e:d4:e5] VNI 70699
  Imported from c.c.c.c:1:[2]:[0]:[0]:[48]:[5a:00:01:4e:d4:e5]
  65534 64515
    f.f.f.f from d.d.d.d (b.b.b.b)
      Origin IGP, localpref 100, valid, external
      Extended Community: RT:64515:70699 ET:8
      AddPath ID: RX 0, TX 43143555
      Last update: Mon Dec 18 16:08:36 2017

  Route [2]:[0]:[0]:[48]:[5a:00:01:4e:d4:e5] VNI 70699
  Imported from c.c.c.c:1:[2]:[0]:[0]:[48]:[5a:00:01:4e:d4:e5]
  65534 64515
    f.f.f.f from e.e.e.e (a.a.a.a)
      Origin IGP, localpref 100, valid, external, best
      Extended Community: RT:64515:70699 ET:8
      AddPath ID: RX 0, TX 43142875
      Last update: Mon Dec 18 16:08:35 2017

Displayed 2 paths for requested prefix

Most of the time, this works properly, and we get the following added to the forwarding table:

5a:00:01:4e:d4:e5 dev 57cf22469ee9a vlan 1 master br57cf22469ee9a
5a:00:01:4e:d4:e5 dev 57cf22469ee9a dst f.f.f.f self offload

However, something seems to be triggering the removal of half of the entry:

# /opt/iproute2/sbin/bridge fdb | grep 5a:00:01:4e:d4:e5
5a:00:01:4e:d4:e5 dev 57cf22469ee9a vlan 1 master br57cf22469ee9a

But then it sometimes gets readded all by itself. Alternatively, flapping the BGP sessions will cause the route to get readded (but that's obviously not really a solution).
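A minimal sketch of how the "half of the entry is gone" condition could be checked automatically, assuming the `bridge fdb` output has the shape shown above. The function names are hypothetical, and the sample text is copied from this report rather than read from a live system.

```python
MAC = "5a:00:01:4e:d4:e5"


def fdb_entries(bridge_fdb_output, mac):
    """Return the tokenized `bridge fdb` lines whose first field is `mac`."""
    wanted = mac.lower()
    entries = []
    for line in bridge_fdb_output.splitlines():
        tokens = line.split()
        if tokens and tokens[0].lower() == wanted:
            entries.append(tokens)
    return entries


def has_remote_vtep_entry(bridge_fdb_output, mac):
    """True if at least one entry for `mac` carries a `dst <vtep>` attribute."""
    return any("dst" in tokens for tokens in fdb_entries(bridge_fdb_output, mac))


# Sample text copied from the outputs above.
HEALTHY = """\
5a:00:01:4e:d4:e5 dev 57cf22469ee9a vlan 1 master br57cf22469ee9a
5a:00:01:4e:d4:e5 dev 57cf22469ee9a dst f.f.f.f self offload
"""
BROKEN = """\
5a:00:01:4e:d4:e5 dev 57cf22469ee9a vlan 1 master br57cf22469ee9a
"""

print(has_remote_vtep_entry(HEALTHY, MAC))
print(has_remote_vtep_entry(BROKEN, MAC))
```

Feeding it the real output of `bridge fdb` on a timer would show when the `dst` entry disappears and comes back.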

I enabled debug zebra vxlan, and these appear to be the relevant lines:

2017/12/18 16:49:22 ZEBRA: Recv MACIP Add MAC 5a:00:01:4e:d4:e5 IP  VNI 70699 Remote VTEP f.f.f.f from bgp
2017/12/18 16:49:23 ZEBRA: Recv MACIP Del MAC 5a:00:01:4e:d4:e5 IP  VNI 70699 Remote VTEP f.f.f.f from bgp
2017/12/18 16:49:25 ZEBRA: Recv MACIP Add MAC 5a:00:01:4e:d4:e5 IP  VNI 70699 Remote VTEP f.f.f.f from bgp
2017/12/18 16:54:24 ZEBRA: Del remote MAC 5a:00:01:4e:d4:e5 intf 57cf22469ee9a(11948) VNI 70699 - readd
2017/12/18 16:59:25 ZEBRA: Del remote MAC 5a:00:01:4e:d4:e5 intf 57cf22469ee9a(11948) VNI 70699 - readd

The fdb entry was missing after that 2017/12/18 16:54:24 log entry, then present after that 2017/12/18 16:59:25 one.
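To line these events up against fdb snapshots, the zebra log can be reduced to a per-MAC timeline. This is a rough sketch that assumes the three message shapes quoted above ("Recv MACIP Add", "Recv MACIP Del", "Del remote MAC"); other zebra messages are ignored, and the helper name is hypothetical.

```python
import re
from datetime import datetime

# Matches only the three message shapes quoted in this report.
LOG_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) ZEBRA: "
    r"(?P<event>Recv MACIP Add|Recv MACIP Del|Del remote MAC)\b.*?"
    r"(?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.IGNORECASE,
)


def mac_timeline(log_text, mac):
    """Return a list of (timestamp, event) tuples for `mac`, in log order."""
    events = []
    for line in log_text.splitlines():
        m = LOG_RE.match(line.strip())
        if m and m.group("mac").lower() == mac.lower():
            ts = datetime.strptime(m.group("ts"), "%Y/%m/%d %H:%M:%S")
            events.append((ts, m.group("event")))
    return events


# Log lines copied from above.
SAMPLE_LOG = """\
2017/12/18 16:49:22 ZEBRA: Recv MACIP Add MAC 5a:00:01:4e:d4:e5 IP  VNI 70699 Remote VTEP f.f.f.f from bgp
2017/12/18 16:49:23 ZEBRA: Recv MACIP Del MAC 5a:00:01:4e:d4:e5 IP  VNI 70699 Remote VTEP f.f.f.f from bgp
2017/12/18 16:49:25 ZEBRA: Recv MACIP Add MAC 5a:00:01:4e:d4:e5 IP  VNI 70699 Remote VTEP f.f.f.f from bgp
2017/12/18 16:54:24 ZEBRA: Del remote MAC 5a:00:01:4e:d4:e5 intf 57cf22469ee9a(11948) VNI 70699 - readd
2017/12/18 16:59:25 ZEBRA: Del remote MAC 5a:00:01:4e:d4:e5 intf 57cf22469ee9a(11948) VNI 70699 - readd
"""

for ts, event in mac_timeline(SAMPLE_LOG, "5a:00:01:4e:d4:e5"):
    print(ts.isoformat(), event)
```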

The machine that 5a:00:01:4e:d4:e5 is attached to doesn't appear to be flapping the route at all:

# sh bgp l2vpn evpn route vni 70699 mac 5a:00:01:4e:d4:e5
BGP routing table entry for [2]:[0]:[0]:[48]:[5a:00:01:4e:d4:e5]
Paths: (1 available, best #1)
  Not advertised to any peer
  Route [2]:[0]:[0]:[48]:[5a:00:01:4e:d4:e5] VNI 70699
  Local
    f.f.f.f from 0.0.0.0 (g.g.g.g)
      Origin IGP, localpref 100, weight 32768, valid, sourced, local, best
      Extended Community: ET:8 RT:64515:70699
      AddPath ID: RX 0, TX 45796554
      Last update: Mon Dec 18 15:55:58 2017

Displayed 1 paths for requested prefix
# /opt/iproute2/sbin/bridge fdb | grep 5a:00:01:4e:d4:e5
5a:00:01:4e:d4:e5 dev vnet0 vlan 1 master br57cf22469ee9a

I'm not really sure what is removing the offload entry. I'm running this on kernel 4.8.7-1.el6.elrepo.x86_64, with frr-3.1 built from HEAD on 2017-11-07 (sorry, I don't have the exact commit hash available).

devicenull commented 6 years ago

I had 'debug zebra events', 'debug zebra kernel', 'debug zebra vxlan' enabled. This is what I saw before the fdb entry went missing:

2017/12/19 16:12:06 ZEBRA: Del remote MAC 5a:00:01:4e:d4:e5 intf 57cf22469ee9a(11948) VNI 70699 - readd
2017/12/19 16:12:06 ZEBRA: Tx RTM_NEWNEIGH family bridge IF 57cf22469ee9a(11948) VLAN 1 MAC 5a:00:01:4e:d4:e5 dst f.f.f.f
2017/12/19 16:12:06 ZEBRA: netlink_talk: netlink-cmd (NS 0) type RTM_NEWNEIGH(28), len=64 seq=17504 flags 0x505
2017/12/19 16:12:06 ZEBRA: netlink_parse_info: netlink-cmd (NS 0) ACK: type=RTM_NEWNEIGH(28), seq=17504, pid=2783237318
2017/12/19 16:12:06 ZEBRA: netlink_parse_info: netlink-listen (NS 0) type RTM_DELNEIGH(29), len=76, seq=0, pid=0

I also had ip mon going, and saw this:

Deleted ??? dev 57cf22469ee9a lladdr 5a:00:01:4e:d4:e5 STALE

But I never saw any readd entries like I did earlier:

Deleted ??? dev 57cf22469ee9a lladdr 5a:00:01:4e:d4:e5 STALE
Deleted dev 57cf22469ee9a lladdr 5a:00:01:4e:d4:e5 STALE
dev 57cf22469ee9a lladdr 5a:00:01:4e:d4:e5 REACHABLE
dev 57cf22469ee9a lladdr 5a:00:01:4e:d4:e5 REACHABLE

devicenull commented 6 years ago

I think the issue is frr not understanding whatever these "Deleted ???" messages mean.

I had ip mon running:

2017-12-19 17:05:05 Deleted ??? dev 57cf22469ee9a lladdr 5a:00:01:4e:d4:e5 STALE

And bridge monitor running:

2017-12-19 17:05:05 Deleted 5a:00:01:4e:d4:e5 dev 57cf22469ee9a dst 172.20.0.249 self offload stale

And that entry was present at 17:04:11, but missing at 17:05:11:

2017-12-19 17:04:11 5a:00:01:4e:d4:e5 dev 57cf22469ee9a vlan 1 master br57cf22469ee9a
2017-12-19 17:04:11 5a:00:01:4e:d4:e5 dev 57cf22469ee9a dst f.f.f.f self offload
2017-12-19 17:05:11 5a:00:01:4e:d4:e5 dev 57cf22469ee9a vlan 1 master br57cf22469ee9a

However, frr's log only shows an earlier event:

2017/12/19 16:59:45 ZEBRA: Rx RTM_NEWNEIGH family bridge IF 57cf22469ee9a(11948) VLAN 1 MAC 5a:00:01:4e:d4:e5
2017/12/19 17:04:45 ZEBRA: Rx RTM_DELNEIGH family bridge IF 57cf22469ee9a(11948) MAC 5a:00:01:4e:d4:e5 dst f.f.f.f

We don't yet have any idea how to reproduce this outside our environment, though.
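One detail worth checking is the spacing between the paired log entries. The sketch below computes it from the leading timestamps of the zebra lines quoted in this thread (the helper name is hypothetical). The gap between the Rx RTM_NEWNEIGH and Rx RTM_DELNEIGH lines is 300 seconds, and 300 seconds is also the default ageing time of the Linux bridge fdb. That is consistent with ageing being involved, but it may be a coincidence and does not prove anything on its own.

```python
from datetime import datetime

ZEBRA_TS_FORMAT = "%Y/%m/%d %H:%M:%S"
ZEBRA_TS_LEN = 19  # length of e.g. "2017/12/19 16:59:45"


def seconds_between(first_line, second_line):
    """Seconds elapsed between the leading timestamps of two zebra log lines."""
    t1 = datetime.strptime(first_line[:ZEBRA_TS_LEN], ZEBRA_TS_FORMAT)
    t2 = datetime.strptime(second_line[:ZEBRA_TS_LEN], ZEBRA_TS_FORMAT)
    return (t2 - t1).total_seconds()


# Lines copied from the logs in this thread.
NEWNEIGH = "2017/12/19 16:59:45 ZEBRA: Rx RTM_NEWNEIGH family bridge IF 57cf22469ee9a(11948) VLAN 1 MAC 5a:00:01:4e:d4:e5"
DELNEIGH = "2017/12/19 17:04:45 ZEBRA: Rx RTM_DELNEIGH family bridge IF 57cf22469ee9a(11948) MAC 5a:00:01:4e:d4:e5 dst f.f.f.f"
READD_1 = "2017/12/18 16:54:24 ZEBRA: Del remote MAC 5a:00:01:4e:d4:e5 intf 57cf22469ee9a(11948) VNI 70699 - readd"
READD_2 = "2017/12/18 16:59:25 ZEBRA: Del remote MAC 5a:00:01:4e:d4:e5 intf 57cf22469ee9a(11948) VNI 70699 - readd"

print(seconds_between(NEWNEIGH, DELNEIGH))
print(seconds_between(READD_1, READD_2))
```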

devicenull commented 6 years ago

I ended up just disabling aging here with brctl setageing br57cf22469ee9a 0, which seems to have improved things. I don't really know why we'd want these entries to age out in the first place.

devicenull commented 6 years ago

So, setageing 0 is definitely not correct. Despite what the docs seem to suggest, setting this to 0 results in MAC addresses immediately dropping out of the bridge fdb. This essentially turns the bridge into a dumb hub, and causes all traffic to be broadcast out to all the VTEPs.

cdwertmann commented 6 years ago

I'm seeing a similar problem using version 3.2+cl3u4 (cumulus build).

EVPN knows that 10.10.0.154 is a peer on VXLAN 140:

# vtysh -c "show evpn mac vni 140"
Number of MACs (local and remote) known for this VNI: 9
MAC               Type   Intf/Remote VTEP      VLAN
...
fa:16:3e:9e:f0:39 remote 10.10.0.154
...

But the output of bridge fdb show dev vxlan-140 does not include any reference to 10.10.0.154. When I restart frr (reload does not help), the missing FDB entries are back:

# bridge fdb show dev vxlan-140 | grep "10.10.0.154"
00:00:00:00:00:00 dst 10.10.0.154 self permanent

I constantly have to restart frr to work around this issue.
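A rough sketch of how this mismatch could be detected without eyeballing both outputs, assuming the column layout of `show evpn mac vni` and the `dst` attribute of `bridge fdb show` look like the samples above (both formats may differ between versions). All function names here are hypothetical.

```python
def evpn_remote_vteps(show_evpn_mac_output):
    """Collect the VTEP addresses of rows whose Type column is `remote`."""
    vteps = set()
    for line in show_evpn_mac_output.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[1] == "remote":
            vteps.add(fields[2])
    return vteps


def fdb_vteps(bridge_fdb_output):
    """Collect every address that follows a `dst` keyword in the fdb dump."""
    vteps = set()
    for line in bridge_fdb_output.splitlines():
        tokens = line.split()
        if "dst" in tokens:
            idx = tokens.index("dst")
            if idx + 1 < len(tokens):
                vteps.add(tokens[idx + 1])
    return vteps


def missing_vteps(show_evpn_mac_output, bridge_fdb_output):
    """VTEPs that EVPN knows about but that have no fdb entry at all."""
    return evpn_remote_vteps(show_evpn_mac_output) - fdb_vteps(bridge_fdb_output)


# Sample text copied from the outputs above.
EVPN_OUT = """\
Number of MACs (local and remote) known for this VNI: 9
MAC               Type   Intf/Remote VTEP      VLAN
fa:16:3e:9e:f0:39 remote 10.10.0.154
"""
FDB_AFTER_RESTART = "00:00:00:00:00:00 dst 10.10.0.154 self permanent\n"

print(missing_vteps(EVPN_OUT, ""))                 # before restarting frr
print(missing_vteps(EVPN_OUT, FDB_AFTER_RESTART))  # after restarting frr
```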

devicenull commented 6 years ago

@cdwertmann Are you seeing this on Cumulus hardware? A support case with them would probably get you a quicker resolution. FWIW, I found a kernel patch that went in a while ago, by someone from Cumulus, that I think was relevant. I'm afraid I don't have a link to it anymore, but it was related to not aging these routes out.

cdwertmann commented 6 years ago

@devicenull No, this is on Ubuntu 16.04 with kernel 4.10.11. It would be fantastic if you could point me towards that kernel patch you mentioned.

From my debugging it looks like FRR is trying to remove an entry from the bridge FDB that is not marked as "permanent", which fails. This leads to other problems with the MAC now existing on multiple VTEPs.

qlyoung commented 3 years ago

@devicenull Did this ever get fixed?

devicenull commented 3 years ago

One of the upgrades we did fixed this, but I'm not sure exactly which one.