FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

EVPN-MH - sync neighbors rejected by netlink due to reserved bit set #15253

Closed zzachattack2 closed 2 months ago

zzachattack2 commented 9 months ago

Describe the bug

When running EVPN-MH, neighbor entries synced between PEs on a shared Ethernet segment fail to install. Zebra logs a netlink error (Invalid argument) whose extended error text says the update message has a reserved bit set.

To Reproduce

EVPN config parts:

vni 100
evpn mh redirect-off
!
interface bond2
 evpn mh es-df-pref 20000
 evpn mh es-id 2
 evpn mh es-sys-mac fe:64:00:08:7b:00
exit
!
router bgp 65000
 bgp router-id 10.0.0.3
 bgp log-neighbor-changes
 no bgp ebgp-requires-policy
 no bgp default ipv4-unicast
 no bgp network import-check
 timers bgp 10 30
 neighbor 10.0.0.4 remote-as 65000
 neighbor 10.0.0.4 update-source 10.0.0.3
 !
 address-family ipv4 unicast
  redistribute connected
  redistribute static
 exit-address-family
 !
 address-family l2vpn evpn
  neighbor 10.0.0.4 activate
  neighbor 10.0.0.4 soft-reconfiguration inbound
  advertise-all-vni
  advertise-svi-ip
  advertise ipv4 unicast
 exit-address-family
exit

Logs from PE:

Jan 29 22:20:45 zebra[1609]: [KKAC1-JMWTB] Rx RTM_NEWNEIGH family ipv4 IF br1.30(14) vrf default(0) IP 10.30.0.13 MAC 30:09:f9:7c:57:65 state 0x4 flags 0x0 ext_flags 0x0
Jan 29 22:20:45 zebra[1609]: [SGBRA-T9E0Z] zebra neigh add if br1.30/14 10.30.0.13 30:09:f9:7c:57:65
Jan 29 22:20:45 zebra[1609]: [J1Q9Y-TFAYN] Add/Update neighbor 10.30.0.13 MAC 30:09:f9:7c:57:65 intf br1.30(14) state 0x4 local_inactive -> L2-VNI 30
Jan 29 22:20:45 zebra[1609]: [R4ZM8-61M7K] local neigh vni 30 ip 10.30.0.13 mac 30:09:f9:7c:57:65 f 0x301 local-inactive old_bgp_ready flag-update
Jan 29 22:20:45 zebra[1609]: [TGHB8-E7PD0] zebra_evpn_local_neigh_update: dp-install sync-neigh vni 30 ip 10.30.0.13 mac 30:09:f9:7c:57:65 if br1.30(14) f 0x301 static inactive
Jan 29 22:20:45 zebra[1609]: [W8V7C-6W4DS] init neigh ctx NEIGH_INSTALL: ifp br1.30, mac 30:09:f9:7c:57:65, ip 10.30.0.13
Jan 29 22:20:45 zebra[1609]: [JGWSB-SMNVE] dplane: incoming new work counter: 1
Jan 29 22:20:45 zebra[1609]: [Q52A7-211QJ] dplane enqueues 1 new work to provider 'Kernel'
Jan 29 22:20:45 zebra[1609]: [JVY1P-93VFY] dplane provider 'Kernel': processing
Jan 29 22:20:45 zebra[1609]: [XQFEV-ACXXW] Dplane NEIGH_INSTALL, ip 10.30.0.13, ifindex 14
Jan 29 22:20:45 zebra[1609]: [NH6N7-54CD1] Tx RTM_NEWNEIGH family ipv4 IF br1.30(14) Neigh 10.30.0.13 MAC 30:09:f9:7c:57:65 flags 0x0 state 0x4 ext ext_flags 0x2
Jan 29 22:20:45 zebra[1609]: [HYEHE-CQZ9G] nl_batch_send: netlink-dp (NS 0), batch size=64, msg cnt=1
Jan 29 22:20:45 zebra[1609]: [TJ327-ET8HE] netlink_send_msg: >> netlink message dump [sent]
Jan 29 22:20:45 zebra[1609]: [JAS4D-NCWGP] nlmsghdr [len=64 type=(28) NEWNEIGH flags=(0x0501) {REQUEST,DUMP,(ROOT|REPLACE|CAPPED),(ATOMIC|CREATE)} seq=226 pid=3046404233]
Jan 29 22:20:45 zebra[1609]: [T4YQJ-83R8H]   ndm [family=2 (AF_INET) ifindex=14 state=0x0004 {STALE} flags=0x0000 {} type=1 (UNICAST)]
Jan 29 22:20:45 zebra[1609]: [KFBSR-XYJV1]     rta [len=5 (payload=1) type=(12) UNKNOWN]
Jan 29 22:20:45 zebra[1609]: [KFBSR-XYJV1]     rta [len=10 (payload=6) type=(2) LLADDR]
Jan 29 22:20:45 zebra[1609]: [V74GD-NYS6Y]       30:09:F9:7C:57:65
Jan 29 22:20:45 zebra[1609]: [KFBSR-XYJV1]     rta [len=8 (payload=4) type=(15) UNKNOWN]
Jan 29 22:20:45 zebra[1609]: [KFBSR-XYJV1]     rta [len=8 (payload=4) type=(1) DST]
Jan 29 22:20:45 zebra[1609]: [M8QV4-KY9C0]       10.30.0.13
Jan 29 22:20:45 zebra[1609]: [V8KNF-8EXH8] netlink_recv_msg: << netlink message dump [recv]
Jan 29 22:20:45 zebra[1609]: [JAS4D-NCWGP] nlmsghdr [len=68 type=(2) ERROR flags=(0x0300) {DUMP,(ROOT|REPLACE|CAPPED),(MATCH|EXCLUDE|ACK_TLVS)} seq=226 pid=3046404233]
Jan 29 22:20:45 zebra[1609]: [KWP1C-6CSXF]   nlmsgerr [error=(-22) Invalid argument]
Jan 29 22:20:45 zebra[1609]: [HSYZM-HV7HF] Extended Error: reserved bit set
Jan 29 22:20:45 zebra[1609]: [WVJCK-PPMGD][EC 4043309093] netlink-dp (NS 0) error: Invalid argument, type=RTM_NEWNEIGH(28), seq=226, pid=3046404233
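
The dump above labels two attributes as UNKNOWN and prints an ambiguous expansion of the nlmsghdr flags. A minimal sketch for decoding both is shown here. It assumes the attribute and flag numbers from the upstream Linux uapi headers (linux/neighbour.h and linux/netlink.h), and the helper names decode_rta and decode_new_flags are hypothetical, not part of FRR.

```python
import re

# Attribute numbers as assumed from upstream linux/neighbour.h.
# Only the types that appear in the dump are listed.
NDA_NAMES = {
    1: "NDA_DST",
    2: "NDA_LLADDR",
    12: "NDA_PROTOCOL",
    15: "NDA_FLAGS_EXT",
}

# Header flags that apply to NEW-type requests, as assumed from
# upstream linux/netlink.h.
NLM_F_NEW_FLAGS = {
    0x001: "NLM_F_REQUEST",
    0x004: "NLM_F_ACK",
    0x100: "NLM_F_REPLACE",
    0x200: "NLM_F_EXCL",
    0x400: "NLM_F_CREATE",
    0x800: "NLM_F_APPEND",
}

RTA_RE = re.compile(r"rta \[len=(\d+) \(payload=(\d+)\) type=\((\d+)\)")


def decode_rta(line):
    """Return (attribute name, payload length) for one 'rta [...]' log line."""
    match = RTA_RE.search(line)
    if match is None:
        return None
    payload = int(match.group(2))
    rta_type = int(match.group(3))
    return NDA_NAMES.get(rta_type, f"type {rta_type}"), payload


def decode_new_flags(flags):
    """Return the names of the NEW-request header flags set in 'flags'."""
    return [name for bit, name in sorted(NLM_F_NEW_FLAGS.items()) if flags & bit]


if __name__ == "__main__":
    print(decode_rta("rta [len=5 (payload=1) type=(12) UNKNOWN]"))
    print(decode_rta("rta [len=8 (payload=4) type=(15) UNKNOWN]"))
    print(decode_new_flags(0x0501))
```

Under these assumptions, the 4-byte type 15 attribute is the extended flags word that carries the ext_flags 0x2 value shown in the Tx line, and the 1-byte type 12 attribute is the protocol field.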

Expected behavior

Synced neighbor updates are installed in the kernel without error.

Versions

Additional context

After debugging this a bit, it appears that netlink rejects the "static" flag that is set on neighbor updates synced over the shared ES. In the FRR source the static flag is mapped to the NTF_E_MH_PEER_SYNC extended flag, but upstream the same bit is assigned to NTF_EXT_LOCKED. The error seems to be triggered when the update fails validation against an NLA attribute mask.
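
The mask check described above can be sketched as follows. This is an illustration only: the NTF_EXT_* values are taken from the upstream linux/neighbour.h header, while ASSUMED_NEIGH_EXT_MASK is a guess at which extended flags the neighbor table accepts and may not match the running kernel. The helper reserved_bits is hypothetical.

```python
# Extended neighbor flags as assumed from upstream linux/neighbour.h.
NTF_EXT_MANAGED = 1 << 0
NTF_EXT_LOCKED = 1 << 1

# Assumption: the neighbor table accepts only NTF_EXT_MANAGED in the
# extended flags attribute. Check the kernel source for the real mask.
ASSUMED_NEIGH_EXT_MASK = NTF_EXT_MANAGED


def reserved_bits(value, allowed_mask):
    """Return the bits of a 32-bit 'value' that fall outside 'allowed_mask'."""
    return value & ~allowed_mask & 0xFFFFFFFF


if __name__ == "__main__":
    ext_flags = 0x2  # value from the Tx RTM_NEWNEIGH log line
    rejected = reserved_bits(ext_flags, ASSUMED_NEIGH_EXT_MASK)
    if rejected:
        print(f"bits outside the accepted mask: {rejected:#x}")
    else:
        print("all bits accepted")
```

With the assumed mask, the 0x2 bit sent by zebra falls outside the accepted set, which would be consistent with the "reserved bit set" extended error in the logs.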

github-actions[bot] commented 3 months ago

This issue is stale because it has been open 180 days with no activity. Comment or remove the autoclose label in order to avoid having this issue closed.

frrbot[bot] commented 3 months ago

This issue will be automatically closed in the specified period unless there is further activity.