FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

BGP neighbour continuously reset? #6464

Closed QiweiWen closed 4 years ago

QiweiWen commented 4 years ago

I am testing a simple BGP use case with a Linux box running FRR 7.3 and a Cisco Nexus switch as the peer. I find that, while the peering succeeds and the network connectivity is up for a time, within a minute FRR will reset the connection with the reason "No AFI/SAFI activated for peer".

Below are the running configs on the FRR side:

Current configuration:
!
frr version 7.3
frr defaults traditional
hostname sroo3
log file /run/exa/frr.log
log syslog
service integrated-vtysh-config
!
debug bgp neighbor-events
!
ip route 10.0.0.4/32 enp3s0
!
router bgp 123
 neighbor 192.168.10.2 remote-as 456
 !
 address-family ipv4 unicast
  network 10.0.0.4/32
 exit-address-family
!
line vty
!
end

I tried "address-family ipv4 unicast; neighbor 192.168.10.1 activate". It doesn't help, and the line doesn't show up in the running config either, which suggests that activation is perhaps already the default.

Below is the output of "show ip bgp neigh":

BGP neighbor is 192.168.10.2, remote AS 456, local AS 123, external link
  BGP version 4, remote router ID 172.18.0.2, local router ID 192.168.10.1
  BGP state = Established, up for 00:00:23
  Last read 00:00:22, Last write 00:00:22
  Hold time is 180, keepalive interval is 60 seconds
  Neighbor capabilities:
    4 Byte AS: advertised and received
    AddPath:
      IPv4 Unicast: RX advertised IPv4 Unicast
    Dynamic: received
    Extended nexthop: received
      Address families by peer:
        IPv4 Unicast
    Route refresh: advertised and received(old & new)
    Address Family IPv4 Unicast: advertised and received
    Hostname Capability: advertised (name: sroo3,domain name: n/a) not received
    Graceful Restart Capabilty: advertised and received
      Remote Restart timer is 120 seconds
      Address families by peer:
        IPv4 Unicast(not preserved)
  Graceful restart information:
    End-of-RIB send: IPv4 Unicast
    End-of-RIB received: IPv4 Unicast
  Message statistics:
    Inq depth is 0
    Outq depth is 0
                         Sent       Rcvd
    Opens:                 57         29
    Notifications:         56          0
    Updates:               87         58
    Keepalives:            29         58
    Route Refresh:          0          0
    Capability:             0          0
    Total:                229        145
  Minimum time between advertisement runs is 0 seconds

 For address family: IPv4 Unicast
  Update group 29, subgroup 29
  Packet Queue length 0
  Community attribute sent to this neighbor(all)
  1 accepted prefixes

  Connections established 29; dropped 28
  Last reset 00:00:34, No AFI/SAFI activated for peer
Local host: 192.168.10.1, Local port: 179
Foreign host: 192.168.10.2, Foreign port: 59894
Nexthop: 192.168.10.1
Nexthop global: fe80::ec79:e3ff:fee9:6d07
Nexthop local: fe80::ec79:e3ff:fee9:6d07
BGP connection: shared network
BGP Connect Retry Timer in Seconds: 120
Estimated round trip time: 2 ms
Read thread: on  Write thread: on  FD used: 21
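The flap is visible in the counters above: 29 connections established, 28 dropped, and 56 notifications sent by FRR. As a rough sketch of how one might pull those indicators out of the neighbour output for monitoring, the following Python helper extracts the established/dropped counts and the last reset reason. The function name `summarize_flaps` and the returned dictionary keys are made up for illustration, and the regular expressions assume the wording shown in this particular FRR 7.3 output; other versions may format these lines differently.

```python
import re


def summarize_flaps(neighbor_output):
    """Extract flap indicators from 'show ip bgp neigh' text (hypothetical helper).

    Assumes the 'Connections established N; dropped M' and
    'Last reset <time>, <reason>' wording seen in the output above.
    """
    established = dropped = None
    match = re.search(r"Connections established (\d+); dropped (\d+)", neighbor_output)
    if match:
        established = int(match.group(1))
        dropped = int(match.group(2))

    reason = None
    # The reason ends at the line break, or at 'Local host:' if the text was flattened.
    match = re.search(r"Last reset \S+,\s+(.+?)(?:\s+Local host:|\n|$)", neighbor_output)
    if match:
        reason = match.group(1).strip()

    return {"established": established, "dropped": dropped, "last_reset_reason": reason}


# Excerpt copied from the output in this report.
sample = (
    "Connections established 29; dropped 28\n"
    "Last reset 00:00:34, No AFI/SAFI activated for peer\n"
    "Local host: 192.168.10.1, Local port: 179\n"
)
summary = summarize_flaps(sample)
```

A dropped count that keeps climbing while the reset reason stays the same points at something local resetting the session repeatedly rather than at the remote peer.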

QiweiWen commented 4 years ago

Sorry for the spam. The root cause is that our system has a background script that continuously performs operations on BGP peers based on the system configuration. "no ebgp-multihop" is the command that causes the reset: although the configuration hasn't actually changed, FRR seems to reset the peer regardless.

yswery-reconz commented 3 years ago

@QiweiWen Sorry, can you please explain how you fixed or identified the cause of your issue? I think I am seeing something very similar on our system, where all our sessions are resetting periodically (at random).

QiweiWen commented 3 years ago

Hi Yif,

Here's a bit more context about our system. We made an Ethernet switch that has FRR as part of the control plane. We rolled our own configuration management and API, and a script runs in the background that monitors the configs and sends commands to the vtysh sockets when they change. The script also wakes up every minute and syncs the FRR configs with the switch's running configuration, even if no change was detected during that minute.

What caused the neighbour flap was that the script kept sending "no ebgp-multihop" down the socket during the per-minute callback. FRR 7.3.1 (mistakenly?) treats this command as sufficient grounds to tear down the peering and start again.

If your neighbour flapping issue is "random", I doubt this is the same issue.
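One way to avoid this kind of flap is to make the periodic sync send only the commands that actually change something. The sketch below is not the script described here; it is a minimal illustration in Python, with a made-up function name (`commands_to_send`) and the simplifying assumptions that both configs are flat lists of statements in the same context and that a statement is removed by prefixing it with "no".

```python
def commands_to_send(desired, running):
    """Return only the commands needed to move `running` to `desired`.

    Hypothetical helper: both arguments are lists of config statements
    from the same context (for example, inside 'router bgp 123').
    """
    desired_set = set(desired)
    running_set = set(running)
    # Statements in the running config that are no longer wanted get negated.
    removals = ["no " + line for line in running if line not in desired_set]
    # Statements that are wanted but not yet present are added as-is.
    additions = [line for line in desired if line not in running_set]
    return removals + additions


# Nothing changed since the last sync, so nothing is sent to FRR.
unchanged = commands_to_send(
    ["neighbor 192.168.10.2 remote-as 456"],
    ["neighbor 192.168.10.2 remote-as 456"],
)

# ebgp-multihop is configured on the box but no longer wanted: one command.
changed = commands_to_send(
    ["neighbor 192.168.10.2 remote-as 456"],
    ["neighbor 192.168.10.2 remote-as 456", "neighbor 192.168.10.2 ebgp-multihop 2"],
)
```

With a diff like this, a per-minute callback that finds an empty list sends nothing, so a redundant "no ebgp-multihop" never reaches the daemon.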

-dave

