Closed mwinter-osr closed 7 years ago
Does this work in master?
I'm having a hard time tracking this one down. I'm inclined to say that this is not a bug, but for now I can't say for sure.
The unpredictable test results come from the fact that, under some circumstances, bgpd will process a TCP SYN packet before a BGP Keepalive that was sent a few milliseconds earlier. When this happens, bgpd takes a different code path which also closes the redundant TCP connection, but without sending a Cease notification.
In my understanding, the BGP collision detection mechanism was designed to handle the case of two BGP peers establishing an active TCP connection with each other simultaneously, requiring one of the connections to be torn down. In this test, what ANVL does is open two active TCP connections to the DUT and ignore the DUT's own connection requests (TCP SYN). In this case, just by reading the RFC it's unclear to me what a BGP speaker should do:
bgpd uses the second option when the peer FSM state is OpenConfirm or earlier, and the third option when the peer FSM state is Established. ANVL expects the DUT to use the third option regardless of the FSM state; we need to check whether this expectation is correct.
One thing worth noting is that some BGP implementations (e.g. IOS, BIRD, OpenBGPD, GoBGP) deliberately ignore the RFC and don't send a Cease notification when a collision is detected. OpenBGPD in particular seems to solve the problem in a super simple and elegant way: https://github.com/openbsd/src/blob/38d65f3dff/usr.sbin/bgpd/session.c#L1103 https://github.com/openbsd/src/blob/38d65f3dff/usr.sbin/bgpd/session.c#L1010-L1094
Any help from our BGP experts would be appreciated.
After some discussion among the parties involved, the general consensus seemed to be that this is not a bug (which I concur with). The RFC is unclear about how to handle the (very odd) case of a peer connecting to us twice, and the amount of code required to juggle multiple connections within one logical peer is hard to justify given the questionable benefit of doing so.
Closing as a spiritual 'wontfix'.
(This was found with test ANVL-BGPPLUS-27.6 on frr-3.0-rc2)
Reference: RFC 4271, Sect. 6.8, p 35, Connection collision detection
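For readers without the RFC at hand, here is a minimal Python sketch of the tie-break described in RFC 4271, Sect. 6.8. It is not bgpd's code: `Keep` and `resolve_collision` are hypothetical names made up for illustration. It only models the case the RFC spells out, where the newly received OPEN arrives on a connection initiated by the remote speaker, so it does not settle the "same peer connects twice" situation discussed in this issue.

```python
from enum import Enum
from ipaddress import IPv4Address


class Keep(Enum):
    """Which of the two colliding connections survives (hypothetical type)."""
    EXISTING = "existing"  # the connection already in OpenConfirm/Established
    NEW = "new"            # the connection the new OPEN arrived on


def resolve_collision(local_id: str, remote_id: str,
                      existing_established: bool = False) -> Keep:
    """Sketch of the RFC 4271 Sect. 6.8 rule (illustrative, not bgpd code).

    BGP Identifiers are compared as 4-octet unsigned integers.
    The losing connection is closed with a Cease NOTIFICATION per the RFC.
    """
    # Unless configured otherwise, a collision with an Established
    # session closes the newly created connection.
    if existing_established:
        return Keep.EXISTING

    local = int(IPv4Address(local_id))
    remote = int(IPv4Address(remote_id))

    # Lower local ID: close the existing connection and accept the one
    # initiated by the remote speaker. Otherwise keep the existing one.
    if local < remote:
        return Keep.NEW
    return Keep.EXISTING


if __name__ == "__main__":
    print(resolve_collision("10.0.0.1", "10.0.0.2"))
    print(resolve_collision("10.0.0.2", "10.0.0.1"))
    print(resolve_collision("10.0.0.1", "10.0.0.2", existing_established=True))
```

The identifiers above are example values only. The point of the sketch is that the RFC rule is keyed on which speaker initiated each connection, which is why it gives no obvious answer when both connections come from the same side.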
The collision detection in FRR is unreliable, and it is occasionally possible to open 2 BGP sessions from the same neighbor to FRR. It failed in 1 out of 4 tests for me.
See the attached pcaps and bgpd logs of good and bad cases.
BGP_collision_fail.pcap.zip BGP_collision_pass.pcap.zip
bgpd_log_fail.txt bgpd_log_pass.txt