FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

7.2 BGP session problem intermittent #5667

Closed gondimcodes closed 4 years ago

gondimcodes commented 4 years ago

Hi,

I am using Debian Buster 10.2 with FRR repository: deb https://deb.frrouting.org/frr buster frr-stable

My Kernel version: Linux frr-lg 4.19.0-6-amd64 #1 SMP Debian 4.19.67-2+deb10u2 (2019-11-11) x86_64 GNU/Linux

dpkg -l | grep frr

ii frr 7.2-1~deb10u1 amd64 FRRouting suite of internet protocols (BGP, OSPF, IS-IS, ...)
ii frr-doc 7.2-1~deb10u1 all FRRouting suite - user manual
ii frr-pythontools 7.2-1~deb10u1 all FRRouting suite - Python tools
ii frr-rpki-rtrlib 7.2-1~deb10u1 amd64 FRRouting suite - BGP RPKI support (rtrlib)

From time to time the BGP session goes down and the settings no longer appear in vtysh -c 'show run'; then FRR restarts by itself and goes back to work. I have only noticed this problem in version 7.2.

My FRR looking glass configuration:

Building configuration...

Current configuration:
!
frr version 7.2
frr defaults traditional
hostname frr-lg
log syslog informational
no ip forwarding
no ipv6 forwarding
rpki
 rpki polling_period 1000
 rpki cache 186.xxx.xxx.x 3323 preference 1
 exit
service integrated-vtysh-config
!
router bgp XXXXX
 bgp router-id 191.xxx.xxx.241
 neighbor lg peer-group
 neighbor lg remote-as XXXXX
 neighbor 186.xxx.xxx.1 peer-group lg
 neighbor 2804:xxxx:xxxx::1 peer-group lg
 !
 address-family ipv4 unicast
  neighbor lg prefix-list RECEBE-TUDO in
  neighbor lg prefix-list BLOQUEIA-TUDO out
  neighbor 186.xxx.xxx.1 soft-reconfiguration inbound
 exit-address-family
 !
 address-family ipv6 unicast
  neighbor lg activate
  neighbor lg prefix-list RECEBE-TUDO-V6 in
  neighbor lg prefix-list BLOQUEIA-TUDO-V6 out
  neighbor 2804:xxxx:xxxx::1 soft-reconfiguration inbound
 exit-address-family
!
ip prefix-list BLOQUEIA-TUDO seq 5 deny any
ip prefix-list RECEBE-TUDO seq 5 permit any
!
ipv6 prefix-list BLOQUEIA-TUDO-V6 seq 5 deny any
ipv6 prefix-list RECEBE-TUDO-V6 seq 5 permit any
!
line vty
!
end

In my logs I noticed the following:

Jan 13 05:06:20 frr-lg bgpd[57765]: Received signal 11 at 1578902780 (si_addr 0x2, PC 0x561d906435d3); aborting...
Jan 13 05:06:20 frr-lg bgpd[57765]: Backtrace for 11 stack frames:
Jan 13 05:06:20 frr-lg bgpd[57765]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x60) [0x7fee62dd5a60]
Jan 13 05:06:20 frr-lg bgpd[57765]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0x10c) [0x7fee62dd5edc]
Jan 13 05:06:20 frr-lg bgpd[57765]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0x70f04) [0x7fee62df5f04]
Jan 13 05:06:20 frr-lg bgpd[57765]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730) [0x7fee62ae6730]
Jan 13 05:06:20 frr-lg bgpd[57765]: /usr/lib/frr/bgpd(bgp_table_range_lookup+0x63) [0x561d906435d3]
Jan 13 05:06:20 frr-lg bgpd[57765]: /usr/lib/x86_64-linux-gnu/frr/modules/bgpd_rpki.so(+0x58a3) [0x7fee62d7e8a3]
Jan 13 05:06:20 frr-lg bgpd[57765]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(thread_call+0x56) [0x7fee62e03476]
Jan 13 05:06:20 frr-lg bgpd[57765]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xd8) [0x7fee62dd3c88]
Jan 13 05:06:20 frr-lg bgpd[57765]: /usr/lib/frr/bgpd(main+0x2f0) [0x561d905eeb60]
Jan 13 05:06:20 frr-lg bgpd[57765]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb) [0x7fee6293709b]
Jan 13 05:06:20 frr-lg bgpd[57765]: /usr/lib/frr/bgpd(_start+0x2a) [0x561d905f03aa]
Jan 13 05:06:20 frr-lg bgpd[57765]: in thread bgpd_sync_callback scheduled from bgpd/bgp_rpki.c:351
Jan 13 05:06:20 frr-lg zebra[753]: [EC 4043309116] Client 'bgp' encountered an error and is shutting down.
Jan 13 05:06:20 frr-lg watchfrr[736]: [EC 268435457] bgpd state -> down : read returned EOF
Jan 13 05:06:20 frr-lg zebra[753]: [EC 4043309116] Client 'vnc' encountered an error and is shutting down.
Jan 13 05:06:20 frr-lg zebra[753]: zebra/zebra_ptm.c:1345 failed to find process pid registration
Jan 13 05:06:21 frr-lg zebra[753]: client 15 disconnected. 880885 bgp routes removed from the rib
Jan 13 05:06:21 frr-lg zebra[753]: client 25 disconnected. 0 vnc routes removed from the rib
Jan 13 05:06:25 frr-lg watchfrr[736]: [EC 100663303] Forked background command [pid 5071]: /usr/lib/frr/watchfrr.sh restart bgpd
Jan 13 05:06:25 frr-lg zebra[753]: client 15 says hello and bids fair to announce only bgp routes vrf=0
Jan 13 05:06:25 frr-lg zebra[753]: client 27 says hello and bids fair to announce only vnc routes vrf=0
Jan 13 05:06:27 frr-lg watchfrr[736]: bgpd state -> up : connect succeeded
Jan 13 05:06:28 frr-lg bgpd[5079]: [EC 33554503] 186.xxx.xxx.1 unrecognized capability code: 71 - ignored
Jan 13 05:06:28 frr-lg bgpd[5079]: [EC 33554503] 2804:xxxx:xxxx::1 unrecognized capability code: 71 - ignored
Jan 13 05:06:31 frr-lg bgpd[5079]: %NOTIFICATION: rcvd End-of-RIB for IPv6 Unicast from 2804:xxxx:xxxx::1 in vrf default
Jan 13 05:06:41 frr-lg bgpd[5079]: %NOTIFICATION: rcvd End-of-RIB for IPv4 Unicast from 186.xxx.xxx.1 in vrf default

Jan 13 08:12:23 frr-lg bgpd[5079]: Received signal 11 at 1578913943 (si_addr 0x2, PC 0x5627f415c5d3); aborting...
Jan 13 08:12:23 frr-lg bgpd[5079]: Backtrace for 11 stack frames:
Jan 13 08:12:23 frr-lg bgpd[5079]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x60) [0x7fbf664b6a60]
Jan 13 08:12:23 frr-lg bgpd[5079]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0x10c) [0x7fbf664b6edc]
Jan 13 08:12:23 frr-lg bgpd[5079]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0x70f04) [0x7fbf664d6f04]
Jan 13 08:12:23 frr-lg bgpd[5079]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x12730) [0x7fbf661c7730]
Jan 13 08:12:23 frr-lg bgpd[5079]: /usr/lib/frr/bgpd(bgp_table_range_lookup+0x63) [0x5627f415c5d3]
Jan 13 08:12:23 frr-lg bgpd[5079]: /usr/lib/x86_64-linux-gnu/frr/modules/bgpd_rpki.so(+0x58a3) [0x7fbf6645f8a3]
Jan 13 08:12:23 frr-lg bgpd[5079]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(thread_call+0x56) [0x7fbf664e4476]
Jan 13 08:12:23 frr-lg bgpd[5079]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xd8) [0x7fbf664b4c88]
Jan 13 08:12:23 frr-lg bgpd[5079]: /usr/lib/frr/bgpd(main+0x2f0) [0x5627f4107b60]
Jan 13 08:12:23 frr-lg bgpd[5079]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xeb) [0x7fbf6601809b]
Jan 13 08:12:23 frr-lg bgpd[5079]: /usr/lib/frr/bgpd(_start+0x2a) [0x5627f41093aa]
Jan 13 08:12:23 frr-lg bgpd[5079]: in thread bgpd_sync_callback scheduled from bgpd/bgp_rpki.c:351
Jan 13 08:12:23 frr-lg watchfrr[736]: [EC 268435457] bgpd state -> down : read returned EOF
Jan 13 08:12:23 frr-lg zebra[753]: [EC 4043309116] Client 'bgp' encountered an error and is shutting down.
Jan 13 08:12:23 frr-lg zebra[753]: [EC 4043309116] Client 'vnc' encountered an error and is shutting down.
Jan 13 08:12:23 frr-lg zebra[753]: zebra/zebra_ptm.c:1345 failed to find process pid registration
Jan 13 08:12:24 frr-lg zebra[753]: client 15 disconnected. 880972 bgp routes removed from the rib
Jan 13 08:12:24 frr-lg zebra[753]: client 27 disconnected. 0 vnc routes removed from the rib
Jan 13 08:12:28 frr-lg watchfrr[736]: [EC 100663303] Forked background command [pid 9995]: /usr/lib/frr/watchfrr.sh restart bgpd
Jan 13 08:12:28 frr-lg zebra[753]: client 15 says hello and bids fair to announce only bgp routes vrf=0
Jan 13 08:12:28 frr-lg zebra[753]: client 27 says hello and bids fair to announce only vnc routes vrf=0
Jan 13 08:12:30 frr-lg watchfrr[736]: bgpd state -> up : connect succeeded
Jan 13 08:12:31 frr-lg bgpd[10003]: [EC 33554503] 186.xxx.xxx.1 unrecognized capability code: 71 - ignored
Jan 13 08:12:31 frr-lg bgpd[10003]: [EC 33554503] 2804:xxxx:xxxx::1 unrecognized capability code: 71 - ignored
Jan 13 08:12:34 frr-lg bgpd[10003]: %NOTIFICATION: rcvd End-of-RIB for IPv6 Unicast from 2804:xxxx:xxxx::1 in vrf default
Jan 13 08:12:44 frr-lg bgpd[10003]: %NOTIFICATION: rcvd End-of-RIB for IPv4 Unicast from 186.xxx.xxx.1 in vrf default

Sorry, I'm not a developer, just a sysadmin. If you need more information, just ask.
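
For anyone wanting to check whether their own syslog shows the same pattern, here is a minimal Python sketch that pulls out each bgpd crash and the first backtrace frame located inside the bgpd binary. It assumes one syslog entry per line in the format shown in the log above; the regular expressions and the function name summarize_crashes are illustrative and not part of FRR.

```python
import re

# Crash line written by bgpd on a fatal signal, e.g.
# "Jan 13 05:06:20 frr-lg bgpd[57765]: Received signal 11 at 1578902780 (...); aborting..."
CRASH_RE = re.compile(
    r"^(?P<when>\w{3}\s+\d+\s+[\d:]{8})\s+\S+\s+bgpd\[(?P<pid>\d+)\]:\s+"
    r"Received signal (?P<signal>\d+) at (?P<epoch>\d+)"
)

# Backtrace frame inside the bgpd binary itself (not libfrr or a module), e.g.
# "... bgpd[57765]: /usr/lib/frr/bgpd(bgp_table_range_lookup+0x63) [0x561d906435d3]"
FRAME_RE = re.compile(
    r"bgpd\[(?P<pid>\d+)\]:\s+\S*/bgpd\((?P<func>\w+)\+0x[0-9a-f]+\)"
)


def summarize_crashes(lines):
    """Return one dict per bgpd crash: timestamp, pid, signal, first bgpd frame."""
    crashes = []
    for line in lines:
        crash = CRASH_RE.match(line)
        if crash:
            crashes.append({
                "when": crash["when"],
                "pid": int(crash["pid"]),
                "signal": int(crash["signal"]),
                "top_frame": None,
            })
            continue
        frame = FRAME_RE.search(line)
        # Attach only the first bgpd frame that belongs to the most recent crash.
        if (frame and crashes
                and crashes[-1]["pid"] == int(frame["pid"])
                and crashes[-1]["top_frame"] is None):
            crashes[-1]["top_frame"] = frame["func"]
    return crashes


if __name__ == "__main__":
    import sys
    # Usage (path is an assumption): python3 crashes.py < /var/log/syslog
    for entry in summarize_crashes(sys.stdin):
        print(entry)
```

If every reported crash is signal 11 with bgp_table_range_lookup as the first bgpd frame, and the backtrace also passes through bgpd_rpki.so, it is likely the same crash as reported here.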

ton31337 commented 4 years ago

@tuxfrw this was fixed with https://github.com/FRRouting/frr/commit/5911f65c7bcb05ee81a744bdc8eec5bdae54a591, but not in the repo yet. 7.2.1 will be released soon, afaik.

gondimcodes commented 4 years ago

Thanks Ton. Waiting for the fix in the repository. :)

ton31337 commented 4 years ago

Closing this because it's already fixed in the stable/7.2 branch. For a deb package, we need to wait for 7.2.1.
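
To check whether an installed deb package has reached 7.2.1, a rough Python sketch along these lines could be used. It assumes the fix ships in 7.2.1 and later, as stated in this thread. The helper names (upstream_version, at_least_7_2_1, installed_frr_version) are made up for illustration, and the comparison only looks at the leading numeric part of the upstream version; it is not a full Debian version comparison.

```python
import re
import subprocess


def upstream_version(deb_version):
    """Turn a Debian version like '7.2-1~deb10u1' into a tuple like (7, 2).

    Simplification: drops any epoch and the Debian revision, then keeps only
    the leading dotted numeric part of the upstream version.
    """
    no_epoch = deb_version.split(":", 1)[-1]
    upstream = no_epoch.rsplit("-", 1)[0]
    match = re.match(r"\d+(?:\.\d+)*", upstream)
    if match is None:
        raise ValueError(f"cannot parse version: {deb_version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def at_least_7_2_1(deb_version):
    """True if the package's upstream version is 7.2.1 or newer."""
    return upstream_version(deb_version) >= (7, 2, 1)


def installed_frr_version():
    """Ask dpkg for the installed version of the 'frr' package."""
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${Version}", "frr"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    version = installed_frr_version()
    print(version, "->", "7.2.1 or newer" if at_least_7_2_1(version) else "older than 7.2.1")
```

With the package version from the report, 7.2-1~deb10u1, at_least_7_2_1 returns False, which matches the situation described: the package predates the fix.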