FRRouting / frr

The FRRouting Protocol Suite
https://frrouting.org/

bgpd crash #9046

Status: Closed (EasyNetDev closed this issue 3 years ago)

EasyNetDev commented 3 years ago

Hi,

I noticed that bgpd is crashing on both of my routers, with the following in the logs:

R01:

Jul 13 23:41:06 R01 BGP[48446]: in thread bgp_process_packet scheduled from bgpd/bgp_io.c:270 bgp_process_reads()
Jul 13 23:41:06 R01 BGP[48446]: /usr/lib/frr/bgpd(_start+0x2a) [0x560b2465c08a]
Jul 13 23:41:06 R01 BGP[48446]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea) [0x7fcdb8c42d0a]
Jul 13 23:41:06 R01 BGP[48446]: /usr/lib/frr/bgpd(main+0x38e) [0x560b2465a34e]
Jul 13 23:41:06 R01 BGP[48446]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xe8) [0x7fcdb90f15a8]
Jul 13 23:41:06 R01 BGP[48446]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(thread_call+0x7d) [0x7fcdb91332ad]
Jul 13 23:41:06 R01 BGP[48446]: /usr/lib/frr/bgpd(bgp_process_packet+0x466) [0x560b246b1996]
Jul 13 23:41:06 R01 BGP[48446]: /usr/lib/frr/bgpd(+0x192c28) [0x560b246aec28]
Jul 13 23:41:06 R01 BGP[48446]: /usr/lib/frr/bgpd(bgp_nlri_parse_ip+0xb7) [0x560b246cae27]
Jul 13 23:41:06 R01 BGP[48446]: /usr/lib/frr/bgpd(bgp_update+0x1a15) [0x560b246c9bf5]
Jul 13 23:41:06 R01 BGP[48446]: /usr/lib/frr/bgpd(bgp_damp_update+0x17c) [0x560b2478e6cc]
Jul 13 23:41:06 R01 BGP[48446]: /usr/lib/frr/bgpd(+0x271feb) [0x560b2478dfeb]
Jul 13 23:41:06 R01 BGP[48446]: /usr/lib/frr/bgpd(+0x271d16) [0x560b2478dd16]
Jul 13 23:41:06 R01 BGP[48446]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140) [0x7fcdb8df5140]
Jul 13 23:41:06 R01 BGP[48446]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0xc5651) [0x7fcdb9122651]
Jul 13 23:41:06 R01 BGP[48446]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0xf5) [0x7fcdb90f8635]
Jul 13 23:41:06 R01 BGP[48446]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x6d) [0x7fcdb90f843d]
Jul 13 23:41:06 R01 BGP[48446]: Received signal 11 at 1626208866 (si_addr 0x0, PC 0x560b2478dd16); aborting...

R02:

Jul 13 23:46:13 R02 watchfrr[6260]: [HD38Q-0HBRT][EC 268435457] bgpd state -> down : read returned EOF
Jul 13 23:46:13 R02 BGP[17723]: in thread bgp_process_packet scheduled from bgpd/bgp_packet.c:2676 bgp_process_packet()
Jul 13 23:46:13 R02 BGP[17723]: /usr/lib/frr/bgpd(_start+0x2a) [0x5606aebc608a]
Jul 13 23:46:13 R02 BGP[17723]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea) [0x7fa8bc606d0a]
Jul 13 23:46:13 R02 BGP[17723]: /usr/lib/frr/bgpd(main+0x38e) [0x5606aebc434e]
Jul 13 23:46:13 R02 BGP[17723]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xe8) [0x7fa8bcab25a8]
Jul 13 23:46:13 R02 BGP[17723]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(thread_call+0x7d) [0x7fa8bcaf42ad]
Jul 13 23:46:13 R02 BGP[17723]: /usr/lib/frr/bgpd(bgp_process_packet+0x466) [0x5606aec1b996]
Jul 13 23:46:13 R02 BGP[17723]: /usr/lib/frr/bgpd(+0x192c28) [0x5606aec18c28]
Jul 13 23:46:13 R02 BGP[17723]: /usr/lib/frr/bgpd(bgp_nlri_parse_ip+0xb7) [0x5606aec34e27]
Jul 13 23:46:13 R02 BGP[17723]: /usr/lib/frr/bgpd(bgp_update+0x1a15) [0x5606aec33bf5]
Jul 13 23:46:13 R02 BGP[17723]: /usr/lib/frr/bgpd(bgp_damp_update+0x17c) [0x5606aecf86cc]
Jul 13 23:46:13 R02 BGP[17723]: /usr/lib/frr/bgpd(+0x271feb) [0x5606aecf7feb]
Jul 13 23:46:13 R02 BGP[17723]: /usr/lib/frr/bgpd(+0x271cfc) [0x5606aecf7cfc]
Jul 13 23:46:13 R02 BGP[17723]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140) [0x7fa8bc7b9140]
Jul 13 23:46:13 R02 BGP[17723]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0xc5651) [0x7fa8bcae3651]
Jul 13 23:46:13 R02 BGP[17723]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0xf5) [0x7fa8bcab9635]
Jul 13 23:46:13 R02 BGP[17723]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x6d) [0x7fa8bcab943d]
Jul 13 23:46:13 R02 BGP[17723]: Received signal 11 at 1626209173 (si_addr 0x0, PC 0x5606aecf7cfc); aborting...
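
For anyone triaging logs like the two above, the final `Received signal` line carries most of the useful facts: the signal number, the Unix timestamp, the faulting address (`si_addr`), and the program counter. The following is a minimal sketch, not part of FRR, that pulls those fields out of the R01 line. The function name `parse_crash_line` and the regular expression are assumptions made for illustration, based only on the line format shown above.

```python
import re
from datetime import datetime, timezone

# Pattern for the last line of the crash logs above (format assumed from those lines).
CRASH_RE = re.compile(
    r"Received signal (?P<signo>\d+) at (?P<ts>\d+) "
    r"\(si_addr (?P<si_addr>0x[0-9a-fA-F]+), PC (?P<pc>0x[0-9a-fA-F]+)\)"
)

def parse_crash_line(line):
    """Return the fields of a 'Received signal ...' log line, or None if it does not match."""
    m = CRASH_RE.search(line)
    if m is None:
        return None
    return {
        "signo": int(m.group("signo")),
        "time_utc": datetime.fromtimestamp(int(m.group("ts")), tz=timezone.utc),
        "si_addr": int(m.group("si_addr"), 16),
        "pc": int(m.group("pc"), 16),
    }

r01 = parse_crash_line(
    "Jul 13 23:41:06 R01 BGP[48446]: Received signal 11 at 1626208866 "
    "(si_addr 0x0, PC 0x560b2478dd16); aborting..."
)
print(r01["signo"], r01["time_utc"].isoformat(), hex(r01["si_addr"]), hex(r01["pc"]))
```

Signal 11 is SIGSEGV on Linux, and an `si_addr` of 0x0 means the faulting access was at address zero, which usually points to a NULL pointer dereference.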

[X] Did you check if this is a duplicate issue?
[X] Did you test it on the latest FRRouting/frr master branch?

Versions

This is the version of FRR:

commit 507559a089a7ba539b90c4bb1cd0410a4b4b1345 (origin/master, origin/HEAD)
Merge: 91b35264c 70d9b134f
Author: Donald Sharp <sharpd@cumulusnetworks.com>
Date:   Mon Jul 12 07:27:12 2021 -0400

    Merge pull request #9027 from ton31337/fix/missing_unlock_bgp_dest

    bgpd: Don't forget bgp_dest_unlock_node for bgp_static_set()

qlyoung commented 3 years ago

Can you provide a pcap of an example session where this occurs?

ton31337 commented 3 years ago

And show running would be useful as well. I assume you have BGP dampening enabled.

EasyNetDev commented 3 years ago

Hi,

Sure. Last night I compiled 8.0-dev for one of the routers and kept 8.1-dev on the other one. Even 8.0-dev is crashing:

Jul 14 03:48:09 R02 BGP[22377]: in thread bgp_process_packet scheduled from bgpd/bgp_io.c:270 bgp_process_reads()
Jul 14 03:48:09 R02 BGP[22377]: /usr/lib/frr/bgpd(_start+0x2a) [0x56376c92bdaa]
Jul 14 03:48:09 R02 BGP[22377]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea) [0x7f71584ecd0a]
Jul 14 03:48:09 R02 BGP[22377]: /usr/lib/frr/bgpd(main+0x356) [0x56376c92a136]
Jul 14 03:48:09 R02 BGP[22377]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xe8) [0x7f715889e198]
Jul 14 03:48:09 R02 BGP[22377]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(thread_call+0xf3) [0x7f71588df023]
Jul 14 03:48:09 R02 BGP[22377]: /usr/lib/frr/bgpd(bgp_process_packet+0x466) [0x56376c97edc6]
Jul 14 03:48:09 R02 BGP[22377]: /usr/lib/frr/bgpd(+0x12b058) [0x56376c97c058]
Jul 14 03:48:09 R02 BGP[22377]: /usr/lib/frr/bgpd(bgp_nlri_parse_ip+0xb7) [0x56376c997207]
Jul 14 03:48:09 R02 BGP[22377]: /usr/lib/frr/bgpd(bgp_update+0x1921) [0x56376c996221]
Jul 14 03:48:09 R02 BGP[22377]: /usr/lib/frr/bgpd(bgp_damp_update+0x17c) [0x56376ca392ac]
Jul 14 03:48:09 R02 BGP[22377]: /usr/lib/frr/bgpd(+0x1e7bcb) [0x56376ca38bcb]
Jul 14 03:48:09 R02 BGP[22377]: /usr/lib/frr/bgpd(+0x1e78f6) [0x56376ca388f6]
Jul 14 03:48:09 R02 BGP[22377]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140) [0x7f715869f140]
Jul 14 03:48:09 R02 BGP[22377]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0xc2b11) [0x7f71588ceb11]
Jul 14 03:48:09 R02 BGP[22377]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0xf5) [0x7f71588a5225]
Jul 14 03:48:09 R02 BGP[22377]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x6d) [0x7f71588a502d]
Jul 14 03:48:09 R02 BGP[22377]: Received signal 11 at 1626223689 (si_addr 0x0, PC 0x56376ca388f6); aborting...

The configs I'm running:
R01: https://nextcloud.easynet.dev/index.php/s/KEQZ39stJ4i6ktb
R02: https://nextcloud.easynet.dev/index.php/s/Ao8jJTxWCc6r4kZ

Next I will run tcpdump on all my BGP sessions and post the captures.

Is that OK?

ton31337 commented 3 years ago

Could you disable bgp dampening just to make sure it doesn't crash without that?

ton31337 commented 3 years ago

By the way, would it be possible to get a full coredump?

EasyNetDev commented 3 years ago

@ton31337,

Sure, I've set my router to dump the core. Here are the PCAPs:
R01: https://nextcloud.easynet.dev/index.php/s/Zcm6Xbnn9ZsY94Q
R02: https://nextcloud.easynet.dev/index.php/s/qm5fLCt7QTwLCef

The crash happened at 09:46:40 on R01 and 09:46:40 on R02 (EEST, Bucharest time).

ton31337 commented 3 years ago

Thanks. I'm waiting for the coredump; that will be the best way to figure out what the problem is here.

EasyNetDev commented 3 years ago

> Thanks, I'm waiting for coredump, that would be the best thing to figure out what's the problem here.

Yep. I'm waiting for the next crash, and I'm running tcpdump again to capture it. I set the "core" limit to "unlimited" for the bgpd process, so I hope it will dump the core.
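
One way to confirm that the limit really applies to the running daemon is to read `/proc/<pid>/limits` on Linux. A minimal sketch, assuming the usual Linux layout of that file; the helper name `core_limit` and the sample text are illustrative only and did not come from the routers in this report.

```python
def core_limit(limits_text):
    """Return (soft, hard) for 'Max core file size' from /proc/<pid>/limits text."""
    label = "Max core file size"
    for line in limits_text.splitlines():
        if line.startswith(label):
            # Remaining columns: soft limit, hard limit, units.
            fields = line[len(label):].split()
            return fields[0], fields[1]
    return None

# Illustrative sample, not taken from the reporter's systems.
sample = (
    "Limit                     Soft Limit           Hard Limit           Units     \n"
    "Max cpu time              unlimited            unlimited            seconds   \n"
    "Max core file size        unlimited            unlimited            bytes     \n"
)
print(core_limit(sample))

# On a live system, with pid set to the bgpd process id:
#   core_limit(open(f"/proc/{pid}/limits").read())
```

If the soft limit shows `0`, the process will not write a core file no matter what the shell that started it reports.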

EasyNetDev commented 3 years ago

> Could you disable bgp dampening just to make sure it doesn't crash without that?

I will try to disable it after this crash.

EasyNetDev commented 3 years ago

Ok, got the crash and core dump:

Jul 14 13:15:29 R01 BGP[64622]: in thread bgp_process_packet scheduled from bgpd/bgp_packet.c:2676 bgp_process_packet()
Jul 14 13:15:29 R01 BGP[64622]: /usr/lib/frr/bgpd(_start+0x2a) [0x55fdef3f308a]
Jul 14 13:15:29 R01 BGP[64622]: /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xea) [0x7f1f43b2ad0a]
Jul 14 13:15:29 R01 BGP[64622]: /usr/lib/frr/bgpd(main+0x38e) [0x55fdef3f134e]
Jul 14 13:15:29 R01 BGP[64622]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(frr_run+0xe8) [0x7f1f43fd95a8]
Jul 14 13:15:29 R01 BGP[64622]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(thread_call+0x7d) [0x7f1f4401b2ad]
Jul 14 13:15:29 R01 BGP[64622]: /usr/lib/frr/bgpd(bgp_process_packet+0x466) [0x55fdef448916]
Jul 14 13:15:29 R01 BGP[64622]: /usr/lib/frr/bgpd(+0x192ba8) [0x55fdef445ba8]
Jul 14 13:15:29 R01 BGP[64622]: /usr/lib/frr/bgpd(bgp_nlri_parse_ip+0xb7) [0x55fdef461da7]
Jul 14 13:15:29 R01 BGP[64622]: /usr/lib/frr/bgpd(bgp_update+0x1a15) [0x55fdef460b75]
Jul 14 13:15:29 R01 BGP[64622]: /usr/lib/frr/bgpd(bgp_damp_update+0x17c) [0x55fdef5255dc]
Jul 14 13:15:29 R01 BGP[64622]: /usr/lib/frr/bgpd(+0x271efb) [0x55fdef524efb]
Jul 14 13:15:29 R01 BGP[64622]: /usr/lib/frr/bgpd(+0x271c26) [0x55fdef524c26]
Jul 14 13:15:29 R01 BGP[64622]: /lib/x86_64-linux-gnu/libpthread.so.0(+0x14140) [0x7f1f43cdd140]
Jul 14 13:15:29 R01 BGP[64622]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(+0xc5651) [0x7f1f4400a651]
Jul 14 13:15:29 R01 BGP[64622]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_signal+0xf5) [0x7f1f43fe0635]
Jul 14 13:15:29 R01 BGP[64622]: /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0(zlog_backtrace_sigsafe+0x6d) [0x7f1f43fe043d]
Jul 14 13:15:29 R01 BGP[64622]: Received signal 11 at 1626257729 (si_addr 0x0, PC 0x55fdef524c26); aborting...

Here is the core dump for R01: https://nextcloud.easynet.dev/index.php/s/PHCqmLXQccxtL78
Here is the pcap for this crash: https://nextcloud.easynet.dev/index.php/s/XxbeSHif5qyp7Gz
And here is my frr-8.1-dev build: https://nextcloud.easynet.dev/index.php/s/6c2HcjHYC5tkFaC

I also have the debug symbols installed on my systems.

I have now disabled BGP dampening.

ton31337 commented 3 years ago

Could you run this on your machine and paste the output here?

gdb -batch -ex 'bt full' /usr/lib/frr/bgpd /<path_to_coredump>/core-bgpd-11-115-124-64622-1626257729

EasyNetDev commented 3 years ago

> Could you run on your machine and paste the output here?:
>
> gdb -batch -ex 'bt full' /usr/lib/frr/bgpd /<path_to_coredump>/core-bgpd-11-115-124-64622-1626257729

Sure. Here it is:

# gdb -batch -ex 'bt full' /usr/lib/frr/bgpd /opt/coredump/core-bgpd-11-115-124-64622-1626257729
[New LWP 64622]
[New LWP 64623]
[New LWP 64630]
[New LWP 64624]
[New LWP 64625]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/bgpd -d -F traditional -A 127.0.0.1 -M rpki'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f1f43a1d580 (LWP 64622))]
#0  raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/raise.c:50
        set = {__val = {18446744067266829055, 140733737640664, 167, 0, 0, 0, 7018141387277233769, 8246195854090838116, 7021216768532505455, 139772261448503, 140733737640624, 139772261448442, 3611922223501156384, 3834033537019820080, 8097313801230169649, 8097317594494889842}}
        pid = <optimized out>
        tid = <optimized out>
        ret = <optimized out>
#1  0x00007f1f4400a68c in core_handler (signo=11, siginfo=0x7fff2070a5f0, context=<optimized out>) at lib/sigevent.c:262
        pc = 0x55fdef524c26 <bgp_reuselist_del+54>
        sa_default = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0 <repeats 16 times>}}, sa_flags = 0, sa_restorer = 0x0}
        sigset = {__val = {9216, 0 <repeats 15 times>}}
#2  <signal handler called>
No locals.
#3  0x000055fdef524c26 in bgp_reuselist_del (list=0x55fdf13e8e90, node=0x7fff2070aa70) at bgpd/bgp_damp.c:57
        curelm = 0x55fe240eb2a0
        __func__ = "bgp_reuselist_del"
#4  0x000055fdef524efb in bgp_reuse_list_delete (bdi=<optimized out>, bdc=<optimized out>, bdc=<optimized out>) at bgpd/bgp_damp.c:186
        list = 0x55fdf13e8e90
        rn = 0x55fe329ba7f0
#5  0x000055fdef5255dc in bgp_damp_update (path=path@entry=0x55fe22b4f020, dest=dest@entry=0x55fe22b4ef20, afi=afi@entry=AFI_IP, safi=safi@entry=SAFI_UNICAST) at bgpd/bgp_damp.c:423
        t_now = <optimized out>
        bdi = 0x55fe27c03bd0
        status = <optimized out>
        bdc = 0x55fdf11d5910
        __func__ = {<optimized out> <repeats 16 times>}
#6  0x000055fdef460b75 in bgp_update (peer=peer@entry=0x7f1f40c92010, p=p@entry=0x7fff2070ae70, addpath_id=addpath_id@entry=0, attr=0x7fff2070af80, afi=afi@entry=AFI_IP, safi=safi@entry=SAFI_UNICAST, type=<optimized out>, sub_type=<optimized out>, prd=0x0, label=0x0, num_labels=<optimized out>, soft_reconfig=<optimized out>, evpn=0x0) at bgpd/bgp_route.c:4075
        ret = <optimized out>
        aspath_loop_count = <optimized out>
        dest = 0x55fe22b4ef20
        bgp = 0x55fdf11d3040
        new_attr = {aspath = 0x55fe13e32ab0, community = 0x55fdfa9b75d0, refcnt = 0, flag = 151, nexthop = {s_addr = 4186198876}, med = 0, local_pref = 300, nh_ifindex = 0, origin = 0 '\000', pmsi_tnl_type = PMSI_TNLTYPE_NO_INFO, rmap_change_flags = 0, mp_nexthop_global = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, mp_nexthop_local = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, nh_lla_ifindex = 0, ecommunity = 0x0, ipv6_ecommunity = 0x0, lcommunity = 0x0, cluster1 = 0x0, transit = 0x0, mp_nexthop_global_in = {s_addr = 0}, aggregator_addr = {s_addr = 0}, originator_id = {s_addr = 0}, weight = 0, aggregator_as = 0, mp_nexthop_len = 0 '\000', mp_nexthop_prefer_global = 0 '\000', sticky = 0 '\000', default_gw = 0 '\000', router_flag = 0 '\000', es_flags = 0 '\000', tag = 0, label_index = 4294967295, label = 4294836223, srv6_vpn = 0x0, srv6_l3vpn = 0x0, encap_tunneltype = 0, encap_subtlvs = 0x0, vnc_subtlvs = 0x0, evpn_overlay = {type = OVERLAY_INDEX_TYPE_NONE, eth_s_id = {val = "\000\000\000\000\000\000\000\000\000"}, gw_ip = {ipv4 = {s_addr = 0}, ipv6 = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}}, mm_seqnum = 0, mm_sync_seqnum = 0, rmac = {octet = "\000\000\000\000\000"}, distance = 0 '\000', rmap_table_id = 0, link_bw = 0, esi = {val = "\000\000\000\000\000\000\000\000\000"}, srte_color = 0, df_pref = 0, df_alg = 0 '\000'}
        attr_new = 0x55fe395949a0
        pi = <optimized out>
        new = <optimized out>
        extra = <optimized out>
        reason = <optimized out>
        pfx_buf = "@\255p \377\177\000\000X\255p \377\177\000\000@\255p \377\177\000\000\220o\233\372\375U\000\000\327\017\027K\376U\000\000\b\000\000\000\000\000\000\000\300\000\000\000\000\000\000\000\377\331?\357\375U\000\000\b\000\000\000\000\000\000\000\300\000\000\000\000\000\000\000p\255p \377\177\000\000<\335?\357\375U", '\000' <repeats 18 times>, "\020 \311@\037\177\000\000\020 \311@\037\177\000\000\260\255p \377\177\000\000\034\336?\357\375U\000\000\260\255p \377\177\000\000\000\246\366҅W\202\344\332\017\027K\376U\000\000\200\257p \377\177\000\000\020 \311@\037\177\000\000\327\017\027K"
        connected = 0
        do_loop_check = <optimized out>
        has_valid_label = <optimized out>
        nh_afi = <optimized out>
        pi_type = <optimized out>
        pi_sub_type = <optimized out>
        vnc_implicit_withdraw = <optimized out>
        same_attr = <optimized out>
        __func__ = "bgp_update"
        pfxprint = {<optimized out> <repeats 80 times>}
        label_decoded = <optimized out>
#7  0x000055fdef461da7 in bgp_nlri_parse_ip (peer=peer@entry=0x7f1f40c92010, attr=attr@entry=0x7fff2070af80, packet=0x7fff2070af20) at bgpd/bgp_route.c:5508
        pnt = 0x55fe4b17102f "O\216=\030O\216\062\030O\216\066\030.\023-\030O\216<\025O\216\060\030O\216\070\030O\216\063\030Y\333\r\030\271\022\376\030O\216:\030O\216\067\030O\216\071\030m\243\303\030.\023*\030.\023(\030.\023)\030.\023/\030.\023+\030O\216;\030Y\333\f蒷\267"
        lim = 0x55fe4b171082 "蒷\267"
        p = {family = 2 '\002', prefixlen = 24, u = {prefix = 79 'O', prefix4 = {s_addr = 4034127}, prefix6 = {__in6_u = {__u6_addr8 = "O\216=", '\000' <repeats 12 times>, __u6_addr16 = {36431, 61, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {4034127, 0, 0, 0}}}, lp = {id = {s_addr = 4034127}, adv_router = {s_addr = 0}}, prefix_eth = {octet = "O\216=\000\000"}, val = "O\216=", '\000' <repeats 12 times>, val32 = {4034127, 0, 0, 0}, ptr = 4034127, prefix_evpn = {route_type = 79 'O', u = {_ead_addr = {esi = {val = "\000\000\000\000\000\000\000\000\000"}, eth_tag = 0, ip = {ipa_type = IPADDR_NONE, ip = {addr = 0 '\000', _v4_addr = {s_addr = 0}, _v6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}}}, _macip_addr = {eth_tag = 0, ip_prefix_length = 0 '\000', mac = {octet = "\000\000\000\000\000"}, ip = {ipa_type = IPADDR_NONE, ip = {addr = 0 '\000', _v4_addr = {s_addr = 0}, _v6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}}}, _imet_addr = {eth_tag = 0, ip_prefix_length = 0 '\000', ip = {ipa_type = IPADDR_NONE, ip = {addr = 0 '\000', _v4_addr = {s_addr = 0}, _v6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}}}, _es_addr = {esi = {val = "\000\000\000\000\000\000\000\000\000"}, ip_prefix_length = 0 '\000', ip = {ipa_type = IPADDR_NONE, ip = {addr = 0 '\000', _v4_addr = {s_addr = 0}, _v6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}}}, _prefix_addr = {eth_tag = 0, ip_prefix_length = 0 '\000', ip = {ipa_type = IPADDR_NONE, ip = {addr = 0 '\000', _v4_addr = {s_addr = 0}, _v6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}}}}}, prefix_flowspec = {family = 79 'O', prefixlen = 61, ptr = 0}}}
        psize = <optimized out>
        ret = <optimized out>
        afi = AFI_IP
        safi = SAFI_UNICAST
        addpath_encoded = 0
        addpath_id = 0
        __func__ = {<optimized out> <repeats 18 times>}
#8  0x000055fdef4450d3 in bgp_nlri_parse (peer=peer@entry=0x7f1f40c92010, attr=attr@entry=0x7fff2070af80, packet=packet@entry=0x7fff2070af20, mp_withdraw=mp_withdraw@entry=0) at bgpd/bgp_packet.c:311
No locals.
#9  0x000055fdef445ba8 in bgp_update_receive (peer=peer@entry=0x7f1f40c92010, size=size@entry=207) at bgpd/bgp_packet.c:1720
        i = 0
        ret = <optimized out>
        nlri_ret = <optimized out>
        end = <optimized out>
        s = <optimized out>
        attr = {aspath = 0x55fe13e32ab0, community = 0x55fdfa9b6f90, refcnt = 0, flag = 135, nexthop = {s_addr = 4186198876}, med = 0, local_pref = 0, nh_ifindex = 0, origin = 0 '\000', pmsi_tnl_type = PMSI_TNLTYPE_NO_INFO, rmap_change_flags = 0, mp_nexthop_global = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, mp_nexthop_local = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, nh_lla_ifindex = 0, ecommunity = 0x0, ipv6_ecommunity = 0x0, lcommunity = 0x0, cluster1 = 0x0, transit = 0x0, mp_nexthop_global_in = {s_addr = 0}, aggregator_addr = {s_addr = 0}, originator_id = {s_addr = 0}, weight = 0, aggregator_as = 0, mp_nexthop_len = 0 '\000', mp_nexthop_prefer_global = 0 '\000', sticky = 0 '\000', default_gw = 0 '\000', router_flag = 0 '\000', es_flags = 0 '\000', tag = 0, label_index = 4294967295, label = 4294836223, srv6_vpn = 0x0, srv6_l3vpn = 0x0, encap_tunneltype = 0, encap_subtlvs = 0x0, vnc_subtlvs = 0x0, evpn_overlay = {type = OVERLAY_INDEX_TYPE_NONE, eth_s_id = {val = "\000\000\000\000\000\000\000\000\000"}, gw_ip = {ipv4 = {s_addr = 0}, ipv6 = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}}, mm_seqnum = 0, mm_sync_seqnum = 0, rmac = {octet = "\000\000\000\000\000"}, distance = 0 '\000', rmap_table_id = 0, link_bw = 0, esi = {val = "\000\000\000\000\000\000\000\000\000"}, srte_color = 0, df_pref = 0, df_alg = 0 '\000'}
        attribute_len = <optimized out>
        update_len = 164
        withdraw_len = 0
        restart = false
        NLRI_UPDATE = NLRI_UPDATE
        NLRI_WITHDRAW = NLRI_WITHDRAW
        NLRI_MP_UPDATE = NLRI_MP_UPDATE
        NLRI_MP_WITHDRAW = NLRI_MP_WITHDRAW
        NLRI_TYPE_MAX = NLRI_TYPE_MAX
        nlris = {{afi = 1, safi = 1 '\001', nlri = 0x55fe4b170fde "\030m\243\306\030.\023.\030\271\022\374\030\271\371\254\030\271\371\255\030m\243\302\030m\243\307\030\271\022\377\030Y\333\n\030Y\333\017\030Y\333\021\030Y\333\b\030Y\333\t\030Y\333\016\030m\243\304\030\227\354\300\030\227\354\305\030O\216\065\030Y\333\v\030O\216>\030O\216=\030O\216\062\030O\216\066\030.\023-\030O\216<\025O\216\060\030O\216\070\030O\216\063\030Y\333\r\030\271\022\376\030O\216:\030O\216\067\030O\216\071\030m\243\303\030.\023*\030.\023(\030.\023)\030.\023/\030.\023+\030O\216;\030Y\333\f蒷\267", length = 164}, {afi = 0, safi = 0 '\000', nlri = 0x0, length = 0}, {afi = 0, safi = 0 '\000', nlri = 0x0, length = 0}, {afi = 0, safi = 0 '\000', nlri = 0x0, length = 0}}
        __func__ = "bgp_update_receive"
        attr_parse_ret = <optimized out>
#10 0x000055fdef448916 in bgp_process_packet (thread=<optimized out>) at bgpd/bgp_packet.c:2585
        type = 2 '\002'
        xref_p_100 = 0x55fdef663f20 <_xref.132>
        size = 207
        notify_data_length = {<optimized out>, <optimized out>}
        _xrefdata = {xref = 0x55fdef663f20 <_xref.132>, uid = "TJQQE-0PPJT\000\000\000\000", hashstr = 0x55fdef577ce8 "%s: BGP NOTIFY receipt failed for peer: %s", hashu32 = {3, 33554456}}
        _xref = {xref = {xrefdata = 0x55fdef6e8ac0 <_xrefdata.123>, type = XREFT_LOGMSG, line = 2598, file = 0x55fdef578159 "bgpd/bgp_packet.c", func = 0x55fdef5784f0 <__func__.134> "bgp_process_packet"}, fmtstring = 0x55fdef577ce8 "%s: BGP NOTIFY receipt failed for peer: %s", priority = 3, ec = 33554456, args = 0x55fdef55688b "__func__, peer->host"}
        _xrefdata = {xref = 0x55fdef663ee0 <_xref.131>, uid = "YWXN7-Q2X5C\000\000\000\000", hashstr = 0x55fdef577d18 "%s: BGP KEEPALIVE receipt failed for peer: %s", hashu32 = {3, 33554457}}
        _xref = {xref = {xrefdata = 0x55fdef6e8b00 <_xrefdata.124>, type = XREFT_LOGMSG, line = 2610, file = 0x55fdef578159 "bgpd/bgp_packet.c", func = 0x55fdef5784f0 <__func__.134> "bgp_process_packet"}, fmtstring = 0x55fdef577d18 "%s: BGP KEEPALIVE receipt failed for peer: %s", priority = 3, ec = 33554457, args = 0x55fdef55688b "__func__, peer->host"}
        xref_p_101 = 0x55fdef663ee0 <_xref.131>
        peer = 0x7f1f40c92010
        rpkt_quanta_old = <optimized out>
        fsm_update_result = <optimized out>
        mprc = <optimized out>
        processed = 0
        __func__ = "bgp_process_packet"
#11 0x00007f1f4401b2ad in thread_call (thread=thread@entry=0x7fff2070b240) at lib/thread.c:1919
        before = {cpu = {tv_sec = 221, tv_nsec = 370082814}, real = {tv_sec = 131400, tv_usec = 630166}}
        after = {cpu = {tv_sec = 221, tv_nsec = 370076628}, real = {tv_sec = 131400, tv_usec = 630160}}
        cputime_enabled_here = true
        walltime = <optimized out>
        cputime = 0
        exp = <optimized out>
        __func__ = {<optimized out> <repeats 12 times>}
#12 0x00007f1f43fd95a8 in frr_run (master=0x55fdf0811100) at lib/libfrr.c:1161
        instanceinfo = '\000' <repeats 63 times>
        __func__ = "frr_run"
        thread = {type = 4 '\004', add_type = 3 '\003', threaditem = {si = {next = 0x0}}, timeritem = {hi = {index = 0}}, ref = 0x7f1f40d33d50, master = 0x55fdf0811100, func = 0x55fdef4484b0 <bgp_process_packet>, arg = 0x7f1f40c92010, u = {val = 0, fd = 0, sands = {tv_sec = 0, tv_usec = 0}}, real = {tv_sec = 131400, tv_usec = 630166}, hist = 0x7f1f3c006e70, yield = 10000, xref = 0x55fdef663de0 <_xref.127>, mtx = pthread_mutex_t = {Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None}}
#13 0x000055fdef3f134e in main (argc=<optimized out>, argv=<optimized out>) at bgpd/bgp_main.c:542
        opt = -1
        tmp_port = <optimized out>
        bgp_port = 179
        addresses = 0x55fdf08072a0
        no_fib_flag = <optimized out>
        no_zebra_flag = 0
        skip_runas = 0
        instance = 0
        buffer_size = 65536
        address = <optimized out>
        node = <optimized out>
        __func__ = {<optimized out>, <optimized out>, <optimized out>, <optimized out>, <optimized out>}
        _xref = {xref = {xrefdata = 0x0, type = XREFT_ASSERT, line = 531, file = 0x55fdef52f0bf "bgpd/bgp_main.c", func = 0x55fdef52f5ef <__func__.16> "main"}, expr = 0x55fdef60b7c7 "node", extra = 0x0, args = 0x0}
        xref_p_19 = 0x55fdef64b840 <_xref.20>
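
The `nlri` buffer in frame #9 and the `p` variable in frame #7 can be read by hand. In a BGP UPDATE, each IPv4 NLRI entry is one prefix-length byte followed by just enough bytes to hold that many bits, and gdb prints `s_addr` as a host-order integer on this little-endian x86_64 machine. Below is a minimal sketch that decodes the first twelve bytes of the dump and the `s_addr = 4034127` value. `decode_nlri` and `s_addr_to_ip` are hypothetical helper names, not FRR functions.

```python
import socket
import struct

def decode_nlri(buf):
    """Decode IPv4 NLRI entries: a length byte, then ceil(length / 8) prefix bytes."""
    prefixes = []
    i = 0
    while i < len(buf):
        plen = buf[i]
        nbytes = (plen + 7) // 8
        raw = buf[i + 1 : i + 1 + nbytes]
        if plen > 32 or len(raw) < nbytes:
            raise ValueError("truncated or invalid NLRI")
        # Pad the prefix bytes with zeros to a full 4-byte address.
        addr = socket.inet_ntoa(raw + bytes(4 - nbytes))
        prefixes.append(f"{addr}/{plen}")
        i += 1 + nbytes
    return prefixes

def s_addr_to_ip(value):
    """Convert an s_addr printed by gdb on a little-endian host to dotted-quad form."""
    return socket.inet_ntoa(struct.pack("<I", value))

# First three entries of the nlri string in frame #9 (octal escapes as gdb printed them).
head = b"\030m\243\306\030.\023.\030\271\022\374"
print(decode_nlri(head))
print(s_addr_to_ip(4034127))
```

The value in `p` decodes to 79.142.61.0 with `prefixlen = 24`, which matches the three bytes `O\216=` at the start of `pnt` in frame #7.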

Even with BGP dampening disabled I still got a crash.

ton31337 commented 3 years ago

Can we have a coredump when it crashes without BGP dampening enabled as well?
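
The gdb backtrace above stops in `bgp_reuselist_del` (bgpd/bgp_damp.c:57) with `si_addr 0x0`, and the only data local shown in that frame is a list cursor named `curelm`. That pattern is consistent with a singly linked list removal that walks forward looking for a node that is not actually on the list. This is only a guess from the trace, not a confirmed diagnosis, and the sketch below is a generic Python illustration of that failure mode rather than FRR's code; every class and method name in it is made up.

```python
class Node:
    """One element of a singly linked list."""
    def __init__(self, name):
        self.name = name
        self.next = None

class SList:
    """Singly linked list with a walk-to-predecessor removal (assumed shape, not FRR code)."""
    def __init__(self):
        self.first = None

    def insert_head(self, node):
        node.next = self.first
        self.first = node

    def remove_unchecked(self, node):
        # Classic walk: advance until cur.next is the target.
        # If the target is not on this list, cur becomes None and the
        # attribute access fails, the Python analogue of a NULL dereference.
        if self.first is node:
            self.first = node.next
            return
        cur = self.first
        while cur.next is not node:
            cur = cur.next
        cur.next = node.next

    def remove_checked(self, node):
        # Defensive variant: report a missing node instead of walking off the end.
        if self.first is node:
            self.first = node.next
            return True
        cur = self.first
        while cur is not None and cur.next is not node:
            cur = cur.next
        if cur is None:
            return False
        cur.next = node.next
        return True

lst = SList()
a, b, stray = Node("a"), Node("b"), Node("stray")
lst.insert_head(a)
lst.insert_head(b)

print(lst.remove_checked(stray))  # stray was never inserted
try:
    lst.remove_unchecked(stray)
except AttributeError as exc:
    print("unchecked removal failed:", exc)
```

If something like this is happening, a core dump taken after dampening is disabled would show whether stale dampening state is still being referenced.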

EasyNetDev commented 3 years ago

> Can we have a coredump when it crashes without BGP dampening enabled as well?

Sure. I'm waiting for it :).

EasyNetDev commented 3 years ago

This is the core dump on R02 with frr 8.0-dev:

config-process-for-coredump  frr-bgp-debug-core-dump
root@R02:/opt/coredump# ./frr-bgp-debug-core-dump
0x0d7310
[New LWP 36681]
[New LWP 36682]
[New LWP 36747]
[New LWP 36684]
[New LWP 36683]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/bgpd -d -F traditional -A 127.0.0.1 -M rpki'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7fb1f57e3c80 (LWP 36681))]
add symbol table from file "/usr/lib/debug/.build-id/c0/419e45584c90e921b0993ea1a5881140442421.debug" at
        .text_addr = 0xd7310
#0  raise (sig=sig@entry=11) at ../sysdeps/unix/sysv/linux/raise.c:50
        set = {__val = {18446744067266829055, 140725959385938, 167, 0, 0, 0, 7018141387277233769, 8246195854090838116, 7021216768532505455, 140402314008215, 140725959385904, 140402314008154, 3611922223501156384, 3834033537019820080, 8315161139553056305, 2914783753315442547}}
        pid = <optimized out>
        tid = <optimized out>
        ret = <optimized out>
#1  0x00007fb1f60fbb4c in core_handler (signo=11, siginfo=0x7ffd50d1e670, context=<optimized out>) at lib/sigevent.c:262
        pc = 0x5577830a38f6 <bgp_reuselist_del+54>
        sa_default = {__sigaction_handler = {sa_handler = 0x0, sa_sigaction = 0x0}, sa_mask = {__val = {0 <repeats 16 times>}}, sa_flags = 0, sa_restorer = 0x0}
        sigset = {__val = {9216, 0 <repeats 15 times>}}
#2  <signal handler called>
No locals.
#3  0x00005577830a38f6 in bgp_reuselist_del (list=0x557784c7ccf8, node=0x7ffd50d1eb10) at bgpd/bgp_damp.c:57
        curelm = 0x5577d024c5f0
        __func__ = "bgp_reuselist_del"
#4  0x00005577830a3bcb in bgp_reuse_list_delete (bdi=<optimized out>, bdc=<optimized out>, bdc=<optimized out>) at bgpd/bgp_damp.c:186
        list = 0x557784c7ccf8
        rn = 0x5577d2c39cc0
#5  0x00005577830a42ac in bgp_damp_update (path=path@entry=0x5577d1049850, dest=dest@entry=0x5577b45ffaa0, afi=afi@entry=AFI_IP, safi=safi@entry=SAFI_UNICAST) at bgpd/bgp_damp.c:424
        t_now = <optimized out>
        bdi = 0x557786483280
        status = <optimized out>
        bdc = 0x557784af7c80
        __func__ = {<optimized out> <repeats 16 times>}
#6  0x0000557783001221 in bgp_update (peer=peer@entry=0x7fb1f29d0010, p=p@entry=0x7ffd50d1eed0, addpath_id=addpath_id@entry=0, attr=0x7ffd50d1efe0, afi=afi@entry=AFI_IP, safi=safi@entry=SAFI_UNICAST, type=<optimized out>, sub_type=<optimized out>, prd=0x0, label=0x0, num_labels=<optimized out>, soft_reconfig=<optimized out>, evpn=0x0) at bgpd/bgp_route.c:4083
        ret = <optimized out>
        aspath_loop_count = <optimized out>
        dest = 0x5577b45ffaa0
        bgp = 0x557784af5460
        new_attr = {aspath = 0x5577c9987460, community = 0x557785abd810, refcnt = 0, flag = 32919, nexthop = {s_addr = 801695425}, med = 0, local_pref = 350, nh_ifindex = 0, origin = 0 '\000', pmsi_tnl_type = PMSI_TNLTYPE_NO_INFO, rmap_change_flags = 0, mp_nexthop_global = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, mp_nexthop_local = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, nh_lla_ifindex = 0, ecommunity = 0x5577860be380, ipv6_ecommunity = 0x0, lcommunity = 0x0, cluster1 = 0x0, transit = 0x0, mp_nexthop_global_in = {s_addr = 0}, aggregator_addr = {s_addr = 0}, originator_id = {s_addr = 0}, weight = 0, aggregator_as = 0, mp_nexthop_len = 0 '\000', mp_nexthop_prefer_global = 0 '\000', sticky = 0 '\000', default_gw = 0 '\000', router_flag = 0 '\000', es_flags = 0 '\000', tag = 0, label_index = 4294967295, label = 4294836223, srv6_vpn = 0x0, srv6_l3vpn = 0x0, encap_tunneltype = 0, encap_subtlvs = 0x0, vnc_subtlvs = 0x0, evpn_overlay = {gw_ip = {ipv4 = {s_addr = 0}, ipv6 = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}}, mm_seqnum = 0, mm_sync_seqnum = 0, rmac = {octet = "\000\000\000\000\000"}, distance = 0 '\000', rmap_table_id = 0, link_bw = 0, esi = {val = "\000\000\000\000\000\000\000\000\000"}, srte_color = 0, df_pref = 0, df_alg = 0 '\000'}
        attr_new = 0x5577cb25b250
        pi = <optimized out>
        new = <optimized out>
        extra = <optimized out>
        reason = <optimized out>
        pfx_buf = "\340\355\321P\375\177\000\000\370\355\321P\375\177\000\000\340\355\321P\375\177\000\000\200\343\v\206wU\000\000\332\320\335\354\261\177\000\000\020\000\000\000\000\000\000\000\300\000\000\000\000\000\000\000\066\231\372\202wU\000\000\020\000\000\000\000\000\000\000\300\000\000\000\000\000\000\000\020\356\321P\375\177\000\000\000=\224+(\r\030\315\300\000\000\000\000\000\000\000\340\357\321P\375\177\000\000\020\000\235\362\261\177\000\000\320\360\321P\375\177\000"
        connected = 0
        do_loop_check = <optimized out>
        has_valid_label = <optimized out>
        nh_afi = <optimized out>
        pi_type = <optimized out>
        pi_sub_type = <optimized out>
        vnc_implicit_withdraw = <optimized out>
        same_attr = <optimized out>
        __func__ = "bgp_update"
        pfxprint = {<optimized out> <repeats 80 times>}
        label_decoded = <optimized out>
#7  0x0000557783002207 in bgp_nlri_parse_ip (peer=peer@entry=0x7fb1f29d0010, attr=attr@entry=0x7ffd50d1efe0, packet=0x7ffd50d1ef80) at bgpd/bgp_route.c:5312
        pnt = 0x7fb1ecddd0e6 "\303\"\024"
        lim = 0x7fb1ecddd0e9 ""
        p = {family = 2 '\002', prefixlen = 24, u = {prefix = 195 '\303', prefix4 = {s_addr = 1319619}, prefix6 = {__in6_u = {__u6_addr8 = "\303\"\024", '\000' <repeats 12 times>, __u6_addr16 = {8899, 20, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {1319619, 0, 0, 0}}}, lp = {id = {s_addr = 1319619}, adv_router = {s_addr = 0}}, prefix_eth = {octet = "\303\"\024\000\000"}, val = "\303\"\024", '\000' <repeats 12 times>, val32 = {1319619, 0, 0, 0}, ptr = 1319619, prefix_evpn = {route_type = 195 '\303', u = {_ead_addr = {esi = {val = "\000\000\000\000\000\000\000\000\000"}, eth_tag = 0, ip = {ipa_type = IPADDR_NONE, ip = {addr = 0 '\000', _v4_addr = {s_addr = 0}, _v6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}}}, _macip_addr = {eth_tag = 0, ip_prefix_length = 0 '\000', mac = {octet = "\000\000\000\000\000"}, ip = {ipa_type = IPADDR_NONE, ip = {addr = 0 '\000', _v4_addr = {s_addr = 0}, _v6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}}}, _imet_addr = {eth_tag = 0, ip_prefix_length = 0 '\000', ip = {ipa_type = IPADDR_NONE, ip = {addr = 0 '\000', _v4_addr = {s_addr = 0}, _v6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}}}, _es_addr = {esi = {val = "\000\000\000\000\000\000\000\000\000"}, ip_prefix_length = 0 '\000', ip = {ipa_type = IPADDR_NONE, ip = {addr = 0 '\000', _v4_addr = {s_addr = 0}, _v6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}}}, _prefix_addr = {eth_tag = 0, ip_prefix_length = 0 '\000', ip = {ipa_type = IPADDR_NONE, ip = {addr = 0 '\000', _v4_addr = {s_addr = 0}, _v6_addr = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}}}}}, prefix_flowspec = {family = 195 '\303', 
prefixlen = 20, ptr = 0}}}
        psize = <optimized out>
        ret = <optimized out>
        afi = AFI_IP
        safi = SAFI_UNICAST
        addpath_encoded = 0
        addpath_id = 0
        __func__ = {<optimized out> <repeats 18 times>}
#8  0x0000557782fe6583 in bgp_nlri_parse (peer=peer@entry=0x7fb1f29d0010, attr=attr@entry=0x7ffd50d1efe0, packet=packet@entry=0x7ffd50d1ef80, mp_withdraw=mp_withdraw@entry=0) at bgpd/bgp_packet.c:311
No locals.
#9  0x0000557782fe7058 in bgp_update_receive (peer=peer@entry=0x7fb1f29d0010, size=size@entry=86) at bgpd/bgp_packet.c:1720
        i = 0
        ret = <optimized out>
        nlri_ret = <optimized out>
        end = <optimized out>
        s = <optimized out>
        attr = {aspath = 0x5577c9987460, community = 0x5577b751cde0, refcnt = 0, flag = 32903, nexthop = {s_addr = 801695425}, med = 0, local_pref = 0, nh_ifindex = 0, origin = 0 '\000', pmsi_tnl_type = PMSI_TNLTYPE_NO_INFO, rmap_change_flags = 0, mp_nexthop_global = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, mp_nexthop_local = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}, nh_lla_ifindex = 0, ecommunity = 0x5577860be380, ipv6_ecommunity = 0x0, lcommunity = 0x0, cluster1 = 0x0, transit = 0x0, mp_nexthop_global_in = {s_addr = 0}, aggregator_addr = {s_addr = 0}, originator_id = {s_addr = 0}, weight = 0, aggregator_as = 0, mp_nexthop_len = 0 '\000', mp_nexthop_prefer_global = 0 '\000', sticky = 0 '\000', default_gw = 0 '\000', router_flag = 0 '\000', es_flags = 0 '\000', tag = 0, label_index = 4294967295, label = 4294836223, srv6_vpn = 0x0, srv6_l3vpn = 0x0, encap_tunneltype = 0, encap_subtlvs = 0x0, vnc_subtlvs = 0x0, evpn_overlay = {gw_ip = {ipv4 = {s_addr = 0}, ipv6 = {__in6_u = {__u6_addr8 = '\000' <repeats 15 times>, __u6_addr16 = {0, 0, 0, 0, 0, 0, 0, 0}, __u6_addr32 = {0, 0, 0, 0}}}}}, mm_seqnum = 0, mm_sync_seqnum = 0, rmac = {octet = "\000\000\000\000\000"}, distance = 0 '\000', rmap_table_id = 0, link_bw = 0, esi = {val = "\000\000\000\000\000\000\000\000\000"}, srte_color = 0, df_pref = 0, df_alg = 0 '\000'}
        attribute_len = <optimized out>
        update_len = 4
        withdraw_len = 0
        restart = false
        NLRI_UPDATE = NLRI_UPDATE
        NLRI_WITHDRAW = NLRI_WITHDRAW
        NLRI_MP_UPDATE = NLRI_MP_UPDATE
        NLRI_MP_WITHDRAW = NLRI_MP_WITHDRAW
        NLRI_TYPE_MAX = NLRI_TYPE_MAX
        nlris = {{afi = 1, safi = 1 '\001', nlri = 0x7fb1ecddd0e5 "\030\303\"\024", length = 4}, {afi = 0, safi = 0 '\000', nlri = 0x0, length = 0}, {afi = 0, safi = 0 '\000', nlri = 0x0, length = 0}, {afi = 0, safi = 0 '\000', nlri = 0x0, length = 0}}
        __func__ = "bgp_update_receive"
        attr_parse_ret = <optimized out>
#10 0x0000557782fe9dc6 in bgp_process_packet (thread=<optimized out>) at bgpd/bgp_packet.c:2585
        type = 2 '\002'
        xref_p_100 = 0x5577831894a0 <_xref.132>
        size = 86
        notify_data_length = {<optimized out>, <optimized out>}
        _xrefdata = {xref = 0x5577831894a0 <_xref.132>, uid = "TJQQE-0PPJT\000\000\000\000", hashstr = 0x5577830f4b38 "%s: BGP NOTIFY receipt failed for peer: %s", hashu32 = {3, 33554456}}
        _xref = {xref = {xrefdata = 0x5577831dd600 <_xrefdata.123>, type = XREFT_LOGMSG, line = 2598, file = 0x5577830f4fa9 "bgpd/bgp_packet.c", func = 0x5577830f5330 <__func__.134> "bgp_process_packet"}, fmtstring = 0x5577830f4b38 "%s: BGP NOTIFY receipt failed for peer: %s", priority = 3, ec = 33554456, args = 0x5577830d432b "__func__, peer->host"}
        _xrefdata = {xref = 0x557783189460 <_xref.131>, uid = "YWXN7-Q2X5C\000\000\000\000", hashstr = 0x5577830f4b68 "%s: BGP KEEPALIVE receipt failed for peer: %s", hashu32 = {3, 33554457}}
        _xref = {xref = {xrefdata = 0x5577831dd640 <_xrefdata.124>, type = XREFT_LOGMSG, line = 2610, file = 0x5577830f4fa9 "bgpd/bgp_packet.c", func = 0x5577830f5330 <__func__.134> "bgp_process_packet"}, fmtstring = 0x5577830f4b68 "%s: BGP KEEPALIVE receipt failed for peer: %s", priority = 3, ec = 33554457, args = 0x5577830d432b "__func__, peer->host"}
        xref_p_101 = 0x557783189460 <_xref.131>
        peer = 0x7fb1f29d0010
        rpkt_quanta_old = <optimized out>
        fsm_update_result = <optimized out>
        mprc = <optimized out>
        processed = 1
        __func__ = "bgp_process_packet"
#11 0x00007fb1f610c023 in thread_call (thread=thread@entry=0x7ffd50d1f390) at lib/thread.c:1825
        realtime = 93971809340488
        cputime = 140725959390080
        exp = <optimized out>
        helper = 140402314231509
        before = {cpu = {ru_utime = {tv_sec = 152, tv_usec = 33013}, ru_stime = {tv_sec = 42, tv_usec = 513345}, {ru_maxrss = 1650468, __ru_maxrss_word = 1650468}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 401925, __ru_minflt_word = 401925}, {ru_majflt = 11, __ru_majflt_word = 11}, {ru_nswap = 0, __ru_nswap_word = 0}, {ru_inblock = 0, __ru_inblock_word = 0}, {ru_oublock = 88, __ru_oublock_word = 88}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0, __ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 305086, __ru_nvcsw_word = 305086}, {ru_nivcsw = 706444, __ru_nivcsw_word = 706444}}, real = {tv_sec = 99046, tv_usec = 465597}}
        after = {cpu = {ru_utime = {tv_sec = 0, tv_usec = 0}, ru_stime = {tv_sec = 0, tv_usec = 0}, {ru_maxrss = 0, __ru_maxrss_word = 0}, {ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 0, __ru_idrss_word = 0}, {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 0, __ru_minflt_word = 0}, {ru_majflt = 0, __ru_majflt_word = 0}, {ru_nswap = 1, __ru_nswap_word = 1}, {ru_inblock = 0, __ru_inblock_word = 0}, {ru_oublock = 88, __ru_oublock_word = 88}, {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0, __ru_msgrcv_word = 0}, {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 305085, __ru_nvcsw_word = 305085}, {ru_nivcsw = 706444, __ru_nivcsw_word = 706444}}, real = {tv_sec = 99046, tv_usec = -3668167430312280832}}
        __func__ = {<optimized out> <repeats 12 times>}
#12 0x00007fb1f60cb198 in frr_run (master=0x557784508c30) at lib/libfrr.c:1155
        instanceinfo = '\000' <repeats 63 times>
        __func__ = "frr_run"
        thread = {type = 4 '\004', add_type = 3 '\003', threaditem = {si = {next = 0x0}}, timeritem = {hi = {index = 0}}, ref = 0x7fb1f2a71d50, master = 0x557784508c30, func = 0x557782fe9960 <bgp_process_packet>, arg = 0x7fb1f29d0010, u = {val = 0, fd = 0, sands = {tv_sec = 0, tv_usec = 0}}, real = {tv_sec = 99046, tv_usec = 465597}, hist = 0x7fb1ec0035d0, yield = 10000, xref = 0x557783182fc0 <_xref.15>, mtx = pthread_mutex_t = {Type = Normal, Status = Not acquired, Robust = No, Shared = No, Protocol = None}}
#13 0x0000557782f95136 in main (argc=<optimized out>, argv=<optimized out>) at bgpd/bgp_main.c:532
        opt = -1
        tmp_port = <optimized out>
        bgp_port = 179
        addresses = 0x5577845012d0
        no_fib_flag = <optimized out>
        no_zebra_flag = 0
        skip_runas = 0
        instance = 0
        buffer_size = 65536
        address = <optimized out>
        node = <optimized out>
        __func__ = {<optimized out>, <optimized out>, <optimized out>, <optimized out>, <optimized out>}
        _xref = {xref = {xrefdata = 0x0, type = XREFT_ASSERT, line = 521, file = 0x5577830ad0bf "bgpd/bgp_main.c", func = 0x5577830ad5bf <__func__.16> "main"}, expr = 0x557783145aa7 "node", extra = 0x0, args = 0x0}
        xref_p_19 = 0x5577831715a0 <_xref.20>
root@R02:/opt/coredump#
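
For anyone reading the dump: the prefix being processed when bgpd crashed can be recovered from the raw bytes gdb prints. The sketch below is only an illustration, not FRR code. It decodes the `nlri` buffer from frame #9 (`"\030\303\"\024"`, length 4) using the standard IPv4 unicast NLRI layout (one length octet followed by the significant prefix octets), and cross-checks it against `s_addr = 1319619` from the `p` structure printed just before frame #8. The function name `decode_ipv4_nlri` is hypothetical, and the byte-order check assumes a little-endian host such as the x86_64 routers here.

```python
import ipaddress
import socket
import struct


def decode_ipv4_nlri(data: bytes):
    """Decode consecutive IPv4 unicast NLRI entries (no add-path IDs)."""
    prefixes = []
    i = 0
    while i < len(data):
        plen = data[i]  # prefix length in bits
        if plen > 32:
            raise ValueError(f"invalid IPv4 prefix length {plen}")
        nbytes = (plen + 7) // 8  # only the significant octets are on the wire
        chunk = data[i + 1 : i + 1 + nbytes]
        if len(chunk) != nbytes:
            raise ValueError("truncated NLRI")
        addr = ipaddress.IPv4Address(chunk + bytes(4 - nbytes))  # zero-pad to 4 octets
        prefixes.append(ipaddress.ip_network(f"{addr}/{plen}", strict=False))
        i += 1 + nbytes
    return prefixes


# gdb shows the buffer with octal escapes: \030 \303 " \024
raw = bytes([0o30, 0o303, ord('"'), 0o24])
print(decode_ipv4_nlri(raw))

# gdb prints s_addr as a host-order integer; on a little-endian host,
# repacking it as "<I" restores the network-order bytes held in memory.
print(socket.inet_ntoa(struct.pack("<I", 1319619)))
```

Both views give the same result, 195.34.20.0/24, which matches `prefixlen = 24` in the `p` structure.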
ton31337 commented 3 years ago

This crash also occurred with BGP dampening enabled. But you said it crashes even when dampening is disabled. Or am I missing something?

EasyNetDev commented 3 years ago

> This crash also occurred with BGP dampening enabled. But you said it crashes even when dampening is disabled. Or am I missing something?

Yes. It had BGP dampening enabled. After the crash I disabled it. I'm still waiting to see whether R01 crashes with dampening off.

EasyNetDev commented 3 years ago

I don't know whether that crash happened with BGP dampening on or off, but after I posted the crash log for R01 I disabled BGP dampening. I'm not sure whether the command took effect, but I couldn't see it in the show run output. Unfortunately I didn't have time to set prlimit for the process so that it would drop a coredump. It has now been running for about 18h. I'm still waiting to see whether it crashes; R02 has been running for about 15h.
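
As a side note on the prlimit step mentioned above: the same limit change can be made through Python's `resource.prlimit`, which wraps the Linux `prlimit` call. This is a minimal sketch under assumptions, not the procedure used in this thread. The helper name `raise_core_limit` is hypothetical, and it only lifts the soft core-size limit up to the existing hard limit. Raising the hard limit, or changing another user's process such as bgpd, needs extra privileges.

```python
import os
import resource


def raise_core_limit(pid: int):
    """Raise the soft RLIMIT_CORE of `pid` to its current hard limit."""
    _soft, hard = resource.prlimit(pid, resource.RLIMIT_CORE)
    resource.prlimit(pid, resource.RLIMIT_CORE, (hard, hard))
    # Read the limits back so the caller can confirm what is in effect.
    return resource.prlimit(pid, resource.RLIMIT_CORE)


# Demonstrated on the current process; for a daemon, pass its PID instead.
print(raise_core_limit(os.getpid()))
```

Where the core file ends up still depends on the system's core pattern setting, so the limit alone may not be enough.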

ton31337 commented 3 years ago

I'm trying to replicate the issue, but no joy. Maybe you have a minimal configuration to replicate this crash?

EasyNetDev commented 3 years ago

Hi @ton31337,

Unfortunately the daemon didn't crash with dampening off. Regarding the setup, I believe you need to receive the full routing table. I can try to build an FRR testbed connected to my routers, send it the full routing table and activate dampening on it; maybe I can get the same crash.

EasyNetDev commented 3 years ago

I built an FRR testbed. After I activated BGP dampening on the testbed, R01 crashed. The FRR testbed crashed too. Both are running the latest 8.1.

ton31337 commented 3 years ago

Any chance I can get this testbed for testing?

EasyNetDev commented 3 years ago

I'm trying, but I'm facing another issue: zebra is crashing in the latest 8.1-dev :(. It happens after I add this simple config on my testbed:

# cat frr.conf
frr version 8.1-dev
frr defaults traditional
hostname FRR-01
log syslog informational
service integrated-vtysh-config
!
router bgp 65590
 bgp router-id 10.180.0.40
 neighbor 10.180.0.61 remote-as 43474
 neighbor 10.180.0.61 description R01
 neighbor 10.180.0.61 graceful-restart
 !
 address-family ipv4 unicast
  neighbor 10.180.0.61 soft-reconfiguration inbound
  neighbor 10.180.0.61 route-map rm-PERMIT-ALL in
  neighbor 10.180.0.61 route-map rm-PERMIT-ALL out
 exit-address-family
!
route-map rm-PERMIT-ALL permit 1000
!
segment-routing
 traffic-eng
!
line vty
!

After a few seconds zebra crashes, but without a crash signal:

Jul 16 19:53:33 FRR-01 watchfrr[4129]: [KWE5Q-QNGFC] all daemons up, doing startup-complete notify
Jul 16 19:56:49 FRR-01 zebra[4179]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Jul 16 19:56:49 FRR-01 ospfd[4191]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Jul 16 19:56:49 FRR-01 ospfd[4194]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Jul 16 19:56:49 FRR-01 ospf6d[4197]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Jul 16 19:56:49 FRR-01 ldpd[4209]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Jul 16 19:56:49 FRR-01 bgpd[4184]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Jul 16 19:56:49 FRR-01 isisd[4200]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Jul 16 19:56:49 FRR-01 pimd[4203]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Jul 16 19:56:49 FRR-01 nhrpd[4229]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Jul 16 19:56:49 FRR-01 vrrpd[4246]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Jul 16 19:57:11 FRR-01 bgpd[4184]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Jul 16 19:57:25 FRR-01 bgpd[4184]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Jul 16 19:57:47 FRR-01 bgpd[4184]: [VTVCM-Y2NW3] Configuration Read in Took: 00:00:00
Jul 16 19:58:13 FRR-01 watchfrr[4129]: [WFP93-1D146] configuration write completed with exit code 0
Jul 16 19:58:22 FRR-01 watchfrr[4129]: [WFP93-1D146] configuration write completed with exit code 0
Jul 16 19:58:40 FRR-01 watchfrr[4129]: [HD38Q-0HBRT][EC 268435457] zebra state -> down : read returned EOF
Jul 16 19:58:40 FRR-01 bgpd[4184]: [YAF85-253AP][EC 100663299] buffer_write: write error on fd 15: Broken pipe
Jul 16 19:58:40 FRR-01 bgpd[4184]: [X6B3Y-6W42R][EC 100663302] zclient_send_message: buffer_write failed to zclient fd 15, closing
Jul 16 19:58:41 FRR-01 watchfrr[4129]: [NG1AJ-FP2TQ] Terminating on signal
Jul 16 19:58:41 FRR-01 vrrpd[4246]: [N50WA-0KKX6] Terminating on signal
Jul 16 19:58:41 FRR-01 bgpd[4184]: [ZW1GY-R46JE] Terminating on signal
Jul 16 19:58:41 FRR-01 ospfd[4194]: [W9T04-QWK6B] Terminating on signal
Jul 16 19:58:41 FRR-01 ospfd[4191]: [W9T04-QWK6B] Terminating on signal
Jul 16 19:58:41 FRR-01 pimd[4203]: [J5GFN-WGVKR] Terminating on signal SIGINT
Jul 16 19:58:41 FRR-01 ldpd[4209]: SIGINT received
Jul 16 19:58:41 FRR-01 ldpd[4209]: terminating
Jul 16 19:58:41 FRR-01 pimd[4203]: [TYPP0-VBBYM] pim_if_del_vif: vif_index=0 < 1 on interface pimreg50 ifindex=27
Jul 16 19:58:41 FRR-01 pimd[4203]: [TYPP0-VBBYM] pim_if_del_vif: vif_index=-1 < 1 on interface red ifindex=5
Jul 16 19:58:41 FRR-01 pimd[4203]: [TYPP0-VBBYM] pim_if_del_vif: vif_index=-1 < 1 on interface blue ifindex=6
Jul 16 19:58:41 FRR-01 pimd[4203]: [TYPP0-VBBYM] pim_if_del_vif: vif_index=0 < 1 on interface pimreg55 ifindex=28
Jul 16 19:58:41 FRR-01 pimd[4203]: [TYPP0-VBBYM] pim_if_del_vif: vif_index=-1 < 1 on interface green ifindex=7
Jul 16 19:58:41 FRR-01 pimd[4203]: [TYPP0-VBBYM] pim_if_del_vif: vif_index=0 < 1 on interface pimreg60 ifindex=29
Jul 16 19:58:41 FRR-01 pimd[4203]: [TYPP0-VBBYM] pim_if_del_vif: vif_index=0 < 1 on interface pimreg ifindex=26
Jul 16 19:58:41 FRR-01 isisd[4200]: [ZW9EW-V8QX8] Terminating on signal SIGINT
Jul 16 19:58:41 FRR-01 ospf6d[4197]: [SKCG8-9JAK7] Terminating on signal SIGINT
Jul 16 19:58:43 FRR-01 bgpd[4184]: [WVAM7-7ZYKQ][EC 33554499] sendmsg_nexthop: zclient_send_message() failed
Jul 16 19:58:48 FRR-01 watchfrr[4577]: [T83RR-8SM5G] watchfrr 8.1-dev starting: vty@0
Jul 16 19:58:48 FRR-01 watchfrr[4577]: [ZCJ3S-SPH5S] zebra state -> down : initial connection attempt failed
Jul 16 19:58:48 FRR-01 watchfrr[4577]: [ZCJ3S-SPH5S] bgpd state -> down : initial connection attempt failed
Jul 16 19:58:48 FRR-01 watchfrr[4577]: [ZCJ3S-SPH5S] ospfd-1 state -> down : initial connection attempt failed
Jul 16 19:58:48 FRR-01 watchfrr[4577]: [ZCJ3S-SPH5S] ospfd-2 state -> down : initial connection attempt failed
Jul 16 19:58:48 FRR-01 watchfrr[4577]: [ZCJ3S-SPH5S] ospf6d state -> down : initial connection attempt failed
Jul 16 19:58:48 FRR-01 watchfrr[4577]: [ZCJ3S-SPH5S] isisd state -> down : initial connection attempt failed
Jul 16 19:58:48 FRR-01 watchfrr[4577]: [ZCJ3S-SPH5S] ldpd state -> down : initial connection attempt failed
Jul 16 19:58:48 FRR-01 watchfrr[4577]: [ZCJ3S-SPH5S] pimd state -> down : initial connection attempt failed

This FRR testbed is receiving the full IPv4 table.
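
To see at a glance which daemon watchfrr reported down first in a log like the one above, the state-change lines can be filtered out of the rest. The sketch below is an assumption-laden illustration: the regular expression is written only against the line format visible in this log, and `state_changes` is a hypothetical helper, not part of FRR.

```python
import re

# Matches lines such as:
# Jul 16 19:58:40 FRR-01 watchfrr[4129]: [HD38Q-0HBRT][EC 268435457] zebra state -> down : read returned EOF
STATE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+ \d{2}:\d{2}:\d{2}) \S+ watchfrr\[\d+\]: "
    r"(?:\[[^\]]*\])+ (?P<daemon>\S+) state -> (?P<state>\w+) : (?P<reason>.*)$"
)


def state_changes(lines):
    """Return (timestamp, daemon, state, reason) for each watchfrr state change."""
    changes = []
    for line in lines:
        m = STATE_RE.match(line.strip())
        if m:
            changes.append((m["ts"], m["daemon"], m["state"], m["reason"]))
    return changes


sample = [
    "Jul 16 19:58:40 FRR-01 watchfrr[4129]: [HD38Q-0HBRT][EC 268435457] zebra state -> down : read returned EOF",
    "Jul 16 19:58:40 FRR-01 bgpd[4184]: [YAF85-253AP][EC 100663299] buffer_write: write error on fd 15: Broken pipe",
]
for change in state_changes(sample):
    print(change)
```

On the log above, the first entry is zebra going down at 19:58:40 with "read returned EOF"; the bgpd write errors at the same timestamp follow from that.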

EasyNetDev commented 3 years ago

OK, I fixed the issue. It seems that 4GB of RAM is not enough to keep zebra running with the full table :|. I've activated BGP dampening on this testbed.

EasyNetDev commented 3 years ago

OK, I'm able to replicate the crash on my testbed. I will post the coredump today. Do you need the tcpdump?

ton31337 commented 3 years ago

No, only the coredump is needed. But it would be best if I could have access to that box.

EasyNetDev commented 3 years ago

> No, only the coredump is needed. But it would be best if I could have access to that box.

Sure. Check your email please.

idryzhov commented 3 years ago

@EasyNetDev would be great if you could check this PR – https://github.com/FRRouting/frr/pull/9215.