Closed Napsterbater closed 3 years ago
Are you able to gather coredump?
If you can tell me how, I would be glad to.
Is this what you need?
From /var/tmp/frr https://drive.google.com/file/d/1J2qDwBmb6KUyyUztTRv17IuMTyU09OOn/view?usp=sharing
NVM, this is what you wanted, I think.
I have 2 in a 7z file. .crash1 is the first time this happened and it generated the dump that I didn't know about. .crash2 is one I just made. Same peer, same commands, practically identical test, only on a different day.
https://drive.google.com/file/d/1Frafmbw_iyWk_30fGhge-MlgWMVcsx5I/view?usp=sharing
Could you tell me how you grabbed those core dumps? They are not what I was expecting.
Got them from /var/crash
How should I generate a proper core dump?
is not a core dump: File format not recognized.
Use these sysctl settings:
sysctl -w kernel.core_pattern=/var/crash/core.%u.%e.%p
sysctl -w fs.suid_dumpable=2
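To see what file name the core_pattern above should produce, here is a minimal sketch in Python. It is only an illustration: the function name expand_core_pattern and the example values are made up for this note, and it handles just the three specifiers used in this thread (%u for the real UID, %e for the executable/comm name, %p for the PID) plus %%. Any other specifier is left in place unchanged, which is not what the kernel itself does.

```python
def expand_core_pattern(pattern, *, uid, exe, pid):
    """Expand %u, %e, %p and %% in a core_pattern template (illustrative only)."""
    values = {"u": str(uid), "e": exe, "p": str(pid), "%": "%"}
    out = []
    i = 0
    while i < len(pattern):
        if pattern[i] == "%" and i + 1 < len(pattern):
            spec = pattern[i + 1]
            # Specifiers this sketch does not know about are kept verbatim.
            out.append(values.get(spec, "%" + spec))
            i += 2
        else:
            out.append(pattern[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    # Example values only; the real UID and PID depend on the system.
    print(expand_core_pattern("/var/crash/core.%u.%e.%p", uid=0, exe="bgpd", pid=1672))
```

With those example values, a crashing bgpd would be expected to leave its dump under /var/crash with the UID, program name, and PID in the file name.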
So apparently those were Apport crash files, with the dump file encoded inside them.
I have unpacked them (apport-unpack _usr_lib_frr_bgpd.0.crash1 crash1/) and packed the results into the .7z. There are 3 folders, each containing the core dump from one of 3 different crashes related to this, plus other misc info about the process.
All three used the same peer/commands; the only difference is the day they were run/dumped.
https://drive.google.com/file/d/16AcgZWgZQxSaSROOwU7FMChmysakZ1hg/view?usp=sharing
Can you install gdb on your machine and run gdb -ex 'bt' --batch /usr/lib/frr/bgpd crash1/CoreDump? The output will be helpful, unless you get something like what I get:
% gdb -ex 'bt' --batch /usr/lib/frr/bgpd CoreDump
[New LWP 1672]
[New LWP 1673]
[New LWP 1675]
[New LWP 1674]
[New LWP 1705]
Core was generated by `/usr/lib/frr/bgpd -d -F traditional -A 127.0.0.1 -M rpki'.
Program terminated with signal SIGABRT, Aborted.
#0 0x00007f931a55718b in ?? ()
[Current thread is 1 (LWP 1672)]
#0 0x00007f931a55718b in ?? ()
#1 0xfffffffe7ffbdedf in ?? ()
#2 0x00007f931a6fe1a0 in ?? ()
#3 0x0000000000000000 in ?? ()
Crash 1
napsterbater@ATL1-US:/var/crash$ gdb -ex 'bt' --batch /usr/lib/frr/bgpd /var/crash/crash1/CoreDump
[New LWP 1672]
[New LWP 1673]
[New LWP 1675]
[New LWP 1674]
[New LWP 1705]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/bgpd -d -F traditional -A 127.0.0.1 -M rpki'.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f931a10c7c0 (LWP 1672))]
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f931a536859 in __GI_abort () at abort.c:79
#2 0x00007f931a9162f4 in ?? () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#3 <signal handler called>
#4 0x000055c0e0f83802 in ?? ()
#5 0x000055c0e0f85b6a in bgp_zebra_announce ()
#6 0x000055c0e0f395a2 in ?? ()
#7 0x000055c0e0f396ee in ?? ()
#8 0x00007f931a92fd08 in work_queue_run () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#9 0x00007f931a925c1a in thread_call () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#10 0x00007f931a8ed298 in frr_run () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#11 0x000055c0e0ee8d89 in main ()
Crash 2
napsterbater@ATL1-US:/var/crash$ gdb -ex 'bt' --batch /usr/lib/frr/bgpd /var/crash/crash2/CoreDump
[New LWP 54497]
[New LWP 54498]
[New LWP 54500]
[New LWP 54499]
[New LWP 54505]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/bgpd -d -F traditional -A 127.0.0.1 -M rpki'.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f8c56e3e7c0 (LWP 54497))]
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f8c57268859 in __GI_abort () at abort.c:79
#2 0x00007f8c576482f4 in ?? () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#3 <signal handler called>
#4 0x000055c2b195d802 in ?? ()
#5 0x000055c2b195fb6a in bgp_zebra_announce ()
#6 0x000055c2b19135a2 in ?? ()
#7 0x000055c2b19136ee in ?? ()
#8 0x00007f8c57661d08 in work_queue_run () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#9 0x00007f8c57657c1a in thread_call () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#10 0x00007f8c5761f298 in frr_run () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#11 0x000055c2b18c2d89 in main ()
Crash 3
napsterbater@ATL1-US:/var/crash$ gdb -ex 'bt' --batch /usr/lib/frr/bgpd /var/crash/crash3/CoreDump
[New LWP 54822]
[New LWP 54825]
[New LWP 54830]
[New LWP 54824]
[New LWP 54823]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/bgpd -d -F traditional -A 127.0.0.1 -M rpki'.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f8c781717c0 (LWP 54822))]
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f8c7859b859 in __GI_abort () at abort.c:79
#2 0x00007f8c7897b2f4 in ?? () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#3 <signal handler called>
#4 0x000055c232759802 in ?? ()
#5 0x000055c23275bb6a in bgp_zebra_announce ()
#6 0x000055c23270f5a2 in ?? ()
#7 0x000055c23270f6ee in ?? ()
#8 0x00007f8c78994d08 in work_queue_run () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#9 0x00007f8c7898ac1a in thread_call () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#10 0x00007f8c78952298 in frr_run () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#11 0x000055c2326bed89 in main ()
Please install the frr-dbgsym and frr-rpki-rtrlib-dbgsym packages and try again with the core dump.
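A quick way to tell whether a backtrace still needs debug symbols is to look for frames that gdb prints as "?? ()". The sketch below is a hypothetical helper (the name unresolved_frames is not part of gdb or FRR) that lists those frame numbers from pasted `bt` output. It assumes the frame layout seen in this thread.

```python
import re
from typing import List

# "#N", an optional "0x... in " address prefix, then the rest of the frame line.
FRAME_RE = re.compile(r"^#(\d+)\s+(?:0x[0-9a-fA-F]+ in )?(.+)$")


def unresolved_frames(backtrace: str) -> List[int]:
    """Return the frame numbers that gdb could not resolve to a symbol."""
    missing = []
    for line in backtrace.splitlines():
        match = FRAME_RE.match(line.strip())
        if match and match.group(2).startswith("??"):
            missing.append(int(match.group(1)))
    return missing


if __name__ == "__main__":
    sample = """\
#1 0x00007f931a536859 in __GI_abort () at abort.c:79
#2 0x00007f931a9162f4 in ?? () from /usr/lib/x86_64-linux-gnu/frr/libfrr.so.0
#3 <signal handler called>
#4 0x000055c0e0f83802 in ?? ()
#5 0x000055c0e0f85b6a in bgp_zebra_announce ()
"""
    print(unresolved_frames(sample))
```

If the list is not empty, the matching -dbgsym packages are probably missing for the binary or library named on those lines.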
Seems those are not available/built(?) for Ubuntu?
napsterbater@ATL1-US:~$ sudo apt search frr
Sorting... Done
Full Text Search... Done
frr/unknown,now 7.5-0~ubuntu20.04 amd64 [installed]
FRRouting suite of internet protocols (BGP, OSPF, IS-IS, ...)
frr-doc/unknown,now 7.5-0~ubuntu20.04 all [installed]
FRRouting suite - user manual
frr-pythontools/unknown,now 7.5-0~ubuntu20.04 all [installed]
FRRouting suite - Python tools
frr-rpki-rtrlib/unknown,now 7.5-0~ubuntu20.04 amd64 [installed]
FRRouting suite - BGP RPKI support (rtrlib)
frr-snmp/unknown,now 7.5-0~ubuntu20.04 amd64 [installed]
FRRouting suite - SNMP support
napsterbater@ATL1-US:~$ sudo apt search frr-dbgsym
Sorting... Done
Full Text Search... Done
napsterbater@ATL1-US:~$ sudo apt install frr-rpki-rtrlib-dbgsym frr-dbgsym
Reading package lists... Done
Building dependency tree
Reading state information... Done
E: Unable to locate package frr-rpki-rtrlib-dbgsym
E: Unable to locate package frr-dbgsym
Looking here, I only see them for Debian 9/10: https://deb.frrouting.org/frr/pool/frr-stable/f/frr/
@Napsterbater you are now able to install those packages, already added into the repo.
sudo apt install frr-rpki-rtrlib-dbgsym frr-dbgsym
Was successful.
napsterbater@ATL1-US:/var/crash$ gdb -ex 'bt' --batch /usr/lib/frr/bgpd /var/crash/crash1/CoreDump
[New LWP 1672]
[New LWP 1673]
[New LWP 1675]
[New LWP 1674]
[New LWP 1705]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/bgpd -d -F traditional -A 127.0.0.1 -M rpki'.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f931a10c7c0 (LWP 1672))]
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f931a536859 in __GI_abort () at abort.c:79
#2 0x00007f931a9162f4 in core_handler (signo=11, siginfo=0x7ffc19b986f0, context=<optimized out>) at lib/sigevent.c:228
#3 <signal handler called>
#4 0x000055c0e0f83802 in bgp_path_info_to_ipv6_nexthop (ifindex=ifindex@entry=0x7ffc19b98bdc, path=<optimized out>, path=<optimized out>) at bgpd/bgp_zebra.c:912
#5 0x000055c0e0f85b6a in bgp_zebra_announce (dest=dest@entry=0x55c0e25b27e0, p=p@entry=0x55c0e25b27e0, info=info@entry=0x55c0e32334f0, bgp=bgp@entry=0x55c0e1f7d620, afi=afi@entry=AFI_IP6, safi=safi@entry=SAFI_UNICAST) at bgpd/bgp_zebra.c:1387
#6 0x000055c0e0f395a2 in bgp_process_main_one (safi=SAFI_UNICAST, afi=AFI_IP6, dest=0x55c0e25b27e0, bgp=0x55c0e1f7d620) at bgpd/bgp_route.c:2820
#7 bgp_process_main_one (bgp=0x55c0e1f7d620, dest=0x55c0e25b27e0, afi=AFI_IP6, safi=SAFI_UNICAST) at bgpd/bgp_route.c:2591
#8 0x000055c0e0f396ee in bgp_process_wq (wq=<optimized out>, data=0x55c0e29aa410) at bgpd/bgp_route.c:2926
#9 0x00007f931a92fd08 in work_queue_run (thread=0x7ffc19ba9210) at lib/workqueue.c:291
#10 0x00007f931a925c1a in thread_call (thread=thread@entry=0x7ffc19ba9210) at lib/thread.c:1581
#11 0x00007f931a8ed298 in frr_run (master=0x55c0e19ff7a0) at lib/libfrr.c:1099
#12 0x000055c0e0ee8d89 in main (argc=8, argv=0x7ffc19ba9598) at bgpd/bgp_main.c:513
napsterbater@ATL1-US:/var/crash$ gdb -ex 'bt' --batch /usr/lib/frr/bgpd /var/crash/crash2/CoreDump
[New LWP 54497]
[New LWP 54498]
[New LWP 54500]
[New LWP 54499]
[New LWP 54505]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/bgpd -d -F traditional -A 127.0.0.1 -M rpki'.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f8c56e3e7c0 (LWP 54497))]
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f8c57268859 in __GI_abort () at abort.c:79
#2 0x00007f8c576482f4 in core_handler (signo=11, siginfo=0x7ffcf9f7e1f0, context=<optimized out>) at lib/sigevent.c:228
#3 <signal handler called>
#4 0x000055c2b195d802 in bgp_path_info_to_ipv6_nexthop (ifindex=ifindex@entry=0x7ffcf9f7e6ac, path=<optimized out>, path=<optimized out>) at bgpd/bgp_zebra.c:912
#5 0x000055c2b195fb6a in bgp_zebra_announce (dest=dest@entry=0x55c2b4bbef20, p=p@entry=0x55c2b4bbef20, info=info@entry=0x55c2b5ba5bb0, bgp=bgp@entry=0x55c2b3e5c090, afi=afi@entry=AFI_IP6, safi=safi@entry=SAFI_UNICAST) at bgpd/bgp_zebra.c:1387
#6 0x000055c2b19135a2 in bgp_process_main_one (safi=SAFI_UNICAST, afi=AFI_IP6, dest=0x55c2b4bbef20, bgp=0x55c2b3e5c090) at bgpd/bgp_route.c:2820
#7 bgp_process_main_one (bgp=0x55c2b3e5c090, dest=0x55c2b4bbef20, afi=AFI_IP6, safi=SAFI_UNICAST) at bgpd/bgp_route.c:2591
#8 0x000055c2b19136ee in bgp_process_wq (wq=<optimized out>, data=0x55c2b5ba5a10) at bgpd/bgp_route.c:2926
#9 0x00007f8c57661d08 in work_queue_run (thread=0x7ffcf9f8ece0) at lib/workqueue.c:291
#10 0x00007f8c57657c1a in thread_call (thread=thread@entry=0x7ffcf9f8ece0) at lib/thread.c:1581
#11 0x00007f8c5761f298 in frr_run (master=0x55c2b38db7a0) at lib/libfrr.c:1099
#12 0x000055c2b18c2d89 in main (argc=8, argv=0x7ffcf9f8f068) at bgpd/bgp_main.c:513
napsterbater@ATL1-US:/var/crash$ gdb -ex 'bt' --batch /usr/lib/frr/bgpd /var/crash/crash3/CoreDump
[New LWP 54822]
[New LWP 54825]
[New LWP 54830]
[New LWP 54824]
[New LWP 54823]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/lib/frr/bgpd -d -F traditional -A 127.0.0.1 -M rpki'.
Program terminated with signal SIGABRT, Aborted.
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
[Current thread is 1 (Thread 0x7f8c781717c0 (LWP 54822))]
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007f8c7859b859 in __GI_abort () at abort.c:79
#2 0x00007f8c7897b2f4 in core_handler (signo=11, siginfo=0x7ffdb18ff9b0, context=<optimized out>) at lib/sigevent.c:228
#3 <signal handler called>
#4 0x000055c232759802 in bgp_path_info_to_ipv6_nexthop (ifindex=ifindex@entry=0x7ffdb18ffe7c, path=<optimized out>, path=<optimized out>) at bgpd/bgp_zebra.c:912
#5 0x000055c23275bb6a in bgp_zebra_announce (dest=dest@entry=0x55c233c12700, p=p@entry=0x55c233c12700, info=info@entry=0x55c233ac9d50, bgp=bgp@entry=0x55c23361e090, afi=afi@entry=AFI_IP6, safi=safi@entry=SAFI_UNICAST) at bgpd/bgp_zebra.c:1387
#6 0x000055c23270f5a2 in bgp_process_main_one (safi=SAFI_UNICAST, afi=AFI_IP6, dest=0x55c233c12700, bgp=0x55c23361e090) at bgpd/bgp_route.c:2820
#7 bgp_process_main_one (bgp=0x55c23361e090, dest=0x55c233c12700, afi=AFI_IP6, safi=SAFI_UNICAST) at bgpd/bgp_route.c:2591
#8 0x000055c23270f6ee in bgp_process_wq (wq=<optimized out>, data=0x55c234af8240) at bgpd/bgp_route.c:2926
#9 0x00007f8c78994d08 in work_queue_run (thread=0x7ffdb19104b0) at lib/workqueue.c:291
#10 0x00007f8c7898ac1a in thread_call (thread=thread@entry=0x7ffdb19104b0) at lib/thread.c:1581
#11 0x00007f8c78952298 in frr_run (master=0x55c23309d7a0) at lib/libfrr.c:1099
#12 0x000055c2326bed89 in main (argc=8, argv=0x7ffdb1910838) at bgpd/bgp_main.c:513
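In all three symbolized backtraces, the frame printed right after "<signal handler called>" is the code that was running when the signal arrived, so that is the frame to look at first. As a rough sketch (the helper name faulting_frame is made up here, and the regular expression only covers the frame layout shown in this thread), it can be extracted like this:

```python
import re
from typing import Optional, Tuple

# Matches frames of the form "#N 0xADDR in func (args) at file.c:LINE".
FRAME_RE = re.compile(
    r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(?P<func>\S+) \(.*\) at (?P<loc>\S+:\d+)$"
)


def faulting_frame(backtrace: str) -> Optional[Tuple[str, str]]:
    """Return (function, file:line) of the frame just below the signal handler."""
    lines = [line.strip() for line in backtrace.splitlines() if line.strip()]
    for index, line in enumerate(lines):
        if "<signal handler called>" in line:
            if index + 1 >= len(lines):
                return None
            match = FRAME_RE.match(lines[index + 1])
            # None means the frame has no symbol or source location to report.
            return (match.group("func"), match.group("loc")) if match else None
    return None


if __name__ == "__main__":
    sample = """\
#2 0x00007f931a9162f4 in core_handler (signo=11, siginfo=0x7ffc19b986f0, context=<optimized out>) at lib/sigevent.c:228
#3 <signal handler called>
#4 0x000055c0e0f83802 in bgp_path_info_to_ipv6_nexthop (ifindex=ifindex@entry=0x7ffc19b98bdc, path=<optimized out>, path=<optimized out>) at bgpd/bgp_zebra.c:912
"""
    print(faulting_frame(sample))
```

For the dumps in this thread that points at bgp_path_info_to_ipv6_nexthop in bgpd/bgp_zebra.c, line 912, reached from bgp_zebra_announce.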
@Napsterbater that looks much better. One last thing to ask you :) Please run gdb -ex 'bt full' --batch /usr/lib/frr/bgpd /var/crash/crash3/CoreDump.
Here is the output for all three.
OK, this looks like a prefix with the wrong attributes being handled. I need more context here. It looks like the prefix is originated locally and bgpd then crashes when handling the next-hop. That could be due to import vrf. I pinged you in Slack; please provide the details I asked for.
I managed to replicate this. It is as I first thought (due to import vrf).
Just copying some info from the Slack convo.
OK, so disabling all "import vrf" statements and bringing the affected peer up does not crash FRR. Re-enabling the import vrf statements with that one peer up kills FRR. So I agree, this is definitely related to VRFs/importing VRFs.
FRR still crashes with 172.20.16.139 shutdown
Attachments: debug.log, show bgp vrf dn42 summary.txt, show bgp ipv6 unicast.txt
Fixed in master and backported to 7.5, but not yet released as a .deb. I believe you will have a new release soon.
@polychaeta autoclose in 2 days.
Just wanted to confirm, this is indeed fixed for me on 7.5.1. Thanks!
Describe the bug
Upon adding/activating a particular IPv6 Multiprotocol Link-Local peer, FRR completely crashes and restarts with the following in the logs (set to debug).
Note: FRR was already running normally with many peers connected when I set the logging to debug and configured the peer. The log starts at the very first messages generated, which are the basic and "normal" "unrecognized capability code" messages.
[x] Did you check if this is a duplicate issue? Did not find one, unless I just overlooked it.
[ ] Did you test it on the latest FRRouting/frr master branch? Latest via APT/Repo
To Reproduce
Expected behavior
Not to crash?
Versions
Additional context
PCAP of the exchange between the peers if it helps: https://drive.google.com/file/d/1rd6_W4qX2RC3ZQ5WjVVwyLHziRpQA2Fd/view?usp=sharing
Full Config: