Closed sbr2004 closed 7 months ago
Hi @sbr2004, thanks for reporting the issue. On which OS is this happening? Also, could you please share the arguments that are being set on each execution?
uname -a
Linux fort 4.9.0-12-amd64 #1 SMP Debian 4.9.210-1 (2020-01-20) x86_64 GNU/Linux
/usr/local/bin/fort --tal /var/tal --local-repository /var/fort_cache --server.address X.X.X.X --server.port 323 --log.output=syslog --log.level=info --mode=server
Thanks for the data, we'll be working on this.
Just out of curiosity: the gap between the two versions caught my attention. Have you also tested the versions in between 1.2.1 and 1.4.2 (i.e. v1.3.0, v1.4.0, and v1.4.1)? I'm asking mainly to know where we can focus our efforts.
No, I did not test other versions. Which of them in between shall I try?
Thanks for clarifying that. Well, none in particular; my question was merely to rule out that the issue is present in those versions.
Hi again @sbr2004, we've been reviewing this and "luckily" we could replicate the issue a couple of times.
Currently we have a hypothesis related to the stable versions of `libcurl` and `libssl` on Debian 9 (the specific dependencies that we recommend installing are `libcurl4-openssl-dev` and `libssl-dev`).
FORT validator depends on an OpenSSL version greater than 1.0 (so far this isn't a problem on Debian 9, since it supports `libssl-dev 1.1`), and in the case of `libcurl` a reasonably up-to-date version should be enough. In this particular case `libcurl4-openssl-dev` depends on `libcurl3`, which depends on `libssl-dev` < 1.1.
The Debian package page may help illustrate what's described in the previous paragraph: Debian - Package libcurl4-openssl-dev
Also, we have this warning at Debian 9 after compiling (make):
/usr/bin/ld: warning: libcrypto.so.1.0.2, needed by /usr/lib/gcc/x86_64-linux-gnu/6/../../../x86_64-linux-gnu/libcurl.so, may conflict with libcrypto.so.1.1
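One way to check for the mixed linkage that this warning hints at is to list the OpenSSL sonames the binary resolves at load time. The following is a minimal sketch, assuming the binary lives at `/usr/local/bin/fort` (the path from the command line reported above); `list_openssl_sonames` is a hypothetical helper name, not something FORT provides.

```shell
# Hypothetical helper: read `ldd` output on stdin and print the distinct
# libssl/libcrypto sonames it mentions, one per line.
list_openssl_sonames() {
    grep -oE 'lib(ssl|crypto)\.so\.[0-9][0-9.]*' | sort -u
}

# Usage (path assumed from the command line reported earlier):
# ldd /usr/local/bin/fort | list_openssl_sonames
```

If the output contains both a 1.0.x and a 1.1 soname (for example `libcrypto.so.1.0.2` and `libcrypto.so.1.1`), the process is loading two OpenSSL versions at once, which is the situation the linker is warning about.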
Now, here comes the part where we're trying to relate all of this. There's a documented behavior related to the use of OpenSSL 1.1 and 1.0 at the same time: https://wiki.debian.org/OpenSSL-1.1
So, we're using `libssl-dev 1.1` and a `libcurl-dev` that's linked to `libssl-dev 1.0`; we can't yet rule out that this causes a problem. Since yesterday we've been running a couple of instances using an updated version of `libcurl4-openssl-dev`, and I would like to ask for your help in following the same procedure to verify whether this solves the issue:
#Add the following line to `/etc/apt/sources.list`:
deb http://deb.debian.org/debian testing main
#Update:
sudo apt-get update
#Install libcurl version from the testing repo:
sudo apt-get -t testing install libcurl4-openssl-dev
#Recompile FORT validator
./configure
make
sudo make install
_NOTE: The warning `/usr/bin/ld: warning: libcrypto.so.1.0.2, needed by /usr/lib/gcc/x86_64-linux-gnu/6/../../../x86_64-linux-gnu/libcurl.so, may conflict with libcrypto.so.1.1` shouldn't show up again._
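To confirm that the warning is really gone after recompiling, the build output can be captured and searched. This is only a sketch: `has_conflict_warning` and the `build.log` file name are hypothetical, and the match string is taken from the warning text quoted in this thread.

```shell
# Hypothetical helper: succeed (exit 0) if the given build log contains the
# linker's "may conflict with" warning, fail otherwise.
has_conflict_warning() {
    grep -q 'may conflict with' "$1"
}

# Usage: capture the build output, then test it.
# make 2>&1 | tee build.log
# if has_conflict_warning build.log; then
#     echo "mixed OpenSSL linkage still present"
# fi
```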
Basically we're installing the `libcurl4-openssl-dev` dependency from the Debian buster repository, which is more up to date than the version in Debian stretch.
So far, our instances on Debian 9 are still alive, and we'll leave them running to verify whether this "recipe" solves the problem. Please let me know if you can help us with this.
Hi,
Thank you for the answer! My systems are Devuan Ascii (Debian w/o systemd). I'll try to figure out how to compile as you suggested on Devuan.
Unfortunately this solution did not help. fort 1.4.2, compiled as you advised, crashes from time to time.
Ok. I'm working on this now.
Do you have the 1.2.1 stack trace? Has the stack trace been consistent in 1.4.2?
Hi,
No, I don't have a trace for 1.2.1 and can't say whether the traces were consistent. I'll watch for them now.
Can you run the new commit?
Assuming we get the same stack trace, it should reveal more information. (Some function names are obscured in the original one.)
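Frames that appear only as an offset, such as `/usr/local/bin/fort(+0x24c89)`, can sometimes be resolved after the fact with `addr2line` from binutils. This works only against the exact binary that produced the trace, and only if that binary still carries debug info or a symbol table. A minimal sketch; `unnamed_fort_offsets` and `trace.txt` are hypothetical names:

```shell
# Hypothetical helper: read a stack trace on stdin and print the offsets of
# frames inside the fort binary that carry no symbol name, e.g. "0x24c89"
# from "/usr/local/bin/fort(+0x24c89) [0x5564913ddc89]".
unnamed_fort_offsets() {
    grep -oE '/fort\(\+0x[0-9a-f]+\)' | grep -oE '0x[0-9a-f]+'
}

# Usage: feed the offsets to addr2line (-f prints function names).
# unnamed_fort_offsets < trace.txt | xargs addr2line -f -e /usr/local/bin/fort
```

Frames from other libraries (for example `libpthread.so.0(+0x7fa3)`) are deliberately skipped, since their offsets are relative to a different file.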
Compiled and launched on one of the servers; will keep an eye on it.
Thank you.
/usr/local/bin/fort(print_stack_trace+0x1f) [0x5564913dabcf]
/usr/local/bin/fort(pr_enomem+0x18) [0x5564913dd918]
/usr/local/bin/fort(+0x24c89) [0x5564913ddc89]
/usr/local/bin/fort(+0x3d596) [0x5564913f6596]
/usr/local/bin/fort(rtrhandler_handle_roa_v4+0x3c) [0x5564913f729c]
/usr/local/bin/fort(handle_roa_v4+0x32) [0x5564913f98f2]
/usr/local/bin/fort(vhandler_handle_roa_v4+0x39) [0x5564913e2779]
/usr/local/bin/fort(roa_traverse+0x4c4) [0x5564913ec914]
/usr/local/bin/fort(rpp_traverse+0x38) [0x5564913e0e88]
/usr/local/bin/fort(certificate_traverse+0xc65) [0x5564913eb375]
/usr/local/bin/fort(+0x340fb) [0x5564913ed0fb]
/usr/local/bin/fort(+0x34b01) [0x5564913edb01]
/usr/local/bin/fort(+0x438c4) [0x5564913fc8c4]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3) [0x7fcc9b487fa3]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7fcc9b3b84cf]
(Stack size was 15.)
Been reviewing all day, but I'm still empty-handed.
I noticed that this last stack trace doesn't include the `Segmentation Fault. Stack trace:` header. Did it really crash?
Also, it seems to point to a memory leak this time. How much RAM does this server have?
Yes, it crashed (I got a notification from monitoring). It is a VM with 8G RAM.
Tasks: 114 total, 1 running, 113 sleeping, 0 stopped, 0 zombie
%Cpu0 : 0.0 us, 0.0 sy, 0.0 ni,100.0 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu1 : 0.3 us, 0.0 sy, 0.0 ni, 99.7 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
MiB Mem : 7978.7 total, 5250.6 free, 225.2 used, 2502.9 buff/cache
MiB Swap: 2109.0 total, 2109.0 free, 0.0 used. 7464.5 avail Mem
Some more questions:
Yes, servers are handling RTR requests from routers (total 3 servers). No new stack traces, servers are running fine at the moment.
2 fort servers crashed today within 2 minutes. Restarted them and they crashed again:
server #1: fort 1.5.0
Stack trace:
/usr/local/bin/fort(print_stack_trace+0x1f) [0x56476fb0fbcf]
/usr/local/bin/fort(pr_crit+0x89) [0x56476fb12d19]
/usr/local/bin/fort(+0x1e433) [0x56476fb0c433]
/usr/local/bin/fort(deferstack_pop+0x2f) [0x56476fb0c64f]
/usr/local/bin/fort(+0x3411a) [0x56476fb2211a]
/usr/local/bin/fort(+0x34b01) [0x56476fb22b01]
/usr/local/bin/fort(+0x438c4) [0x56476fb318c4]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3) [0x7ff8a7de6fa3]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7ff8a7d174cf]
(Stack size was 9.)
server #2: fort 1.2.1
Stack trace:
/usr/local/bin/fort(print_stack_trace+0x1a) [0x415c6a]
/usr/local/bin/fort(pr_crit+0x7f) [0x417b7f]
/usr/local/bin/fort() [0x412c25]
/usr/local/bin/fort(deferstack_pop+0x3b) [0x412e0b]
/usr/local/bin/fort() [0x423b70]
/lib64/libpthread.so.0(+0x7ea5) [0x7f080dca1ea5]
/lib64/libc.so.6(clone+0x6d) [0x7f080d9ca8dd]
(Stack size was 7.)
It's unrelated. See #58 and #59
Today:
1.5.0
Stack trace:
/usr/local/bin/fort(print_stack_trace+0x1f) [0x559142d8ebcf]
/usr/local/bin/fort(pr_enomem+0x18) [0x559142d91918]
/usr/local/bin/fort(+0x24c89) [0x559142d91c89]
/usr/local/bin/fort(+0x3d596) [0x559142daa596]
/usr/local/bin/fort(rtrhandler_handle_roa_v4+0x3c) [0x559142dab29c]
/usr/local/bin/fort(handle_roa_v4+0x32) [0x559142dad8f2]
/usr/local/bin/fort(vhandler_handle_roa_v4+0x39) [0x559142d96779]
/usr/local/bin/fort(roa_traverse+0x4c4) [0x559142da0914]
/usr/local/bin/fort(rpp_traverse+0x38) [0x559142d94e88]
/usr/local/bin/fort(certificate_traverse+0xc65) [0x559142d9f375]
/usr/local/bin/fort(+0x340fb) [0x559142da10fb]
/usr/local/bin/fort(+0x34b01) [0x559142da1b01]
/usr/local/bin/fort(+0x438c4) [0x559142db08c4]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7fa3) [0x7f607cd81fa3]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f607ccb24cf]
(Stack size was 15.)
It really is just an out of memory error. But whether there's a memory leak...
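One low-effort way to tell a leak from a one-off allocation spike is to sample the process's resident memory over time. The sketch below assumes a Linux `/proc` filesystem and a single running `fort` process; `rss_kib` and the `fort_rss.log` file name are hypothetical.

```shell
# Hypothetical helper: print the resident set size (VmRSS) of a process,
# in KiB, as reported by /proc/<pid>/status on Linux.
rss_kib() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Usage: sample the fort process once a minute. A value that keeps growing
# across validation cycles would point toward a leak.
# while sleep 60; do
#     echo "$(date -u +%FT%TZ) $(rss_kib "$(pidof fort)")"
# done >> fort_rss.log
```

Comparing the logged values against the 8G of RAM reported above would show whether the process actually approaches the machine's limits before the out-of-memory path is hit.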
Ok, before I throw myself into a bunch of lengthy tests, I need to put this out there:
Can you please upgrade to the latest master? As in, not even to version 1.5.1. Please upgrade to the absolute latest commit.
There have been several critical bugfixes since 1.5.0, to the point I wouldn't even consider it stable anymore.
As stated in the 1.6.0 release notes, I have found and patched several instances of undefined behavior during the reviews. There is no way to prove that these caused this particular crash (particularly considering that I never managed to reproduce it, and the OP has probably already left), but the code has changed so much that at this point I expect the bug to manifest in a completely different way, if at all.
If you're still there, please upgrade to the latest version. If it crashes again, please open a new bug.
Segmentation Fault. Stack trace:
/usr/local/bin/fort(print_stack_trace+0x23) [0x55d196279de3]
/usr/local/bin/fort(+0x1ee96) [0x55d196279e96]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x110e0) [0x7f46201f20e0]
/usr/local/bin/fort(+0x34e99) [0x55d19628fe99]
/usr/local/bin/fort(rtrhandler_handle_roa_v4+0x52) [0x55d196290fa2]
/usr/local/bin/fort(handle_roa_v4+0x32) [0x55d1962936f2]
/usr/local/bin/fort(vhandler_handle_roa_v4+0x39) [0x55d19627f2d9]
/usr/local/bin/fort(roa_traverse+0x64f) [0x55d1962876ef]
/usr/local/bin/fort(rpp_traverse+0x38) [0x55d19627daf8]
/usr/local/bin/fort(certificate_traverse+0x9ac) [0x55d196285fec]
/usr/local/bin/fort(+0x2d5e3) [0x55d1962885e3]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x74a4) [0x7f46201e84a4]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x3f) [0x7f461ff2ad0f]
(Stack size was 13.)