Closed trmz closed 8 months ago
Can you get a backtrace from the segfault?
Hey, I'd like to, but haven't ever tried that. Would following this guide produce usable output ? https://serverfault.com/a/61676
Do I need to recompile with any particular options, to preserve symbols or improve the trace ? Don't see anything here specific to debugging ... https://github.com/Tarsnap/spiped/blob/master/BUILDING
Does make test
pass ok? Are you on amd64 or arm64?
As for the segfault, I recommend:
edit Makefile
and add -g
to line 19:
CFLAGS_DEFAULT= -O2 -g
run gdb --args spiped ....
(put in your command) I recommend adding a -F
so that you can interact with it easily if/when it dies
when gdb gives you a prompt, type run
do your normal connections
when it dies, type bt
.
Perfect, thank you.
Odd thing, in setting up a test case on a different server, this time it's the client-side (-e) that's dying consistently.
Ran both under gdb, here's what came back.
This one is Debian 11, 4.19.0-9-amd64
.
First, make test
:
tests/test_spiped.sh
Running tests
-------------
01-connection-open-close-single... SUCCESS!
02-connection-open-timeout-single... SUCCESS!
03-connection-open-close-double... SUCCESS!
04-send-data-spipe... SUCCESS!
05-send-data-spiped... SUCCESS!
06-send-data-system-spiped... SUCCESS!
07-dnsthread-resolve... SUCCESS!
08-send-data-spipe-limited-bps... SUCCESS!
09-pushbits... SUCCESS!
10-pthread-create-blocking-np... SUCCESS!
Installed an entirely clean/new Pg, zero post-install changes :
# apt-cache policy postgresql
postgresql:
Installed: 13+225
Candidate: 13+225
Server-side running as :
# gdb --args spiped -F -d -k /root/test.key -p /root/test.pid -s [10.0.0.111]:7700 -t [127.0.0.1]:5432
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
[...]
Reading symbols from spiped...
(gdb) run
Starting program: /usr/local/bin/spiped -F -d -k /root/test.key -p /root/test.pid -s \[10.0.0.111\]:7700 -t \[127.0.0.1\]:5432
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7ad1700 (LWP 26059)]
Client-side :
# gdb --args spiped -F -e -k /root/test.key -p /root/test.e.pid -s [127.0.0.1]:7732 -t [10.0.0.111]:7700
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
[...]
Reading symbols from spiped...
(gdb) run
Starting program: /usr/local/bin/spiped -F -e -k /root/test.key -p /root/test.e.pid -s \[127.0.0.1\]:7732 -t \[10.0.0.111\]:7700
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7ad1700 (LWP 26074)]
spiped: shutdown: Transport endpoint is not connected
spiped: Error running event loop
spiped: close: Bad file descriptor
spiped: close: Bad file descriptor
Thread 1 "spiped" received signal SIGSEGV, Segmentation fault.
0x000055555556584f in network_connect_cancel ()
(gdb) bt
#0 0x000055555556584f in network_connect_cancel ()
#1 0x000055555555e9f2 in proto_conn_drop ()
#2 0x0000555555558ff2 in dispatch_shutdown (dispatch_cookie=dispatch_cookie@entry=0x55555557ff90) at dispatch.c:300
#3 0x000055555555846f in main (argc=<optimized out>, argv=<optimized out>) at main.c:391
All I ran to trigger it was a few successive :
# psql -h 127.0.0.1 -p 7732
Password for user root:
Shouldn't be something that psql
is doing to trigger it ... given that v1.6.2 is entirely stable ?
Can get a packet capture too if that'd help.
Thanks, I'll try to reproduce this!
I forgot to mention that before running make
(after editing the Makefile
), it would be helpful to run make clean
. Lines #2
and #3
of the gdb bt shows exact lines of code, but #0
and #1
don't, so I suspect that those files were already compiled with the old non-g
flags.
I don't think that would change anything in terms of debugging this, but for future reference in generating backtraces, it's good to include that make clean
.
Tried make clean
between Makefile
edit and make
:
Starting program: /usr/local/bin/spiped-22cc9b4 -F -e -k /root/test.key -p /root/test.e.pid -s \[127.0.0.1\]:7732 -t \[10.0.0.111\]:7700
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7ad1700 (LWP 4399)]
spiped-22cc9b4: shutdown: Transport endpoint is not connected
spiped-22cc9b4: Error running event loop
spiped-22cc9b4: close: Bad file descriptor
spiped-22cc9b4: close: Bad file descriptor
Thread 1 "spiped-22cc9b4" received signal SIGSEGV, Segmentation fault.
0x000055555555cda7 in sock_addr_free ()
(gdb) bt
#0 0x000055555555cda7 in sock_addr_free ()
#1 0x000055555555d3a6 in sock_addr_freelist ()
#2 0x000055555555e9fb in proto_conn_drop ()
#3 0x0000555555558ff2 in dispatch_shutdown (dispatch_cookie=dispatch_cookie@entry=0x55555557ff90) at dispatch.c:300
#4 0x000055555555846f in main (argc=<optimized out>, argv=<optimized out>) at main.c:391
(gdb)
Out of curiosity, tried compiling previous versions - all of these seem fine (though, only tested them each for a short time each; will run them longer each) :
a27c087
17bdcaa
7b1416a
fc91a10
So far (on this test server) it has been the "client-side" / -e
that's been segfaulting.
Wanted to try inducing the -d
side to segfault instead (as seen on other servers), so ran -d
with spiped-22cc9b4
and -e
with spiped-a27c087
.
Maybe an invalid test to run mismatched versions, but for whatever it's worth, the server-side did finally segfault too :
Starting program: /usr/local/bin/spiped-22cc9b4 -F -e -k /root/test.key -p /root/test.e.pid -s \[127.0.0.1\]:7732 -t \[10.0.0.111\]:7700
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7ad1700 (LWP 20691)]
spiped-22cc9b4: shutdown: Transport endpoint is not connected
spiped-22cc9b4: Error running event loop
spiped-22cc9b4: close: Bad file descriptor
spiped-22cc9b4: close: Bad file descriptor
Thread 1 "spiped-22cc9b4" received signal SIGSEGV, Segmentation fault.
0x000055555555cda7 in sock_addr_free ()
(gdb) bt
#0 0x000055555555cda7 in sock_addr_free ()
#1 0x000055555555d3a6 in sock_addr_freelist ()
#2 0x000055555555e9fb in proto_conn_drop ()
#3 0x0000555555558ff2 in dispatch_shutdown (dispatch_cookie=dispatch_cookie@entry=0x55555557ff90) at dispatch.c:300
#4 0x000055555555846f in main (argc=<optimized out>, argv=<optimized out>) at main.c:391
Edit : Dangit, messed up the previous commit bisect. Will redo it proper.
Thanks, I was just about to ask for a build with the full debug output!
... but unfortunately I now see that there's a bug in the build process; $CFLAGS_DEFAULT isn't getting applied to the libraries. :(
I've uploaded a fix in the https://github.com/Tarsnap/spiped/tree/investigate-segfault branch. Could you please do:
git pull -r
git checkout investigate-segfault
make clean
make
and then try the test again?
I wouldn't worry about trying to get -d
to segfault right now. If you can reliably get -e
to segfault (and even better, identify the commit that caused it!), I'm optimistic that fixing that will also fix the -d
problem as well.
To make testing easier, you could run spiped -d
from the repository (i.e. the stable 1.6.2). As far as I know, git master
(and the investigate-segfault
branch) are compatible with 1.6.2. That way, you can leave the spiped -d
chugging along, and only change the spiped -e
binary.
Thanks, will try that branch with fuller debug.
Bisected it a bit more properly in the meantime.
88dcd94
seems stable at a glance.
92aab18
dies immediately on first request or so.
Unfortunately, backtrace seems considerably different, so might have to bisect not just for any error but the particular error above.
Oh well !
Starting program: /usr/local/bin/spiped-92aab18 -F -e -k /root/test.key -p /root/test.e.pid -s \[127.0.0.1\]:7732 -t \[10.0.0.111\]:7700
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7ad1700 (LWP 9371)]
spiped-92aab18: shutdown: Transport endpoint is not connected
spiped-92aab18: Error running event loop
double free or corruption (out)
Thread 1 "spiped-92aab18" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1 0x00007ffff7afc537 in __GI_abort () at abort.c:79
#2 0x00007ffff7b543a8 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7c72390 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3 0x00007ffff7b5b69a in malloc_printerr (str=str@entry=0x7ffff7c74750 "double free or corruption (out)") at malloc.c:5347
#4 0x00007ffff7b5ccc8 in _int_free (av=0x7ffff7ca8b80 <main_arena>, p=0x55555557ffa0, have_lock=<optimized out>) at malloc.c:4314
#5 0x000055555555cbe5 in sock_addr_freelist ()
#6 0x000055555555de96 in proto_conn_drop ()
#7 0x0000555555558dfa in dispatch_shutdown (dispatch_cookie=dispatch_cookie@entry=0x55555557ff40) at dispatch.c:296
#8 0x0000555555558713 in main (argc=<optimized out>, argv=<optimized out>) at main.c:377
Starting program: /usr/local/bin/spiped-48b0d03 -F -e -k /root/test.key -p /root/test.e.pid -s \[127.0.0.1\]:7732 -t \[10.0.0.111\]:7700
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7ad1700 (LWP 13520)]
spiped-48b0d03: shutdown: Transport endpoint is not connected
spiped-48b0d03: Error running event loop
spiped-48b0d03: close: Bad file descriptor
spiped-48b0d03: close: Bad file descriptor
Thread 1 "spiped-48b0d03" received signal SIGSEGV, Segmentation fault.
sock_addr_freelist (sas=<optimized out>) at ../libcperciva/util/sock.c:581
581 sock_addr_free(*p);
(gdb) bt
#0 sock_addr_freelist (sas=<optimized out>) at ../libcperciva/util/sock.c:581
#1 sock_addr_freelist (sas=0x555555581008) at ../libcperciva/util/sock.c:571
#2 0x000055555555e6bf in proto_conn_drop (conn_cookie=0x555555581530, reason=reason@entry=1) at ../lib/proto/proto_conn.c:163
#3 0x0000555555558fe2 in dispatch_shutdown (dispatch_cookie=dispatch_cookie@entry=0x555555580f90) at dispatch.c:300
#4 0x000055555555845f in main (argc=<optimized out>, argv=<optimized out>) at main.c:391
(gdb)
Wow, thanks! I was not expecting to see a problem there!
I just uploaded a possible fix d01f5ed227; please try a git pull -r
and test.
No luck, sorry :
Reading symbols from spiped-d01f5ed...
Starting program: /usr/local/bin/spiped-d01f5ed -F -e -k /root/test.key -p /root/test.e.pid -s \[127.0.0.1\]:7732 -t \[10.0.0.111\]:7700
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7ad1700 (LWP 17186)]
spiped-d01f5ed: shutdown: Transport endpoint is not connected
spiped-d01f5ed: Error running event loop
spiped-d01f5ed: close: Bad file descriptor
spiped-d01f5ed: close: Bad file descriptor
Thread 1 "spiped-d01f5ed" received signal SIGSEGV, Segmentation fault.
0x0000555555565c89 in network_connect_cancel (cookie=0x5) at ../libcperciva/network/network_connect.c:286
286 assert((C->cookie_immediate != NULL) || (C->s != -1));
(gdb) bt
#0 0x0000555555565c89 in network_connect_cancel (cookie=0x5) at ../libcperciva/network/network_connect.c:286
#1 0x000055555555e6b6 in proto_conn_drop (conn_cookie=0x5555555815f0, reason=reason@entry=1) at ../lib/proto/proto_conn.c:160
#2 0x0000555555558fe2 in dispatch_shutdown (dispatch_cookie=dispatch_cookie@entry=0x555555580f90) at dispatch.c:300
#3 0x000055555555845f in main (argc=<optimized out>, argv=<optimized out>) at main.c:391
Were you able to reproduce it locally ? I just setup a VPS with the issue, if you want to send me your SSH pubkey.
Thanks, I'm optimistic that the latest backtrace got me on the right track -- I think I made an embarrassing mistake in df23813c6fab412774fb0055ecba24a634f51e94. If you would like to test it, please do a git pull
to get eee0be56d472a3f1b21dc3f835b8d8f59e89a715 on the investigate-segfault
branch.
I haven't been able to reproduce it yet, but if that doesn't fix it, I'd love to ssh into your VPS and work on it. I couldn't find your email on your github profile... could you please send me an email (just "hi" is fine) so that I can reply with my pubkey? gperciva@tarsnap.com
(I'm off work today for a volunteering task, but I'll be able to devote tomorrow to this.)
No bueno on eee0be5 - psql
connects okay once, then -e
seems to exit immediately, with no backtrace available (unless one-off was intended for this troubleshooting, or I missed a step).
Reading symbols from spiped-eee0be5...
Starting program: /usr/local/bin/spiped-eee0be5 -F -e -k /root/test.key -p /root/test.e.pid -s \[127.0.0.1\]:7732 -t \[10.0.0.111\]:7700
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7ad2700 (LWP 33138)]
spiped-eee0be5: shutdown: Transport endpoint is not connected
spiped-eee0be5: Error running event loop
[Thread 0x7ffff7ad2700 (LWP 33138) exited]
[Inferior 1 (process 33134) exited with code 01]
(gdb) bt
No stack.
Will email !
It turns out that eee0be5 does fix a problem; but not this problem. (It's part of PR #398 now.)
I'm still investigating.
I updated #398 to fix this issue (we should ignore an ENOTCONN
, not bail); it's merged now.
Compiled latest (22cc9b4c18) on a couple different versions of Debian. Segfaults pretty consistently, after the first one or two or few tunneled connections (to PostgreSQL, for the curious). (Here, I'm aware -b 127.0.0.1 is unnecessary, but it segfaults all the same regardless of the chosen address.)
Command:
spiped -d -k /root/test.key -p /root/test.pid -s [192.168.1.2]:7700 -t [127.0.0.1]:5432 -b 127.0.0.1 -n 40
Error:
Syslog:
Here's a segfault from a different server, Debian 10 :
EDIT: Nevermind, I lied. Thought it worked fine without option -b ... turns out it was just luck, segfaults all the same with or without -b. :/ Feel free to rename this issue, or I can create a new one. Note that spiped v1.6.2 has been working stably (but lacks the -b option I need).