Tarsnap / spiped

Spiped is a utility for creating symmetrically encrypted and authenticated pipes between socket addresses.
http://www.tarsnap.com/spiped.html
Other
858 stars 56 forks source link

Segfault: Error running event loop, Bad file descriptor #397

Closed trmz closed 8 months ago

trmz commented 9 months ago

Compiled latest (22cc9b4c18) on a couple different versions of Debian. Segfaults pretty consistently, after the first one or two or few tunneled connections (to PostgreSQL, for the curious). (Here, I'm aware -b 127.0.0.1 is unnecessary, but it segfaults all the same regardless of the chosen address.)

Command: spiped -d -k /root/test.key -p /root/test.pid -s [192.168.1.2]:7700 -t [127.0.0.1]:5432 -b 127.0.0.1 -n 40

Error:

spiped: shutdown: Transport endpoint is not connected
spiped: Error running event loop
spiped: close: Bad file descriptor

Syslog:

Jan 30 21:04:21 host kernel: [11486873.888877] spiped[7300]: segfault at 29 ip 0000000000407805 sp 00007ffeefb62fb0 error 4 in spiped-22cc9b4c18[400000+17000]
Jan 30 21:04:21 host kernel: [11486873.889835] audit: type=1701 audit(1706666661.707:67894): auid=1011 uid=0 gid=0 ses=27243 pid=7300 comm="spiped" exe="/usr/local/bin/spiped-22cc9b4c18" sig=11

Here's a segfault from a different server, Debian 10 :

[  +0.036077] spiped[12592]: segfault at a9 ip 0000559136791e2c sp 00007fff2bfef820 error 4 in spiped-22cc9b4c18[55913678c000+11000]
[  +0.009377] Code: 58 fe ff ff bb ff ff ff ff e9 4e fe ff ff 48 83 ec 08 be 00 00 00 00 e8 71 fd ff ff 48 83 c4 08 c3 48 85 ff 74 17 53 48 89 fb <48> 8b 7f 08 e8 7b a4 ff ff 48 89 df e8 73 a4 ff ff 5b c3 c3 41 57

EDIT: Nevermind, I lied. Thought it worked fine without option -b ... turns out it was just luck, segfaults all the same with or without -b. :/ Feel free to rename this issue, or I can create a new one. Note that spiped v1.6.2 has been working stably (but lacks the -b option I need).

cperciva commented 9 months ago

Can you get a backtrace from the segfault?

trmz commented 9 months ago

Hey, I'd like to, but haven't ever tried that. Would following this guide produce usable output ? https://serverfault.com/a/61676

Do I need to recompile with any particular options, to preserve symbols or improve the trace ? Don't see anything here specific to debugging ... https://github.com/Tarsnap/spiped/blob/master/BUILDING

gperciva commented 9 months ago

Does make test pass ok? Are you on amd64 or arm64?

As for the segfault, I recommend:

trmz commented 9 months ago

Perfect, thank you.

Odd thing, in setting up a test case on a different server, this time it's the client-side (-e) that's dying consistently. Ran both under gdb, here's what came back. This one is Debian 11, 4.19.0-9-amd64.

First, make test :

tests/test_spiped.sh
Running tests
-------------
  01-connection-open-close-single... SUCCESS!
  02-connection-open-timeout-single... SUCCESS!
  03-connection-open-close-double... SUCCESS!
  04-send-data-spipe... SUCCESS!
  05-send-data-spiped... SUCCESS!
  06-send-data-system-spiped... SUCCESS!
  07-dnsthread-resolve... SUCCESS!
  08-send-data-spipe-limited-bps... SUCCESS!
  09-pushbits... SUCCESS!
  10-pthread-create-blocking-np... SUCCESS!

Installed an entirely clean/new Pg, zero post-install changes :

# apt-cache policy postgresql
postgresql:
  Installed: 13+225
  Candidate: 13+225

Server-side running as :

# gdb --args spiped -F -d -k /root/test.key -p /root/test.pid -s [10.0.0.111]:7700 -t [127.0.0.1]:5432
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
[...]
Reading symbols from spiped...
(gdb) run
Starting program: /usr/local/bin/spiped -F -d -k /root/test.key -p /root/test.pid -s \[10.0.0.111\]:7700 -t \[127.0.0.1\]:5432
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7ad1700 (LWP 26059)]

Client-side :

# gdb --args spiped -F -e -k /root/test.key -p /root/test.e.pid -s [127.0.0.1]:7732 -t [10.0.0.111]:7700
GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git
[...]
Reading symbols from spiped...
(gdb) run
Starting program: /usr/local/bin/spiped -F -e -k /root/test.key -p /root/test.e.pid -s \[127.0.0.1\]:7732 -t \[10.0.0.111\]:7700
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7ad1700 (LWP 26074)]
spiped: shutdown: Transport endpoint is not connected
spiped: Error running event loop
spiped: close: Bad file descriptor
spiped: close: Bad file descriptor

Thread 1 "spiped" received signal SIGSEGV, Segmentation fault.
0x000055555556584f in network_connect_cancel ()
(gdb) bt
#0  0x000055555556584f in network_connect_cancel ()
#1  0x000055555555e9f2 in proto_conn_drop ()
#2  0x0000555555558ff2 in dispatch_shutdown (dispatch_cookie=dispatch_cookie@entry=0x55555557ff90) at dispatch.c:300
#3  0x000055555555846f in main (argc=<optimized out>, argv=<optimized out>) at main.c:391

All I ran to trigger it was a few successive :

# psql -h 127.0.0.1 -p 7732
Password for user root:

Shouldn't be something that psql is doing to trigger it ... given that v1.6.2 is entirely stable ? Can get a packet capture too if that'd help.

gperciva commented 9 months ago

Thanks, I'll try to reproduce this!

I forgot to mention that before running make (after editing the Makefile), it would be helpful to run make clean. Lines #2 and #3 of the gdb bt shows exact lines of code, but #0 and #1 don't, so I suspect that those files were already compiled with the old non-g flags.

I don't think that would change anything in terms of debugging this, but for future reference in generating backtraces, it's good to include that make clean.

trmz commented 9 months ago

Tried make clean between Makefile edit and make :

Starting program: /usr/local/bin/spiped-22cc9b4 -F -e -k /root/test.key -p /root/test.e.pid -s \[127.0.0.1\]:7732 -t \[10.0.0.111\]:7700
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7ad1700 (LWP 4399)]
spiped-22cc9b4: shutdown: Transport endpoint is not connected
spiped-22cc9b4: Error running event loop
spiped-22cc9b4: close: Bad file descriptor
spiped-22cc9b4: close: Bad file descriptor

Thread 1 "spiped-22cc9b4" received signal SIGSEGV, Segmentation fault.
0x000055555555cda7 in sock_addr_free ()
(gdb) bt
#0  0x000055555555cda7 in sock_addr_free ()
#1  0x000055555555d3a6 in sock_addr_freelist ()
#2  0x000055555555e9fb in proto_conn_drop ()
#3  0x0000555555558ff2 in dispatch_shutdown (dispatch_cookie=dispatch_cookie@entry=0x55555557ff90) at dispatch.c:300
#4  0x000055555555846f in main (argc=<optimized out>, argv=<optimized out>) at main.c:391
(gdb) 

Out of curiosity, tried compiling previous versions - all of these seem fine (though, only tested them each for a short time each; will run them longer each) :

a27c087
17bdcaa
7b1416a
fc91a10

So far (on this test server) it has been the "client-side" / -e that's been segfaulting. Wanted to try inducing the -d side to segfault instead (as seen on other servers), so ran -d with spiped-22cc9b4 and -e with spiped-a27c087. Maybe an invalid test to run mismatched versions, but for whatever it's worth, the server-side did finally segfault too :

Starting program: /usr/local/bin/spiped-22cc9b4 -F -e -k /root/test.key -p /root/test.e.pid -s \[127.0.0.1\]:7732 -t \[10.0.0.111\]:7700
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7ad1700 (LWP 20691)]
spiped-22cc9b4: shutdown: Transport endpoint is not connected
spiped-22cc9b4: Error running event loop
spiped-22cc9b4: close: Bad file descriptor
spiped-22cc9b4: close: Bad file descriptor

Thread 1 "spiped-22cc9b4" received signal SIGSEGV, Segmentation fault.
0x000055555555cda7 in sock_addr_free ()
(gdb) bt
#0  0x000055555555cda7 in sock_addr_free ()
#1  0x000055555555d3a6 in sock_addr_freelist ()
#2  0x000055555555e9fb in proto_conn_drop ()
#3  0x0000555555558ff2 in dispatch_shutdown (dispatch_cookie=dispatch_cookie@entry=0x55555557ff90) at dispatch.c:300
#4  0x000055555555846f in main (argc=<optimized out>, argv=<optimized out>) at main.c:391

Edit : Dangit, messed up the previous commit bisect. Will redo it proper.

gperciva commented 9 months ago

Thanks, I was just about to ask for a build with the full debug output!

... but unfortunately I now see that there's a bug in the build process; $CFLAGS_DEFAULT isn't getting applied to the libraries. :(

I've uploaded a fix in the https://github.com/Tarsnap/spiped/tree/investigate-segfault branch. Could you please do:

git pull -r
git checkout investigate-segfault
make clean
make

and then try the test again?

gperciva commented 9 months ago

I wouldn't worry about trying to get -d to segfault right now. If you can reliably get -e to segfault (and even better, identify the commit that caused it!), I'm optimistic that fixing that will also fix the -d problem as well.

To make testing easier, you could run spiped -d from the repository (i.e. the stable 1.6.2). As far as I know, git master (and the investigate-segfault branch) are compatible with 1.6.2. That way, you can leave the spiped -d chugging along, and only change the spiped -e binary.

trmz commented 9 months ago

Thanks, will try that branch with fuller debug.

Bisected it a bit more properly in the meantime. 88dcd94 seems stable at a glance. 92aab18 dies immediately on first request or so. Unfortunately, backtrace seems considerably different, so might have to bisect not just for any error but the particular error above. Oh well !

Starting program: /usr/local/bin/spiped-92aab18 -F -e -k /root/test.key -p /root/test.e.pid -s \[127.0.0.1\]:7732 -t \[10.0.0.111\]:7700
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7ad1700 (LWP 9371)]
spiped-92aab18: shutdown: Transport endpoint is not connected
spiped-92aab18: Error running event loop
double free or corruption (out)

Thread 1 "spiped-92aab18" received signal SIGABRT, Aborted.
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
50      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
(gdb) bt
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007ffff7afc537 in __GI_abort () at abort.c:79
#2  0x00007ffff7b543a8 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7ffff7c72390 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007ffff7b5b69a in malloc_printerr (str=str@entry=0x7ffff7c74750 "double free or corruption (out)") at malloc.c:5347
#4  0x00007ffff7b5ccc8 in _int_free (av=0x7ffff7ca8b80 <main_arena>, p=0x55555557ffa0, have_lock=<optimized out>) at malloc.c:4314
#5  0x000055555555cbe5 in sock_addr_freelist ()
#6  0x000055555555de96 in proto_conn_drop ()
#7  0x0000555555558dfa in dispatch_shutdown (dispatch_cookie=dispatch_cookie@entry=0x55555557ff40) at dispatch.c:296
#8  0x0000555555558713 in main (argc=<optimized out>, argv=<optimized out>) at main.c:377
trmz commented 9 months ago
Starting program: /usr/local/bin/spiped-48b0d03 -F -e -k /root/test.key -p /root/test.e.pid -s \[127.0.0.1\]:7732 -t \[10.0.0.111\]:7700
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7ad1700 (LWP 13520)]
spiped-48b0d03: shutdown: Transport endpoint is not connected
spiped-48b0d03: Error running event loop
spiped-48b0d03: close: Bad file descriptor
spiped-48b0d03: close: Bad file descriptor

Thread 1 "spiped-48b0d03" received signal SIGSEGV, Segmentation fault.
sock_addr_freelist (sas=<optimized out>) at ../libcperciva/util/sock.c:581
581                     sock_addr_free(*p);
(gdb) bt
#0  sock_addr_freelist (sas=<optimized out>) at ../libcperciva/util/sock.c:581
#1  sock_addr_freelist (sas=0x555555581008) at ../libcperciva/util/sock.c:571
#2  0x000055555555e6bf in proto_conn_drop (conn_cookie=0x555555581530, reason=reason@entry=1) at ../lib/proto/proto_conn.c:163
#3  0x0000555555558fe2 in dispatch_shutdown (dispatch_cookie=dispatch_cookie@entry=0x555555580f90) at dispatch.c:300
#4  0x000055555555845f in main (argc=<optimized out>, argv=<optimized out>) at main.c:391
(gdb) 
gperciva commented 9 months ago

Wow, thanks! I was not expecting to see a problem there!

I just uploaded a possible fix d01f5ed227; please try a git pull -r and test.

trmz commented 9 months ago

No luck, sorry :

Reading symbols from spiped-d01f5ed...
Starting program: /usr/local/bin/spiped-d01f5ed -F -e -k /root/test.key -p /root/test.e.pid -s \[127.0.0.1\]:7732 -t \[10.0.0.111\]:7700
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7ad1700 (LWP 17186)]
spiped-d01f5ed: shutdown: Transport endpoint is not connected
spiped-d01f5ed: Error running event loop
spiped-d01f5ed: close: Bad file descriptor
spiped-d01f5ed: close: Bad file descriptor

Thread 1 "spiped-d01f5ed" received signal SIGSEGV, Segmentation fault.
0x0000555555565c89 in network_connect_cancel (cookie=0x5) at ../libcperciva/network/network_connect.c:286
286             assert((C->cookie_immediate != NULL) || (C->s != -1));
(gdb) bt
#0  0x0000555555565c89 in network_connect_cancel (cookie=0x5) at ../libcperciva/network/network_connect.c:286
#1  0x000055555555e6b6 in proto_conn_drop (conn_cookie=0x5555555815f0, reason=reason@entry=1) at ../lib/proto/proto_conn.c:160
#2  0x0000555555558fe2 in dispatch_shutdown (dispatch_cookie=dispatch_cookie@entry=0x555555580f90) at dispatch.c:300
#3  0x000055555555845f in main (argc=<optimized out>, argv=<optimized out>) at main.c:391
trmz commented 9 months ago

Were you able to reproduce it locally ? I just setup a VPS with the issue, if you want to send me your SSH pubkey.

gperciva commented 9 months ago

Thanks, I'm optimistic that the latest backtrace got me on the right track -- I think I made an embarrassing mistake in df23813c6fab412774fb0055ecba24a634f51e94. If you would like to test it, please do a git pull to get eee0be56d472a3f1b21dc3f835b8d8f59e89a715 on the investigate-segfault branch.

I haven't been able to reproduce it yet, but if that doesn't fix it, I'd love to ssh into your VPS and work on it. I couldn't find your email on your github profile... could you please send me an email (just "hi" is fine) so that I can reply with my pubkey? gperciva@tarsnap.com

(I'm off work today for a volunteering task, but I'll be able to devote tomorrow to this.)

trmz commented 9 months ago

No bueno on eee0be5 - psql connects okay once, then -e seems to exit immediately, with no backtrace available (unless one-off was intended for this troubleshooting, or I missed a step).

Reading symbols from spiped-eee0be5...
Starting program: /usr/local/bin/spiped-eee0be5 -F -e -k /root/test.key -p /root/test.e.pid -s \[127.0.0.1\]:7732 -t \[10.0.0.111\]:7700
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7ad2700 (LWP 33138)]
spiped-eee0be5: shutdown: Transport endpoint is not connected
spiped-eee0be5: Error running event loop
[Thread 0x7ffff7ad2700 (LWP 33138) exited]
[Inferior 1 (process 33134) exited with code 01]
(gdb) bt
No stack.

Will email !

gperciva commented 9 months ago

It turns out that eee0be5 does fix a problem; but not this problem. (It's part of PR #398 now.)

I'm still investigating.

gperciva commented 8 months ago

I updated #398 to fix this issue (we should ignore an ENOTCONN, not bail); it's merged now.