Feh / nocache

minimize caching effects
BSD 2-Clause "Simplified" License

SIGSEGV on exit with GnuTLS #27

Closed grawity closed 8 years ago

grawity commented 8 years ago

nocache git annex works well, but always dies with a segfault when exiting:

$ nocache git annex version
git-annex version: 6.20160126
build flags: Assistant Webapp Pairing Testsuite S3(multipartupload)(storageclasses) WebDAV Inotify DBus DesktopNotify XMPP ConcurrentOutput TorrentParser Feeds Quvi
key/value backends: SHA256E SHA256 SHA512E SHA512 SHA224E SHA224 SHA384E SHA384 SHA3_256E SHA3_256 SHA3_512E SHA3_512 SHA3_224E SHA3_224 SHA3_384E SHA3_384 SKEIN256E SKEIN256 SKEIN512E SKEIN512 SHA1E SHA1 MD5E MD5 WORM URL
remote types: git gcrypt S3 bup directory rsync web bittorrent webdav tahoe glacier ddar hook external
local repository version: 5
supported repository versions: 5 6
upgrade supported from repository versions: 0 1 2 4 5
error: git-annex died of signal 11

Affects both the official binary builds and the Arch Linux community/git-annex builds.

Feh commented 8 years ago

I’ve just tried this out with Debian’s git-annex version 5.20150731-1 and it doesn’t segfault for me. Can you get a backtrace for this, i.e. compile git-annex with debugging symbols, enable core dumps and do a bt full in GDB?

grawity commented 8 years ago

Hmm, trying to find out how to do that for a Haskell program

Feh commented 8 years ago

@xou reproduced the crash on Arch, the backtrace is:

#0  0x00007f401867fd72 in free_unclaimed_pages () from ./nocache.so
#1  0x00007f401867f63b in close () from ./nocache.so
#2  0x00007f4017dc9de3 in ?? () from /usr/lib/libgnutls.so.30
#3  0x00007f4017dc9e06 in ?? () from /usr/lib/libgnutls.so.30
#4  0x00007f4017d1ef9b in ?? () from /usr/lib/libgnutls.so.30
#5  0x00007f4018892867 in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#6  0x00007f401685cf88 in __run_exit_handlers () from /usr/lib/libc.so.6
#7  0x00007f401685cfd5 in exit () from /usr/lib/libc.so.6
#8  0x0000000002c94f78 in ?? ()
#9  0x0000000002c94f8e in ?? ()
#10 0x0000000002c34f2e in ?? ()
#11 0x0000000000000000 in ?? ()

So it appears that free_unclaimed_pages is not defensive enough. Could either of you recompile nocache with debugging symbols so I know where exactly it crashes?

grawity commented 8 years ago

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7bd7cc4 in free_unclaimed_pages (fd=3) at nocache.c:418
418         if(fds[i].fd == fd)
(gdb) bt
#0  0x00007ffff7bd7cc4 in free_unclaimed_pages (fd=3) at nocache.c:418
#1  0x00007ffff7bd75dc in close (fd=3) at nocache.c:305
#2  0x00007ffff7321fa3 in ?? () from /usr/lib/libgnutls.so.30
#3  0x00007ffff7321fc6 in ?? () from /usr/lib/libgnutls.so.30
#4  0x00007ffff7276f9b in ?? () from /usr/lib/libgnutls.so.30
#5  0x00007ffff7dea867 in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#6  0x00007ffff5db4f88 in __run_exit_handlers () from /usr/lib/libc.so.6
#7  0x00007ffff5db4fd5 in exit () from /usr/lib/libc.so.6
#8  0x0000000002c94f78 in ?? ()
#9  0x0000000002c94f8e in ?? ()
#10 0x0000000002c9feb3 in ?? ()
#11 0x0000000000d7c5e7 in ?? ()
#12 0x00007ffff5d9f610 in __libc_start_main () from /usr/lib/libc.so.6
#13 0x00000000004135b9 in ?? ()
(gdb) bt full
#0  0x00007ffff7bd7cc4 in free_unclaimed_pages (fd=3) at nocache.c:418
        i = 0
        st = {st_dev = 5, st_ino = 0, st_nlink = 0, st_mode = 4156381752, st_uid = 32767, 
          st_gid = 1, __pad0 = 0, st_rdev = 140737488345024, st_size = 140737342954952, 
          st_blksize = 140737351974832, st_blocks = 140737353786576, st_atim = {
            tv_sec = 140737488344656, tv_nsec = 140737345315536}, st_mtim = {
            tv_sec = 140737351974832, tv_nsec = 281479271743489}, st_ctim = {tv_sec = 65537, 
            tv_nsec = 140737488344528}, __glibc_reserved = {1, 140737340645312, 0}}
        mask = {__val = {18446744067267100671, 18446744073709551615 <repeats 15 times>}}
        old_mask = {__val = {0, 140737353774288, 140737353774288, 140737317973344, 
            140737318027008, 140737351928106, 127, 140737318027008, 140737353774288, 
            140737342998432, 140737342954952, 140737488344712, 140737488344912, 1, 0, 
            140737351947853}}
        br = 0x7ffff7faa9a8
#1  0x00007ffff7bd75dc in close (fd=3) at nocache.c:305
        __PRETTY_FUNCTION__ = "close"
#2  0x00007ffff7321fa3 in ?? () from /usr/lib/libgnutls.so.30
No symbol table info available.
#3  0x00007ffff7321fc6 in ?? () from /usr/lib/libgnutls.so.30
No symbol table info available.
#4  0x00007ffff7276f9b in ?? () from /usr/lib/libgnutls.so.30
No symbol table info available.
#5  0x00007ffff7dea867 in _dl_fini () from /lib64/ld-linux-x86-64.so.2
No symbol table info available.
#6  0x00007ffff5db4f88 in __run_exit_handlers () from /usr/lib/libc.so.6
No symbol table info available.
#7  0x00007ffff5db4fd5 in exit () from /usr/lib/libc.so.6
No symbol table info available.
#8  0x0000000002c94f78 in ?? ()
No symbol table info available.
#9  0x0000000002c94f8e in ?? ()
No symbol table info available.
#10 0x0000000002c9feb3 in ?? ()
No symbol table info available.
#11 0x0000000000d7c5e7 in ?? ()
No symbol table info available.
#12 0x00007ffff5d9f610 in __libc_start_main () from /usr/lib/libc.so.6
No symbol table info available.
#13 0x00000000004135b9 in ?? ()
No symbol table info available.
(gdb) p i
$1 = 0
(gdb) p fds
$2 = (struct file_pageinfo *) 0x7ffff7fc2010
(gdb) p fds[0]
Cannot access memory at address 0x7ffff7fc2010

grawity commented 8 years ago

Actually, based on the backtrace, this seems to be common to all libgnutls-using programs – I got the same crash with nocache gnutls-cli (which is the GnuTLS equivalent of openssl s_client) and nocache lftp.

Here's a more complete trace (still of git-annex):

Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7bd7cc4 in free_unclaimed_pages (fd=3) at nocache.c:418
418         if(fds[i].fd == fd)
(gdb) bt
#0  0x00007ffff7bd7cc4 in free_unclaimed_pages (fd=3) at nocache.c:418
#1  0x00007ffff7bd75dc in close (fd=3) at nocache.c:305
#2  0x00007ffff731f9c1 in _rnd_system_entropy_deinit () at rnd-common.c:291
#3  0x00007ffff731fc65 in wrap_nettle_rnd_deinit (ctx=0x0) at rnd.c:149
#4  0x00007ffff7235c81 in _gnutls_rnd_deinit () at random.c:61
#5  0x00007ffff72236dc in _gnutls_global_deinit (destructor=1) at gnutls_global.c:391
#6  0x00007ffff7223896 in lib_deinit () at gnutls_global.c:496
#7  0x00007ffff7dea867 in _dl_fini () from /lib64/ld-linux-x86-64.so.2
#8  0x00007ffff5d4ef88 in __run_exit_handlers () from /usr/lib/libc.so.6
#9  0x00007ffff5d4efd5 in exit () from /usr/lib/libc.so.6
#10 0x0000000002c94f78 in ?? ()
#11 0x0000000002c94f8e in ?? ()
#12 0x0000000002c9feb3 in ?? ()
#13 0x0000000000d7c5e7 in ?? ()
#14 0x00007ffff5d39610 in __libc_start_main () from /usr/lib/libc.so.6
#15 0x00000000004135b9 in ?? ()

fd#3 was opened from /dev/urandom, probably by libgnutls itself.

Feh commented 8 years ago

Thanks! I still can’t reproduce this with nocache gnutls-cli, but your backtrace is a good start. One possibility worth investigating is whether GnuTLS has its own destructor that runs after the nocache destructor. There is a (nonstandard?) way to specify constructor/destructor priorities, which might be worth a try. Additionally, nocache’s destructor should restore the original syscall wrappers as part of its cleanup, which I believe it currently doesn’t. I’ll take a look, hopefully tomorrow. Feel free to poke around a bit and send a PR if you have time.
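For reference, the priority mechanism mentioned here is the GCC/Clang constructor and destructor attribute extension. The sketch below only illustrates the ordering rules. The function names and the hook_log buffer are made up for the example, and nothing here is taken from nocache. Priorities up to 100 are reserved for the implementation, so the example starts at 101. As far as I know, priorities only order hooks within a single executable or shared object. The order between different libraries (say nocache.so and libgnutls) is decided by the dynamic linker, so a priority alone may not change which destructor runs last.

```c
#include <stdio.h>

/* Records the order in which the constructors ran (illustrative only). */
static char hook_log[8];
static int hook_count;

static void record(char tag)
{
    if (hook_count < (int)sizeof hook_log - 1)
        hook_log[hook_count++] = tag;
}

/* Constructors: the smaller priority number runs first. */
__attribute__((constructor(101))) static void early_init(void) { record('A'); }
__attribute__((constructor(102))) static void late_init(void)  { record('B'); }

/* Destructors: the opposite relationship holds, so 102 runs before 101. */
__attribute__((destructor(102))) static void early_fini(void)
{
    fputs("destructor with priority 102\n", stderr);
}

__attribute__((destructor(101))) static void late_fini(void)
{
    fputs("destructor with priority 101\n", stderr);
}
```

By the time main() starts, hook_log holds "AB". At exit, the priority-102 destructor prints its message before the priority-101 one.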

grawity commented 8 years ago

Maybe that's because Debian has an older libgnutls, too.

It almost feels like free_unclaimed_pages is being called twice and crashes because fds[] was freed the first time. Could that be the issue?
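If that hypothesis is right, one defensive pattern is to reset the table pointer on teardown and have the lookup bail out when the table is gone. The sketch below is a simplified model and is not the actual nocache code or the eventual fix. Only the names struct file_pageinfo, fds and the fds[i].fd comparison come from the trace above. max_fds, table_init, table_track, table_destroy and release_fd are hypothetical.

```c
#include <stdlib.h>

/* Simplified stand-in for nocache's per-descriptor record. */
struct file_pageinfo {
    int fd;                        /* -1 marks a free slot */
};

static struct file_pageinfo *fds;  /* NULL before init and after teardown */
static size_t max_fds;             /* hypothetical table size */

int table_init(size_t n)
{
    size_t i;

    fds = malloc(n * sizeof *fds);
    if (fds == NULL)
        return -1;
    for (i = 0; i < n; i++)
        fds[i].fd = -1;
    max_fds = n;
    return 0;
}

int table_track(int fd)
{
    size_t i;

    if (fds == NULL || fd < 0)
        return -1;
    for (i = 0; i < max_fds; i++) {
        if (fds[i].fd == -1) {
            fds[i].fd = fd;
            return 0;
        }
    }
    return -1;                     /* table full */
}

/* Teardown: free the table and clear the pointer so that a close() arriving
 * later (for example from another library's destructor) finds nothing. */
void table_destroy(void)
{
    free(fds);
    fds = NULL;
    max_fds = 0;
}

/* Returns 1 if fd was tracked and released, 0 otherwise. */
int release_fd(int fd)
{
    size_t i;

    if (fds == NULL || fd < 0)     /* table already torn down: do nothing */
        return 0;
    for (i = 0; i < max_fds; i++) {
        if (fds[i].fd == fd) {
            fds[i].fd = -1;
            return 1;
        }
    }
    return 0;
}
```

With this shape, calling release_fd(3) after table_destroy() returns 0 instead of dereferencing freed memory, and calling table_destroy() twice is harmless because free(NULL) is a no-op.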

Feh commented 8 years ago

Because of issue #30 I’ve been looking into this again. Could you perhaps try the commit referenced there and see if it solves your problem, too?

grawity commented 8 years ago

Oh, I replied to the wrong issue. Anyway, 1f4c9ea4f2874682ec40163d6edb69e482204391 seems to fix crashes in both git-annex and gnutls-cli.