Zygo / bees

Best-Effort Extent-Same, a btrfs dedupe agent
GNU General Public License v3.0
647 stars 55 forks source link

coredump #171

Closed fabricemarie closed 3 years ago

fabricemarie commented 3 years ago

Hi, When trying to run bees I get a coredump. I recompiled it with debug symbols and I get this:

I tried with --help first:

(gdb) run --help
Starting program: /var/tmp/bees/bin/bees --help
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
bees version v0.6-130-g177f393-dirty
2021-02-24 23:57:52 5321.5321<7> bees: Masking signals
2021-02-24 23:57:52 5321.5321<7> bees: BeesThread exec progress_report
[New Thread 0x7ffff7a70640 (LWP 5325)]
2021-02-24 23:57:52 5321.5321<7> bees: BeesThread exec status_report
2021-02-24 23:57:52 5321.5325<7> progress_report: Starting thread progress_report
[New Thread 0x7ffff726f640 (LWP 5326)]
Usage: /var/tmp/bees/bin/bees [options] fs-root-path
Performs best-effort extent-same deduplication on btrfs.

fs-root-path MUST be the root of a btrfs filesystem tree (subvol id 5).
Other directories will be rejected.

Options:
    -h, --help            Show this help

Load management options:
    -c, --thread-count    Worker thread count (default CPU count * factor)
    -C, --thread-factor   Worker thread factor (default 1)
    -G, --thread-min      Minimum worker thread count (default 0)
    -g, --loadavg-target  Target load average for worker threads (default none)

Filesystem tree traversal options:
    -m, --scan-mode       Scanning mode (0..2, default 0)

Workarounds:
    -a, --workaround-btrfs-send    Workaround for btrfs send
                                   (ignore RO snapshots)

Logging options:
    -t, --timestamps      Show timestamps in log output (default)
    -T, --no-timestamps   Omit timestamps in log output
    -p, --absolute-paths  Show absolute paths (default)
    -P, --strip-paths     Strip $CWD from beginning of all paths in the log
    -v, --verbose         Set maximum log level (0..8, default 8)

Optional environment variables:
    BEESHOME    Path to hash table and configuration files
                (default is .beeshome/ in the root of the filesystem).

    BEESSTATUS  File to write status to (tmpfs recommended, e.g. /run).
                No status is written if this variable is unset.
2021-02-24 23:57:52 5321.5321<7> bees: BeesThread destructor status_report
2021-02-24 23:57:52 5321.5321<7> bees: Cancelling thread status_report
2021-02-24 23:57:52 5321.5321<7> bees: Waiting for thread status_report
terminate called without an active exception

Thread 3 "status_report" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff726f640 (LWP 5326)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
49        return ret;
(gdb) backtrace
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1  0x00007ffff7a9c8a4 in __GI_abort () at abort.c:79
#2  0x00007ffff7e40926 in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00007ffff7e4c1ac in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
#4  0x00007ffff7e4c217 in std::terminate () at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:58
#5  0x00007ffff7e4bbcc in __cxxabiv1::__gxx_personality_v0 (version=<optimized out>, actions=10, exception_class=0, ue_header=0x7ffff726fcb0, context=<optimized out>) at ../../../../libstdc++-v3/libsupc++/eh_personality.cc:673
#6  0x00007ffff7c51d94 in _Unwind_ForcedUnwind_Phase2 (exc=0x7ffff726fcb0, context=0x7ffff726e150, frames_p=0x7ffff726e058) at ../../../libgcc/unwind.inc:182
#7  0x00007ffff7c52705 in _Unwind_Resume (exc=exc@entry=0x7ffff726fcb0) at ../../../libgcc/unwind.inc:243
#8  0x00007ffff7ed4913 in std::__ostream_insert<char, std::char_traits<char> > (__out=..., __s=0x7fffe80010b0 "2021-02-24 23:57:52 5321.5326<7> status_report: Starting thread status_report\n", __n=<optimized out>)
    at /usr/src/debug/gcc-10.2.1-9.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/ostream_insert.h:82
#9  0x00000000004d97ba in crucible::Chatter::~Chatter (this=0x7ffff726ea70, __in_chrg=<optimized out>) at chatter.cc:96
#10 0x00000000004c2145 in operator() (__closure=0x5d4628) at bees-thread.cc:19
#11 0x00000000004c422e in std::__invoke_impl<void, BeesThread::exec(std::function<void()>)::<lambda()> >(std::__invoke_other, struct {...} &&) (__f=...) at /usr/include/c++/10/bits/invoke.h:60
#12 0x00000000004c41e3 in std::__invoke<BeesThread::exec(std::function<void()>)::<lambda()> >(struct {...} &&) (__fn=...) at /usr/include/c++/10/bits/invoke.h:95
#13 0x00000000004c4190 in std::thread::_Invoker<std::tuple<BeesThread::exec(std::function<void()>)::<lambda()> > >::_M_invoke<0>(std::_Index_tuple<0>) (this=0x5d4628) at /usr/include/c++/10/thread:264
#14 0x00000000004c4164 in std::thread::_Invoker<std::tuple<BeesThread::exec(std::function<void()>)::<lambda()> > >::operator()(void) (this=0x5d4628) at /usr/include/c++/10/thread:271
#15 0x00000000004c4148 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<BeesThread::exec(std::function<void()>)::<lambda()> > > >::_M_run(void) (this=0x5d4620) at /usr/include/c++/10/thread:215
#16 0x00007ffff7e785f4 in std::execute_native_thread_routine (__p=0x5d4620) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#17 0x00007ffff7f933f9 in start_thread (arg=0x7ffff726f640) at pthread_create.c:463
#18 0x00007ffff7b77b53 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Then directly on / (which is the root path volID 5) and get a different signal/coredump:

(gdb) run /
Starting program: /var/tmp/bees/bin/bees /
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
bees version v0.6-130-g177f393-dirty
2021-02-24 23:59:10 5333.5333<7> bees: Masking signals
2021-02-24 23:59:10 5333.5333<7> bees: BeesThread exec progress_report
[New Thread 0x7ffff7a70640 (LWP 5337)]
2021-02-24 23:59:10 5333.5333<7> bees: BeesThread exec status_report
2021-02-24 23:59:10 5333.5337<7> progress_report: Starting thread progress_report
[New Thread 0x7ffff726f640 (LWP 5338)]
2021-02-24 23:59:10 5333.5333<6> bees: setting rlimit NOFILE to 10340
2021-02-24 23:59:10 5333.5333<5> bees: setting worker thread pool maximum size to 2
[New Thread 0x7ffff6a6e640 (LWP 5339)]
[New Thread 0x7ffff626d640 (LWP 5340)]
2021-02-24 23:59:10 5333.5333<5> bees: setting root path to '/'
2021-02-24 23:59:10 5333.5333<6> bees: set_root_path /
2021-02-24 23:59:10 5333.5333<6> bees: set_root_fd /
2021-02-24 23:59:10 5333.5333<7> bees: BeesThread destructor status_report
2021-02-24 23:59:10 5333.5333<7> bees: Cancelling thread status_report
2021-02-24 23:59:10 5333.5333<7> bees: Waiting for thread status_report
terminate called without an active exception

Thread 3 "status_report" received signal SIGABRT, Aborted.
[Switching to Thread 0x7ffff726f640 (LWP 5338)]
__GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
49        return ret;
(gdb) backtrace
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:49
#1  0x00007ffff7a9c8a4 in __GI_abort () at abort.c:79
#2  0x00007ffff7e40926 in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00007ffff7e4c1ac in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
#4  0x00007ffff7e4c217 in std::terminate () at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:58
#5  0x00007ffff7e4bbcc in __cxxabiv1::__gxx_personality_v0 (version=<optimized out>, actions=10, exception_class=0, ue_header=0x7ffff726fcb0, context=<optimized out>) at ../../../../libstdc++-v3/libsupc++/eh_personality.cc:673
#6  0x00007ffff7c51d94 in _Unwind_ForcedUnwind_Phase2 (exc=0x7ffff726fcb0, context=0x7ffff726e150, frames_p=0x7ffff726e058) at ../../../libgcc/unwind.inc:182
#7  0x00007ffff7c52705 in _Unwind_Resume (exc=exc@entry=0x7ffff726fcb0) at ../../../libgcc/unwind.inc:243
#8  0x00007ffff7ed4913 in std::__ostream_insert<char, std::char_traits<char> > (__out=..., __s=0x7fffe80010b0 "2021-02-24 23:59:10 5333.5338<7> status_report: Starting thread status_report\n", __n=<optimized out>)
    at /usr/src/debug/gcc-10.2.1-9.fc33.x86_64/obj-x86_64-redhat-linux/x86_64-redhat-linux/libstdc++-v3/include/bits/ostream_insert.h:82
#9  0x00000000004d97ba in crucible::Chatter::~Chatter (this=0x7ffff726ea70, __in_chrg=<optimized out>) at chatter.cc:96
#10 0x00000000004c2145 in operator() (__closure=0x5d4628) at bees-thread.cc:19
#11 0x00000000004c422e in std::__invoke_impl<void, BeesThread::exec(std::function<void()>)::<lambda()> >(std::__invoke_other, struct {...} &&) (__f=...) at /usr/include/c++/10/bits/invoke.h:60
#12 0x00000000004c41e3 in std::__invoke<BeesThread::exec(std::function<void()>)::<lambda()> >(struct {...} &&) (__fn=...) at /usr/include/c++/10/bits/invoke.h:95
#13 0x00000000004c4190 in std::thread::_Invoker<std::tuple<BeesThread::exec(std::function<void()>)::<lambda()> > >::_M_invoke<0>(std::_Index_tuple<0>) (this=0x5d4628) at /usr/include/c++/10/thread:264
#14 0x00000000004c4164 in std::thread::_Invoker<std::tuple<BeesThread::exec(std::function<void()>)::<lambda()> > >::operator()(void) (this=0x5d4628) at /usr/include/c++/10/thread:271
#15 0x00000000004c4148 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<BeesThread::exec(std::function<void()>)::<lambda()> > > >::_M_run(void) (this=0x5d4620) at /usr/include/c++/10/thread:215
#16 0x00007ffff7e785f4 in std::execute_native_thread_routine (__p=0x5d4620) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#17 0x00007ffff7f933f9 in start_thread (arg=0x7ffff726f640) at pthread_create.c:463
#18 0x00007ffff7b77b53 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

The packages I use are are x86_64: kernel-5.10.16-200 gcc-c++-10.2.1-9 libuuid-devel-2.36.1-1 btrfs-progs-5.10-1

Zygo commented 3 years ago

Something in the depths of the C++ runtime is throwing an exception in the log ostream.

We've seen this from time to time. Usually it happens after upgrading some part of libc or libc++, and all binaries have to be rebuilt to make it stop. Haven't been able to figure out what this is, but it may be related to incomplete or buggy pthread_cancel support. It might be some obscure ABI incompatibility, based on _gcc_personality_v0 in the stack trace, and the fact the problem goes away after updating all the -devel packages on the build system and rebuilding.

On the other hand, we only use pthread_cancel in a few cases, and none of them are needed. It was certainly convenient to write some of the code assuming pthread_cancel works, but we literally only call it while the process is exiting, and there are one or two cases like these where it does not work.

Also note that running on '/' doesn't work if '/' is not the root of a btrfs volume (it can't be a named subvol, it must be subvol=/). The code might be trying to print an exception like this one:

# ~./bin/bees /
bees version v0.6-152-gbfc4a9b5
2021-04-28 16:46:12.826 23138.23138<7> bees: Masking signals
2021-04-28 16:46:12.826 23138.23138<7> bees: BeesThread exec progress_report
2021-04-28 16:46:12.826 23138.23138<7> bees: BeesThread exec status_report
2021-04-28 16:46:12.826 23138.23139<7> progress_report: Starting thread progress_report
2021-04-28 16:46:12.826 23138.23138<6> bees: setting rlimit NOFILE to 10340
2021-04-28 16:46:12.826 23138.23140<7> status_report: Starting thread status_report
2021-04-28 16:46:12.826 23138.23138<5> bees: setting worker thread pool maximum size to 16
2021-04-28 16:46:12.826 23138.23140<7> status_report: Exiting thread status_report, 0.001 sec
2021-04-28 16:46:12.826 23138.23138<5> bees: setting root path to '/'
2021-04-28 16:46:12.826 23138.23138<6> bees: set_root_path /
2021-04-28 16:46:12.826 23138.23138<6> bees: set_root_fd /
2021-04-28 16:46:12.826 23138.23138<7> bees: BeesThread destructor status_report
2021-04-28 16:46:12.826 23138.23138<7> bees: Cancelling thread status_report
2021-04-28 16:46:12.826 23138.23138<7> bees: pthread_cancel returned Unknown error -3
2021-04-28 16:46:12.826 23138.23138<7> bees: Waiting for thread status_report
2021-04-28 16:46:12.826 23138.23138<7> bees: Waited for status_report, 0.001 sec
2021-04-28 16:46:12.826 23138.23138<7> bees: BeesThread destructor progress_report
2021-04-28 16:46:12.826 23138.23138<7> bees: Cancelling thread progress_report
2021-04-28 16:46:12.826 23138.23138<7> bees: Waiting for thread progress_report
2021-04-28 16:46:12.826 23138.23138<7> bees: Waited for progress_report, 0.001 sec
2021-04-28 16:46:12.826 23138.23138<5> bees:
2021-04-28 16:46:12.826 23138.23138<5> bees:
2021-04-28 16:46:12.826 23138.23138<5> bees: *** EXCEPTION ***
2021-04-28 16:46:12.826 23138.23138<5> bees:    exception type std::invalid_argument: root_fd_treeid = 258 failed constraint check (root_fd_treeid == BTRFS_FS_TREE_OBJECTID) at bees-context.cc:940
2021-04-28 16:46:12.826 23138.23138<5> bees: ***
terminate called after throwing an instance of 'std::invalid_argument'
  what():  root_fd_treeid = 258 failed constraint check (root_fd_treeid == BTRFS_FS_TREE_OBJECTID) at bees-context.cc:940
Aborted

If '/' is not the root of a btrfs then you will need to mount the btrfs root somewhere else so bees can access it. This is the purpose of the /var/lib/bees mounts in https://zygo.github.io/bees/running.html

fabricemarie commented 3 years ago

Changed the root volume to be subvol=/ (and indeed upgraded a few packages since the original submission date). It works beautifully now. Thanks!