Open marcin-rzeznicki opened 2 years ago
I was wondering if my system was the only one with this issue. I had this happen after upgrading from 0.6.5 to 0.7, same output.
Systemctl would restart the service a couple of times, then give up. Hours later, I'd manually start it again and it would die silently the same way later in the night or so.
I've got debug symbols now, I'll post a complete stacktrace soon - maybe that will help to solve the issue
Since you are on Arch, do you also observe this when running bees-git from the AUR? Just saying this as there have been a lot of commits since the last tag. https://github.com/Zygo/bees/compare/v0.7...master
If this does not happen anymore, maybe it's time for a new release.
I've switched to bees-git a couple of days ago and it doesn't crash anymore.
I can't reproduce it either anymore. It must have had something to do with FS state, as I didn't upgrade anything. Heisenbug :thinking:
Ah, a few minutes after I wrote the above comment, it core-dumped again. This time I captured the exact stacktrace
(gdb) bt
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x00007f47d6e4d543 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2 0x00007f47d6dfd998 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007f47d6de753d in __GI_abort () at abort.c:79
#4 0x00007f47d7186112 in std::__glibcxx_assert_fail (file=file@entry=0x555a9683d4b8 "/usr/include/c++/12.1.1/bits/stl_vector.h", line=line@entry=1142,
function=function@entry=0x555a96843f68 "std::vector<_Tp, _Alloc>::const_reference std::vector<_Tp, _Alloc>::operator[](size_type) const [with _Tp = unsigned char; _Alloc = std::allocator<unsigned char>; const_reference = const unsigned char"..., condition=condition@entry=0x555a968436f8 "__n < this->size()") at /usr/src/debug/gcc/libstdc++-v3/src/c++11/debug.cc:60
#5 0x0000555a96821800 in std::vector<unsigned char, std::allocator<unsigned char> >::operator[] (this=<optimized out>, __n=<optimized out>) at /usr/include/c++/12.1.1/bits/stl_vector.h:1142
#6 std::vector<unsigned char, std::allocator<unsigned char> >::operator[] (__n=<optimized out>, this=<optimized out>) at /usr/include/c++/12.1.1/bits/stl_vector.h:1140
#7 crucible::BtrfsIoctlSearchHeader::set_data (this=<optimized out>, v=std::vector of length 4208, capacity 4208 = {...}, offset=<optimized out>) at ../lib/fs.cc:751
#8 0x0000555a968219db in crucible::BtrfsIoctlSearchKey::do_ioctl_nothrow (this=0x7f47c27fb2a0, fd=<optimized out>) at ../lib/fs.cc:787
#9 0x0000555a968265db in crucible::BtrfsIoctlSearchKey::do_ioctl (this=<optimized out>, fd=3) at ../lib/fs.cc:797
#10 0x0000555a96807802 in BeesRoots::next_root (root=<optimized out>, this=0x555a97170a00) at ../lib/fd.cc:119
#11 BeesRoots::insert_new_crawl (this=0x555a97170a00) at /usr/src/debug/bees-0.7/src/bees-roots.cc:496
#12 operator() (__closure=<optimized out>) at /usr/src/debug/bees-0.7/src/bees-roots.cc:403
#13 std::__invoke_impl<void, BeesRoots::crawl_thread()::<lambda()>&> (__f=...) at /usr/include/c++/12.1.1/bits/invoke.h:61
#14 std::__invoke_r<void, BeesRoots::crawl_thread()::<lambda()>&> (__fn=...) at /usr/include/c++/12.1.1/bits/invoke.h:154
#15 std::_Function_handler<void(), BeesRoots::crawl_thread()::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/12.1.1/bits/std_function.h:290
#16 0x0000555a96830cb7 in std::function<void ()>::operator()() const (this=<optimized out>) at /usr/include/c++/12.1.1/bits/std_function.h:591
#17 crucible::catch_all(std::function<void ()> const&, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)> const&) [clone .isra.0] (f=..., explainer=...) at ../lib/error.cc:55
#18 0x0000555a967fbf1e in BeesRoots::crawl_thread (this=0x555a97170a00) at /usr/src/debug/bees-0.7/src/bees-roots.cc:401
#19 0x0000555a96830cb7 in std::function<void ()>::operator()() const (this=<optimized out>) at /usr/include/c++/12.1.1/bits/std_function.h:591
#20 crucible::catch_all(std::function<void ()> const&, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)> const&) [clone .isra.0] (f=..., explainer=...) at ../lib/error.cc:55
#21 0x0000555a9680fdc6 in operator() (__closure=0x555a971f1ca8) at /usr/src/debug/bees-0.7/src/bees-thread.cc:22
#22 0x00007f47d718a2f3 in std::execute_native_thread_routine (__p=0x555a971f1ca0) at /usr/src/debug/gcc/libstdc++-v3/src/c++11/thread.cc:82
#23 0x00007f47d6e4b78d in start_thread (arg=<optimized out>) at pthread_create.c:442
#24 0x00007f47d6ecc8e4 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:100
So it seems that the exact culprit is the following line:
(gdb) frame 7
#7 crucible::BtrfsIoctlSearchHeader::set_data (this=<optimized out>, v=std::vector of length 4208, capacity 4208 = {...}, offset=<optimized out>) at ../lib/fs.cc:751
Downloading 0.03 MB source file /usr/src/debug/bees-0.7/src/../lib/fs.cc
751 memcpy(static_cast<btrfs_ioctl_search_header *>(this), &v[offset], sizeof(btrfs_ioctl_search_header));
It looks like you have a large item in the root tree metadata? Can you provide the output of:
btrfs ins dump-tree -t 1 /dev/your-btrfs-device-here
The code was rewritten between v0.7
and master
, and the rewrite seems to have taken this bug with it.
The assert comes from the C++ standard library's std::vector
implementation. We are just calculating an address at this point, and should not be hitting an assertion failure, but it is technically undefined behavior, and the library maintainers are within their rights to enable an assertion for it that breaks legacy code.
In master
I got rid of std::vector
because it was roughly 3 orders of magnitude slower than the class ByteVector
I wrote to replace it. That would also get rid of the assert
since I didn't put one into the ByteVector
class. ByteVector
does have length checks which throw exceptions, so the same problem would reappear as an exception; however, the new BtrfsIoctlSearchKey::do_ioctl_nothrow
method resizes its buffer automatically, so buffer overflow should no longer be a problem.
Sure - output is very large, so I'll just post a couple of first entries:
btrfs-progs v5.19
root tree
node 50628263936 level 1 items 8 free space 485 generation 16525 owner ROOT_TREE
node 50628263936 flags 0x1(WRITTEN) backref revision 1
fs uuid b32fa38f-7d03-41ed-a4fd-9909d7d0f093
chunk uuid 242eb397-499c-4f8d-bb7e-6e6ae870fa05
key (EXTENT_TREE ROOT_ITEM 0) block 50944212992 gen 16525
key (304 INODE_ITEM 0) block 50751356928 gen 16525
key (485 INODE_ITEM 0) block 160594116608 gen 16425
key (609 INODE_ITEM 0) block 160614367232 gen 16425
key (827 INODE_ITEM 0) block 50497372160 gen 16524
key (1426 ROOT_ITEM 10829) block 374149300224 gen 13897
key (1431 ROOT_ITEM 11076) block 50876104704 gen 16506
key (FREE_SPACE UNTYPED 9694085120) block 50890178560 gen 16525
leaf 50944212992 items 85 free space 4940 generation 16525 owner ROOT_TREE
leaf 50944212992 flags 0x1(WRITTEN) backref revision 1
fs uuid b32fa38f-7d03-41ed-a4fd-9909d7d0f093
chunk uuid 242eb397-499c-4f8d-bb7e-6e6ae870fa05
item 0 key (EXTENT_TREE ROOT_ITEM 0) itemoff 15844 itemsize 439
generation 16525 root_dirid 0 bytenr 50628280320 byte_limit 0 bytes_used 361152512
last_snapshot 0 flags 0x0(none) refs 1
drop_progress key (0 UNKNOWN.0 0) drop_level 0
level 2 generation_v2 16525
uuid 00000000-0000-0000-0000-000000000000
parent_uuid 00000000-0000-0000-0000-000000000000
received_uuid 00000000-0000-0000-0000-000000000000
ctransid 0 otransid 0 stransid 0 rtransid 0
ctime 0.0 (1970-01-01 01:00:00)
otime 0.0 (1970-01-01 01:00:00)
stime 0.0 (1970-01-01 01:00:00)
rtime 0.0 (1970-01-01 01:00:00)
item 1 key (DEV_TREE ROOT_ITEM 0) itemoff 15405 itemsize 439
generation 16445 root_dirid 0 bytenr 100466688 byte_limit 0 bytes_used 49152
last_snapshot 0 flags 0x0(none) refs 1
drop_progress key (0 UNKNOWN.0 0) drop_level 0
level 1 generation_v2 16445
uuid 00000000-0000-0000-0000-000000000000
parent_uuid 00000000-0000-0000-0000-000000000000
received_uuid 00000000-0000-0000-0000-000000000000
ctransid 0 otransid 0 stransid 0 rtransid 0
ctime 0.0 (1970-01-01 01:00:00)
otime 0.0 (1970-01-01 01:00:00)
stime 0.0 (1970-01-01 01:00:00)
rtime 0.0 (1970-01-01 01:00:00)
item 2 key (FS_TREE INODE_REF 6) itemoff 15388 itemsize 17
index 0 namelen 7 name: default
item 3 key (FS_TREE ROOT_ITEM 0) itemoff 14949 itemsize 439
generation 16419 root_dirid 256 bytenr 116984627200 byte_limit 0 bytes_used 16384
last_snapshot 0 flags 0x0(none) refs 1
drop_progress key (0 UNKNOWN.0 0) drop_level 0
level 0 generation_v2 16419
uuid e263b3e2-42a2-4676-8dbf-a81e5a6f5e24
parent_uuid 00000000-0000-0000-0000-000000000000
received_uuid 00000000-0000-0000-0000-000000000000
ctransid 16373 otransid 0 stransid 0 rtransid 0
ctime 1661454436.69527256 (2022-08-25 21:07:16)
otime 1570733302.0 (2019-10-10 20:48:22)
stime 0.0 (1970-01-01 01:00:00)
rtime 0.0 (1970-01-01 01:00:00)
item 4 key (FS_TREE ROOT_REF 257) itemoff 14925 itemsize 24
root ref key dirid 256 sequence 2 name BACKUP
item 5 key (FS_TREE ROOT_REF 1374) itemoff 14898 itemsize 27
root ref key dirid 256 sequence 3 name .beeshome
item 6 key (ROOT_TREE_DIR INODE_ITEM 0) itemoff 14738 itemsize 160
generation 3 transid 0 size 0 nbytes 16384
block group 0 mode 40755 links 1 uid 0 gid 0 rdev 0
sequence 0 flags 0x0(none)
atime 1570733302.0 (2019-10-10 20:48:22)
ctime 1570733302.0 (2019-10-10 20:48:22)
mtime 1570733302.0 (2019-10-10 20:48:22)
otime 1570733302.0 (2019-10-10 20:48:22)
item 7 key (ROOT_TREE_DIR INODE_REF 6) itemoff 14726 itemsize 12
index 0 namelen 2 name: ..
item 8 key (ROOT_TREE_DIR DIR_ITEM 2378154706) itemoff 14689 itemsize 37
location key (FS_TREE ROOT_ITEM 18446744073709551615) type DIR
transid 0 data_len 0 name_len 7
name: default
item 9 key (CSUM_TREE ROOT_ITEM 0) itemoff 14250 itemsize 439
generation 16525 root_dirid 0 bytenr 50767003648 byte_limit 0 bytes_used 284213248
last_snapshot 0 flags 0x0(none) refs 1
drop_progress key (0 UNKNOWN.0 0) drop_level 0
level 2 generation_v2 16525
uuid 00000000-0000-0000-0000-000000000000
parent_uuid 00000000-0000-0000-0000-000000000000
received_uuid 00000000-0000-0000-0000-000000000000
ctransid 0 otransid 0 stransid 0 rtransid 0
ctime 0.0 (1970-01-01 01:00:00)
otime 0.0 (1970-01-01 01:00:00)
stime 0.0 (1970-01-01 01:00:00)
rtime 0.0 (1970-01-01 01:00:00)
We're looking for something with a large 'itemsize', especially if it's > 4096.
Ah, then I am afraid there is nothing that large:
# btrfs inspect-internal dump-tree -t 1 /dev/disk/by-uuid/b32fa38f-7d03-41ed-a4fd-9909d7d0f093 | grep 'itemsize' | sort -b --key=10nr
item 0 key (1426 ROOT_ITEM 10829) itemoff 15844 itemsize 439
item 0 key (1431 ROOT_ITEM 11076) itemoff 15844 itemsize 439
...
I switched to git package and bees works again. maybe new release would be a good idea.
Unfortunately 0.7.2
does the same thing for me again:
(gdb) bt
#0 __pthread_kill_implementation (threadid=<optimized out>, signo=signo@entry=6, no_tid=no_tid@entry=0) at pthread_kill.c:44
#1 0x00007fef80f716b3 in __pthread_kill_internal (signo=6, threadid=<optimized out>) at pthread_kill.c:78
#2 0x00007fef80f21958 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#3 0x00007fef80f0b53d in __GI_abort () at abort.c:79
#4 0x00007fef812aa112 in std::__glibcxx_assert_fail (file=file@entry=0x559cf0e674b8 "/usr/include/c++/12.2.0/bits/stl_vector.h", line=line@entry=1142,
function=function@entry=0x559cf0e6df68 "std::vector<_Tp, _Alloc>::const_reference std::vector<_Tp, _Alloc>::operator[](size_type) const [with _Tp = unsigned char; _Alloc = std::allocator<unsigned char>; const_reference = const unsigned char"..., condition=condition@entry=0x559cf0e6d6f8 "__n < this->size()") at /usr/src/debug/gcc/libstdc++-v3/src/c++11/debug.cc:60
#5 0x0000559cf0e4b800 in std::vector<unsigned char, std::allocator<unsigned char> >::operator[] (this=<optimized out>, __n=<optimized out>)
at /usr/include/c++/12.2.0/bits/stl_vector.h:1142
#6 std::vector<unsigned char, std::allocator<unsigned char> >::operator[] (__n=<optimized out>, this=<optimized out>) at /usr/include/c++/12.2.0/bits/stl_vector.h:1140
#7 crucible::BtrfsIoctlSearchHeader::set_data (this=<optimized out>, v=std::vector of length 4208, capacity 4208 = {...}, offset=<optimized out>) at ../lib/fs.cc:751
#8 0x0000559cf0e4b9db in crucible::BtrfsIoctlSearchKey::do_ioctl_nothrow (this=0x7fef78ed32a0, fd=<optimized out>) at ../lib/fs.cc:787
#9 0x0000559cf0e505db in crucible::BtrfsIoctlSearchKey::do_ioctl (this=<optimized out>, fd=3) at ../lib/fs.cc:797
#10 0x0000559cf0e31802 in BeesRoots::next_root (root=<optimized out>, this=0x559cf1739a00) at ../lib/fd.cc:119
#11 BeesRoots::insert_new_crawl (this=0x559cf1739a00) at /usr/src/debug/bees-0.7.2/src/bees-roots.cc:496
#12 operator() (__closure=<optimized out>) at /usr/src/debug/bees-0.7.2/src/bees-roots.cc:403
#13 std::__invoke_impl<void, BeesRoots::crawl_thread()::<lambda()>&> (__f=...) at /usr/include/c++/12.2.0/bits/invoke.h:61
#14 std::__invoke_r<void, BeesRoots::crawl_thread()::<lambda()>&> (__fn=...) at /usr/include/c++/12.2.0/bits/invoke.h:154
#15 std::_Function_handler<void(), BeesRoots::crawl_thread()::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/12.2.0/bits/std_function.h:290
#16 0x0000559cf0e5acb7 in std::function<void ()>::operator()() const (this=<optimized out>) at /usr/include/c++/12.2.0/bits/std_function.h:591
#17 crucible::catch_all(std::function<void ()> const&, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)> const&) [clone .isra.0] (
f=..., explainer=...) at ../lib/error.cc:55
#18 0x0000559cf0e25f1e in BeesRoots::crawl_thread (this=0x559cf1739a00) at /usr/src/debug/bees-0.7.2/src/bees-roots.cc:401
#19 0x0000559cf0e5acb7 in std::function<void ()>::operator()() const (this=<optimized out>) at /usr/include/c++/12.2.0/bits/std_function.h:591
#20 crucible::catch_all(std::function<void ()> const&, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)> const&) [clone .isra.0] (
f=..., explainer=...) at ../lib/error.cc:55
#21 0x0000559cf0e39dc6 in operator() (__closure=0x559cf17baca8) at /usr/src/debug/bees-0.7.2/src/bees-thread.cc:22
#22 0x00007fef812ae2f3 in std::execute_native_thread_routine (__p=0x559cf17baca0) at /usr/src/debug/gcc/libstdc++-v3/src/c++11/thread.cc:82
#23 0x00007fef80f6f8fd in start_thread (arg=<optimized out>) at pthread_create.c:442
#24 0x00007fef80ff1a60 in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
Nobody seems to have explicitly mentioned it here, but this requires -D_GLIBCXX_ASSERTIONS
to trigger (not that I've hit it myself) if using libstdc++. libcxx has its own variable to control this, -D_LIBCPP_ENABLE_ASSERTIONS=1
.
I got a build failure from -D_GLIBCXX_ASSERTIONS
(in ntoa.cc
of all places) on the random version of gcc 12 I had lying around, which promptly disappeared after upgrading the compiler and libstdc++ to current versions (libstdc++ 13.1.0-7, gcc 12.3.0-5 as I write this). After upgrading there's no build failure and so far no runtime failure either.
I suspect a quality-of-toolchain issue occurs when these flags are enabled.
This is the build failure, and it's nonsense:
In file included from /usr/include/c++/12/string:40,
from ../include/crucible/ntoa.h:4,
from ntoa.cc:1:
In static member function ?static std::char_traits<char>::char_type* std::char_traits<char>::copy(char_type*, const char_type*, std::size_t)?,
inlined from ?static void std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_S_copy(_CharT*, const _CharT*, size_type) [with _CharT = char; _Traits = std::char_traits<char>; _Alloc = std
inlined from ?std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::_M_replace(size_type, size_type, const _CharT*, size_type) [with _Cha
inlined from ?std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::assign(const _CharT*) [with _CharT = char; _Traits = std::char_traits
inlined from ?std::__cxx11::basic_string<_CharT, _Traits, _Alloc>& std::__cxx11::basic_string<_CharT, _Traits, _Alloc>::operator=(const _CharT*) [with _CharT = char; _Traits = std::char_tra
inlined from ?std::string crucible::bits_ntoa(long long unsigned int, const bits_ntoa_table*)? at ntoa.cc:31:10:
/usr/include/c++/12/bits/char_traits.h:431:56: error: ?void* __builtin_memcpy(void*, const void*, long unsigned int)? accessing 9223372036854775810 or more bytes at offsets [2, 9223372036854775
431 | return static_cast<char_type*>(__builtin_memcpy(__s1, __s2, __n));
| ~~~~~~~~~~~~~~~~^~~~~~~~~~~~~~~~~
The offending line of code was:
out = "0";
where out
is an automatic std::string
variable.
For the past few days I started to observe
beesd
crashing on:I don't have debug symbols installed, so the stacktrace is probably useless, but it looks like this:
I am wondering if this is something I should blame the distro for, or maybe it's been encountered before.
Relevant versions:
Thanks in advance.