martinetd opened this issue 2 years ago (status: Open)
operator!() is a null pointer check, so it won't be that simple. It's probably a BEESNOTE macro that references an Fd object which is assigned while the BEESNOTE macro is holding it and keeping it visible to the status thread, e.g.
thread 1:

```cpp
Fd foo;
BEESNOTE("Doing a thing with " << foo); // saves a functor that holds a ref to 'foo'
foo = do_some_stuff(); // modifies foo
```
Thread 2 is the bees status thread, which calls BEESNOTE's functor. BEESNOTE's status object creation and its visibility to the status thread are protected by a mutex, but access to the shared_ptr in the assignment to foo is not. If the status thread invokes the functor at the wrong time, operator!() will race with the assignment to foo and cause a crash. This bug has existed in several places in bees so far, and it's possible I haven't found all of them yet.
The functor is in the stack trace:
```
#2  0x000055820b495b99 in std::function<void (std::ostream&)>::operator()(std::ostream&) const (
    __args#0=..., this=0x7fb8721a7350) at /usr/include/c++/11/bits/std_function.h:590
```
Unfortunately gdb is giving us the very unhelpful definition of the std::function template, but we need to know the point in the code where that template is instantiated. If you can convince gdb to tell us which BEESNOTE instance this belongs to, it should be an easy fix.
> operator!() is a null pointer check, so it won't be that simple.
It's a null pointer check for an attribute of m_handle, not for m_handle itself -- I've been running with this patch for a few days:
```diff
diff --git a/lib/fd.cc b/lib/fd.cc
index 2f62567afdbc..f546d9bbcdb1 100644
--- a/lib/fd.cc
+++ b/lib/fd.cc
@@ -153,7 +153,7 @@ namespace crucible {
 	bool
 	Fd::operator!() const
 	{
-		return m_handle->get_fd() < 0;
+		return !m_handle || m_handle->get_fd() < 0;
 	}
 
 	shared_ptr<IOHandle>
```
> Unfortunately gdb is giving us the very unhelpful definition of the std::function template, but we need to know the point in the code where that template is instantiated. If you can convince gdb to tell us which BEESNOTE instance this belongs to, it should be an easy fix.
I don't have time right now, I'll have a look early next week -- thanks for the pointer!
That patch probably does make it much harder to lose the race and hit the bug, but the real fix is to find where the invalid access is. "m_handle is not null" is an object invariant, so it should never be NULL. Good hunting!
Took a bit longer than planned, sorry for the delay.
So, from what I can see, there's another thread calling the destructor of the BeesNote that get_status() is iterating over when the crash happens:
```
(gdb) bt
#0  futex_wait (private=0, expected=2, futex_word=0x55820b504a20 <BeesNote::s_mutex>)
    at ../sysdeps/nptl/futex-internal.h:146
#1  __lll_lock_wait (futex=futex@entry=0x55820b504a20 <BeesNote::s_mutex>, private=0)
    at lowlevellock.c:52
#2  0x00007fb873717fe3 in __GI___pthread_mutex_lock (mutex=0x55820b504a20 <BeesNote::s_mutex>)
    at ../nptl/pthread_mutex_lock.c:80
#3  0x000055820b494bd6 in __gthread_mutex_lock (__mutex=0x55820b504a20 <BeesNote::s_mutex>)
    at /usr/include/x86_64-linux-gnu/c++/11/bits/gthr-default.h:749
#4  std::mutex::lock (this=0x55820b504a20 <BeesNote::s_mutex>)
    at /usr/include/c++/11/bits/std_mutex.h:100
#5  std::unique_lock<std::mutex>::lock (this=<synthetic pointer>)
    at /usr/include/c++/11/bits/unique_lock.h:139
#6  std::unique_lock<std::mutex>::unique_lock (__m=..., this=<synthetic pointer>)
    at /usr/include/c++/11/bits/unique_lock.h:69
#7  BeesNote::~BeesNote (this=this@entry=0x7fb8721a7350, __in_chrg=<optimized out>)
    at bees-trace.cc:92
#8  0x000055820b45aac2 in BeesContext::scan_forward (this=<optimized out>, bfr_in=...)
    at bees-context.cc:795
#9  0x000055820b47f45a in operator() (__closure=<optimized out>)
    at /usr/include/c++/11/bits/shared_ptr_base.h:1295
#10 std::__invoke_impl<void, BeesRoots::crawl_batch(std::shared_ptr<BeesCrawl>)::<lambda()>&> (
    __f=...) at /usr/include/c++/11/bits/invoke.h:61
#11 std::__invoke_r<void, BeesRoots::crawl_batch(std::shared_ptr<BeesCrawl>)::<lambda()>&> (__fn=...)
    at /usr/include/c++/11/bits/invoke.h:154
#12 std::_Function_handler<void(), BeesRoots::crawl_batch(std::shared_ptr<BeesCrawl>)::<lambda()> >::_M_invoke(const std::_Any_data &) (__functor=...) at /usr/include/c++/11/bits/std_function.h:290
#13 0x000055820b4a8a33 in std::function<void ()>::operator()() const (this=this@entry=0x7fb8721a76f0)
    at /usr/include/c++/11/bits/std_function.h:590
#14 crucible::catch_all(std::function<void ()> const&, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)> const&) (f=..., explainer=...) at error.cc:55
#15 0x000055820b4c64cf in crucible::TaskState::exec (this=0x7fb8500bbf80) at task.cc:305
#16 0x000055820b4a8a33 in std::function<void ()>::operator()() const (this=this@entry=0x7fb8721a7980)
    at /usr/include/c++/11/bits/std_function.h:590
#17 crucible::catch_all(std::function<void ()> const&, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)> const&) (f=..., explainer=...) at error.cc:55
#18 0x000055820b4c6c91 in crucible::TaskConsumer::consumer_thread (this=0x55820d052bd0) at task.cc:739
#19 0x00007fb8735c58d3 in ?? () from /lib/x86_64-linux-gnu/libstdc++.so.6
#20 0x00007fb873715d80 in start_thread (arg=0x7fb8721a9640) at pthread_create.c:481
#21 0x00007fb8732ae76f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
```
It's waiting on BeesNote::s_mutex to remove itself from BeesNote::s_status (it's the same dump as in the original post; you can see 0x7fb8721a7350 as the parameter of the std::function).
This happens at the end of BeesContext::scan_forward, so it's just the normal destructors running at the end of the function; the problem would be that it decided to free/uninit the m_handle before removing the note from the list here.
(we can also see scan_forward in the std::function scope; the function called is this, whatever that means:

```
(gdb) p _M_invoker
$5 = (std::function<void(std::basic_ostream<char, std::char_traits<char> >&)>::_Invoker_type) 0x55820b449090 <std::_Function_handler<void(std::basic_ostream<char, std::char_traits<char> >&), BeesContext::scan_forward(const BeesFileRange&)::<lambda(std::ostream&)> >::_M_invoke(const std::_Any_data &, std::basic_ostream<char, std::char_traits<char> > &)>
```
anyway, it's just belatedly processing the << statements; << does !!bfr.m_fd, which calls operator!(), and this explodes)
Unfortunately this still doesn't tell us which note it is: there are a couple of << bfr* notes in scan_forward which could match, and I couldn't print the text itself. Either way, the bfrs are not explicitly freed before the end of the function, so we're still looking at an ordering problem: for some reason the BeesFileRange's m_fd got cleaned up before the note?
Looking at the scan_forward disassembly output, I see it calls delete on what I think is bfr_in just before calling the note destructor, so that would be a nifty candidate, but I'll admit I was too lazy to follow this properly, so I might have made a mistake.
Anyway, I honestly have no idea how to enforce the ordering (I'd have assumed C++ would be smart enough to notice we use the bfr in the note and therefore deallocate in the opposite order...). If the bfr has actually been destroyed, the memory is up for grabs, and my "fix" is, as you say, just as problematic (use after free).
Not quite sure how to proceed here, any idea?
Hello,
I'm getting segfaults in crucible::Fd::operator! on current master (fbf6b395c830edd7ade1d0042058e94531769962), running with --workaround-btrfs-send as I have snapshots I back up with btrfs send. I didn't try without the workaround, and I hadn't run bees before on these machines.

Here's bt and bt full with symbols:

Here's the last few lines of the log for this precise occurrence (verbose 8); it doesn't look very useful to me:

I didn't investigate further, but I can if you'd like me to; bees seems to crash after at most a few hours, so it shouldn't be too much trouble to add extra logs or anything if you want more info before fixing (just adding a null pointer check is probably trivial enough).
Thanks!