OFS / opae-sdk

Open Programmable Acceleration Engine
https://ofs.github.io
BSD 3-Clause "New" or "Revised" License
251 stars 84 forks source link

spdlog: do not drop logger in destructor #3116

Closed pcolberg closed 4 months ago

pcolberg commented 5 months ago

For static objects, the order of destruction is in reverse of the order of construction, but the latter is hard to control except among static objects defined within the same translation unit. This means that spdlog may be destructed before spdlog::drop() is invoked in the destructor, which leads to a segmentation fault due to memory use-after-free.

Move call of spdlog::drop() from destructor to main() function, to ensure the logger may be re-registered with the same name in tests. Use Scope Guard pattern to implement RAII for logger registration.

Link: https://github.com/ros2/rclcpp/issues/933 Link: https://github.com/gabime/spdlog/issues/1738 Link: https://github.com/gabime/spdlog/issues/2113 Link: https://en.wikibooks.org/wiki/More_C++_Idioms/Scope_Guard Closes: https://hsdes.intel.com/appstore/article/#/22019839238

Cc: @badanur

pcolberg commented 5 months ago

Stack trace of hssi with opae-sdk 2.12.0-5 on RHEL 8.6:

(gdb) r --pci-address 0000:b1:00.6 hssi_100g --num-packets 100 --port 0
Starting program: /usr/bin/hssi --pci-address 0000:b1:00.6 hssi_100g --num-packets 100 --port 0
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
[New Thread 0x7ffff33ff700 (LWP 804376)]
100G loopback test
  port: 0
  eth_loopback: on
  num_packets: 100
  gap: none
  src_address: 11:22:33:44:55:66
    (bits): 0x665544332211
  dest_address: 77:88:99:aa:bb:cc
    (bits): 0xccbbaa998877
  pattern: random
  start size: 40
  end size: 2580
  end select: pkt_num
  eth: 
  continuous mode: off
  monitor duration: 0 sec

No eth interface, so not honoring --eth-loopback.
0x40000           ETH_AFU_DFH: 0x1000010000001000
0x40008          ETH_AFU_ID_L: 0xb03abd8d4a533812
0x40010          ETH_AFU_ID_H: 0x43425ee692b24742
0x40030      TRAFFIC_CTRL_CMD: 0x0000000000000000
0x40038     TRAFFIC_CTRL_DATA: 0x0000004000000064
0x40040 TRAFFIC_CTRL_PORT_SEL: 0x0000000000000000
0x40048        AFU_SCRATCHPAD: 0x0000000045324511

0x1000                scratch: 0x00000000
0x1001               block_ID: 0x434c4e54
0x1008               pkt_size: 0x25800040
0x1009                  ctrl0: 0x80000064
0x1010                  ctrl1: 0x00000040
0x1011            dst_addr_lo: 0xaa998877
0x1012            dst_addr_hi: 0x0000ccbb
0x1013            src_addr_lo: 0x44332211
0x1014            src_addr_hi: 0x00006655
0x1015               rx_count: 0x00000000
0x1016                mlb_rst: 0x00000000
0x1017               tx_count: 0x00000064

No eth interface, so not showing stats.
[Thread 0x7ffff33ff700 (LWP 804376) exited]

Thread 1 "hssi" received signal SIGSEGV, Segmentation fault.
std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<spdlog::logger> >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<spdlog::logger> > >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_find_before_node (
    this=this@entry=0x7ffff5e7f118 <spdlog::details::registry::instance()::s_instance+120>, __n=__n@entry=0, 
    __k="hssi_100g", __code=1051062443682879639) at /usr/include/c++/8/bits/hashtable.h:1565
warning: Source file is more recent than executable.
1565         if (!__p->_M_nxt || _M_bucket_index(__p->_M_next()) != __n)
Missing separate debuginfos, use: yum debuginfo-install libgcc-8.5.0-20.el8.x86_64 libstdc++-8.5.0-20.el8.x86_64 libuuid-2.32.1-43.el8.x86_64
(gdb) bt
#0  std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<spdlog::logger> >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<spdlog::logger> > >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_find_before_node (
    this=this@entry=0x7ffff5e7f118 <spdlog::details::registry::instance()::s_instance+120>, __n=__n@entry=0, 
    __k="hssi_100g", __code=1051062443682879639) at /usr/include/c++/8/bits/hashtable.h:1565
#1  0x00007ffff5c383af in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<spdlog::logger> >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<spdlog::logger> > >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::_M_erase (
    this=this@entry=0x7ffff5e7f118 <spdlog::details::registry::instance()::s_instance+120>, __k="hssi_100g")
    at /usr/include/c++/8/bits/hashtable_policy.h:1398
#2  0x00007ffff5c384a3 in std::_Hashtable<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<spdlog::logger> >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<spdlog::logger> > >, std::__detail::_Select1st, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<true, false, true> >::erase (__k="hssi_100g", 
    this=0x7ffff5e7f118 <spdlog::details::registry::instance()::s_instance+120>)
    at /usr/include/c++/8/bits/hashtable.h:765
#3  std::unordered_map<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::shared_ptr<spdlog::logger>, std::hash<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::equal_to<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > >, std::allocator<std::pair<std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const, std::shared_ptr<spdlog::logger> > > >::erase (__x="hssi_100g", this=0x7ffff5e7f118 <spdlog::details::registry::instance()::s_instance+120>)
    at /usr/include/c++/8/bits/unordered_map.h:815
#4  spdlog::details::registry::drop (this=0x7ffff5e7f0a0 <spdlog::details::registry::instance()::s_instance>, 
    logger_name="hssi_100g") at ../include/spdlog/details/registry-inl.h:222
#5  0x0000555555427163 in opae::afu_test::afu::~afu (this=0x5555556820a0 <app>, __in_chrg=<optimized out>)
    at /usr/include/c++/8/bits/shared_ptr_base.h:1010
#6  0x00007ffff4c511ec in __run_exit_handlers (status=0, listp=0x7ffff4fbf758 <__exit_funcs>, 
    run_list_atexit=run_list_atexit@entry=true, run_dtors=run_dtors@entry=true) at exit.c:108
#7  0x00007ffff4c51320 in __GI_exit (status=<optimized out>) at exit.c:139
#8  0x00007ffff4c3acaa in __libc_start_main (main=0x555555411d00 <main(int, char**)>, argc=8, argv=0x7fffffffe478, 
    init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe468)
    at ../csu/libc-start.c:348
#9  0x0000555555411f8e in _start () at /usr/src/debug/opae-2.12.0-5.el8.x86_64/samples/hssi/hssi_afu.h:56
pcolberg commented 5 months ago

test_dummy_afu fails due to duplicate logger:

[ RUN      ] dummy_afu/dummy_afu_p.main_invalid_guid/0
unknown file: Failure
C++ exception with description "logger with name 'mmio' already exists" thrown in the test body.
[  FAILED  ] dummy_afu/dummy_afu_p.main_invalid_guid/0, where GetParam() = "dfl-d5005" (132 ms)
coveralls commented 5 months ago

Pull Request Test Coverage Report for Build 8606807048

Details


Changes Missing Coverage Covered Lines Changed/Added Lines %
samples/cxl_hello_fpga/he_cache_test.h 0 6 0.0%
samples/cxl_host_exerciser/he_cache_test.h 0 6 0.0%
<!-- Total: 6 18 33.33% -->
Totals Coverage Status
Change from base Build 8147633845: -0.02%
Covered Lines: 15823
Relevant Lines: 24466

💛 - Coveralls
pcolberg commented 5 months ago

Thankfully the compiler stopped me from resorting to a non-RAII solution using goto:

[2/20] Building CXX object samples/cxl_host_exerciser/CMakeFiles/cxl_host_exerciser.dir/cxl_host_exerciser.cpp.o
FAILED: samples/cxl_host_exerciser/CMakeFiles/cxl_host_exerciser.dir/cxl_host_exerciser.cpp.o 
/usr/bin/c++ -DFMT_SHARED -DHAVE_CONFIG_H=1 -DLIBOPAE_DEBUG=1 -DSPDLOG_SHARED_LIB -DSPDLOG_COMPILED_LIB -DSPDLOG_FMT_EXTERNAL -I/home/pcolberg/git/github.com/OFS/opae-sdk/libraries -I/home/pcolberg/git/github.com/OFS/opae-sdk/samples/base/sw -I/home/pcolberg/git/github.com/OFS/opae-sdk/include -I/home/pcolberg/git/github.com/OFS/opae-sdk/build/include -I/home/pcolberg/git/github.com/OFS/opae-sdk/samples/cxl_host_exerciser -I/home/pcolberg/git/github.com/OFS/opae-sdk/tests/framework -I/home/pcolberg/git/github.com/OFS/opae-sdk/libraries/plugins/xfpga -std=c++14 -g -O0 -Wall -Wextra -Werror -pthread -std=gnu++14 -Wno-unused-result -MD -MT samples/cxl_host_exerciser/CMakeFiles/cxl_host_exerciser.dir/cxl_host_exerciser.cpp.o -MF samples/cxl_host_exerciser/CMakeFiles/cxl_host_exerciser.dir/cxl_host_exerciser.cpp.o.d -o samples/cxl_host_exerciser/CMakeFiles/cxl_host_exerciser.dir/cxl_host_exerciser.cpp.o -c /home/pcolberg/git/github.com/OFS/opae-sdk/samples/cxl_host_exerciser/cxl_host_exerciser.cpp
In file included from /home/pcolberg/git/github.com/OFS/opae-sdk/samples/cxl_host_exerciser/cxl_host_exerciser.h:28,
                 from /home/pcolberg/git/github.com/OFS/opae-sdk/samples/cxl_host_exerciser/cxl_he_cmd.h:32,
                 from /home/pcolberg/git/github.com/OFS/opae-sdk/samples/cxl_host_exerciser/cxl_he_cache_cmd.h:29,
                 from /home/pcolberg/git/github.com/OFS/opae-sdk/samples/cxl_host_exerciser/cxl_host_exerciser.cpp:30:
/home/pcolberg/git/github.com/OFS/opae-sdk/samples/cxl_host_exerciser/he_cache_test.h: In member function ‘int opae::afu_test::afu::main(int, char**)’:
/home/pcolberg/git/github.com/OFS/opae-sdk/samples/cxl_host_exerciser/he_cache_test.h:486:1: error: jump to label ‘drop_logger’
  486 | drop_logger:
      | ^~~~~~~~~~~
/home/pcolberg/git/github.com/OFS/opae-sdk/samples/cxl_host_exerciser/he_cache_test.h:469:12: note:   from here
  469 |       goto drop_logger;
      |            ^~~~~~~~~~~
/home/pcolberg/git/github.com/OFS/opae-sdk/samples/cxl_host_exerciser/he_cache_test.h:473:18: note:   crosses initialization of ‘CLI::Option* opt’
  473 |     CLI::Option* opt = app->get_option_no_throw("--device");
      |                  ^~~
/home/pcolberg/git/github.com/OFS/opae-sdk/samples/cxl_host_exerciser/he_cache_test.h:472:9: note:   crosses initialization of ‘int dev_index’
  472 |     int dev_index = 0;
      |         ^~~~~~~~~
pcolberg commented 5 months ago

This duplicates the code for class scoped_register_logger in the same manner as class afu is duplicated among the three samples. If that is an issue, I feel the latter duplication should be adressed first; rather than promoting class scoped_register_logger to a common, reusable header.