eclipse-zenoh / zenoh-cpp

C++ API for zenoh
Other
51 stars 31 forks source link

Drop in either `atexit` or `std::signal` causes core dumped. #198

Open YuanYuYuan opened 2 months ago

YuanYuYuan commented 2 months ago

Description

This issue occurs when a user tries to drop a zenoh entity in atexit or std::signal. This is also the case in rmw_zenoh_cpp. To resolve this issue, we might need to add a compilation flag to turn off the z_drop in the destructor.

To reproduce

  1. Apply the following patch
diff --git a/examples/universal/z_sub.cxx b/examples/universal/z_sub.cxx
index a53ae23..bf7a4e8 100644
--- a/examples/universal/z_sub.cxx
+++ b/examples/universal/z_sub.cxx
@@ -14,15 +14,27 @@
 #include <stdio.h>

 #include <chrono>
+#include <csignal>
+#include <cstdlib>
 #include <iostream>
+#include <optional>
 #include <thread>

 #include "../getargs.h"
 #include "zenoh.hxx"
+#include "zenoh/api/session.hxx"
+#include "zenoh/api/subscriber.hxx"

 using namespace zenoh;
 using namespace std::chrono_literals;

+typedef struct Context {
+    std::optional<Session> sess;
+    std::optional<Subscriber<void>> sub;
+} Context;
+
+static Context context;
+
 const char *kind_to_str(SampleKind kind) {
     switch (kind) {
         case SampleKind::Z_SAMPLE_KIND_PUT:
@@ -40,6 +52,14 @@ void data_handler(const Sample &sample) {
               << "')\n";
 }

+void handle_sigint(int signal) {
+    if (signal == SIGINT) {
+        std::cout << "Received SIGINT signal. Performing cleanup tasks..." << std::endl;
+        context.sub.reset();
+        exit(0);
+    }
+}
+
 int _main(int argc, char **argv) {
     const char *expr = "demo/example/**";
     const char *locator = nullptr;
@@ -79,14 +99,17 @@ int _main(int argc, char **argv) {
     KeyExpr keyexpr(expr);

     std::cout << "Opening session..." << std::endl;
-    auto session = Session::open(std::move(config));
+    context.sess = Session::open(std::move(config));

     std::cout << "Declaring Subscriber on '" << keyexpr.as_string_view() << "'..." << std::endl;
-    auto subscriber = session.declare_subscriber(keyexpr, &data_handler, closures::none);
+    context.sub = context.sess.value().declare_subscriber(keyexpr, &data_handler, closures::none);
 #ifdef ZENOHCXX_ZENOHC
-    std::cout << "Subscriber on '" << subscriber.get_keyexpr().as_string_view() << "' declared" << std::endl;
+    std::cout << "Subscriber on '" << context.sub.value().get_keyexpr().as_string_view() << "' declared" << std::endl;
 #endif

+    // Register the signal handler for SIGTERM
+    std::signal(SIGINT, handle_sigint);
+
     std::cout << "Press CTRL-C to quit...\n";
     while (true) {
         std::this_thread::sleep_for(1s);
  1. Build and run _zsub. And then press Ctrl-C to stop the program.

Error log

➜ ./build/examples/zenohc/z_sub
Opening session...
2024-09-03T06:58:51.977472Z  INFO ThreadId(02) zenoh::net::runtime: Using ZID: 41ae6c14329a3f858ac7e4d7843933d3
2024-09-03T06:58:51.978257Z  INFO ThreadId(02) zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/[fe80::694a:ae45:bf2b:33a0]:33921
2024-09-03T06:58:51.978277Z  INFO ThreadId(02) zenoh::net::runtime::orchestrator: Zenoh can be reached at: tcp/192.168.1.140:33921
2024-09-03T06:58:51.978458Z  INFO ThreadId(02) zenoh::net::runtime::orchestrator: zenohd listening scout messages on 224.0.0.224:7446
Declaring Subscriber on 'demo/example/**'...
Subscriber on 'demo/example/**' declared
Press CTRL-C to quit...
^CReceived SIGINT signal. Performing cleanup tasks...
thread '<unnamed>' panicked at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/thread/local.rs:246:26:
cannot access a Thread Local Storage value during or after destruction: AccessError
stack backtrace:
   0:     0x7983211079fc - std::backtrace_rs::backtrace::libunwind::trace::ha637c64ce894333a
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/../../backtrace/src/backtrace/libunwind.rs:104:5
   1:     0x7983211079fc - std::backtrace_rs::backtrace::trace_unsynchronized::h47f62dea28e0c88d
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x7983211079fc - std::sys_common::backtrace::_print_fmt::h9eef0abe20ede486
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:67:5
   3:     0x7983211079fc - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hed7f999df88cc644
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x798320e95710 - core::fmt::rt::Argument::fmt::h1539a9308b8d058d
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/fmt/rt.rs:142:9
   5:     0x798320e95710 - core::fmt::write::h3a39390d8560d9c9
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/fmt/mod.rs:1120:17
   6:     0x7983210d4ce2 - std::io::Write::write_fmt::h5fc9997dfe05f882
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/io/mod.rs:1762:15
   7:     0x7983211095ae - std::sys_common::backtrace::_print::h894006fb5c6f3d45
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:47:5
   8:     0x7983211095ae - std::sys_common::backtrace::print::h23a2d212c6fff936
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:34:9
   9:     0x798321108d20 - std::panicking::default_hook::{{closure}}::h8a1d2ee00185001a
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:272:22
  10:     0x798321109f08 - std::panicking::default_hook::h6038f2eba384e475
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:292:9
  11:     0x798321109f08 - std::panicking::rust_panic_with_hook::h2b5517d590cab22e
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:779:13
  12:     0x79832110990c - std::panicking::begin_panic_handler::{{closure}}::h233112c06e0ef43e
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:657:13
  13:     0x798321109866 - std::sys_common::backtrace::__rust_end_short_backtrace::h6e893f24d7ebbff8
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/sys_common/backtrace.rs:170:18
  14:     0x798321109851 - rust_begin_unwind
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/std/src/panicking.rs:645:5
  15:     0x798320cd9044 - core::panicking::panic_fmt::hbf0e066aabfa482c
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/panicking.rs:72:14
  16:     0x798320cd95b2 - core::result::unwrap_failed::hddb4fea594200c52
                               at /rustc/82e1608dfa6e0b5569232559e3d385fea5a93112/library/core/src/result.rs:1653:5
  17:     0x798321184a67 - zenoh_runtime::ZRuntime::block_in_place::hf9959efffc8868ff
  18:     0x7983211c4763 - <zenoh::api::session::Session as core::ops::drop::Drop>::drop::h0e1d6bad44ae52c6
  19:     0x798320d4cbfc - core::ptr::drop_in_place<zenoh::api::session::Session>::h3d19805dd420053d
  20:     0x798320d55a4d - alloc::sync::Arc<T,A>::drop_slow::h32dc1c11ecc8f032
  21:     0x5f7945fcc4e2 - _Z6z_dropP17z_moved_session_t
                               at /home/user/workspace/src/zenoh-cpp/local/include/zenoh_macros.h:584:62
  22:     0x5f7945fcedd8 - _ZN5zenoh5OwnedI17z_owned_session_tED2Ev
                               at /home/user/workspace/src/zenoh-cpp/include/zenoh/api/base.hxx:81:24
  23:     0x5f7945fcd8e0 - _ZN5zenoh7SessionD2Ev
                               at /home/user/workspace/src/zenoh-cpp/include/zenoh/api/session.hxx:43:7
  24:     0x5f7945fd16d4 - _ZNSt22_Optional_payload_baseIN5zenoh7SessionEE10_M_destroyEv
                               at /usr/include/c++/14.2.1/optional:283:35
  25:     0x5f7945fd0944 - _ZNSt22_Optional_payload_baseIN5zenoh7SessionEE8_M_resetEv
                               at /usr/include/c++/14.2.1/optional:314:14
  26:     0x5f7945fcee64 - _ZNSt17_Optional_payloadIN5zenoh7SessionELb0ELb0ELb0EED2Ev
                               at /usr/include/c++/14.2.1/optional:437:65
  27:     0x5f7945fcd9bc - _ZNSt14_Optional_baseIN5zenoh7SessionELb0ELb0EED2Ev
                               at /usr/include/c++/14.2.1/optional:508:12
  28:     0x5f7945fcd9d8 - _ZNSt8optionalIN5zenoh7SessionEED2Ev
                               at /usr/include/c++/14.2.1/optional:703:11
  29:     0x5f7945fd342c - _ZN7ContextD2Ev
                               at /home/user/workspace/src/zenoh-cpp/examples/universal/z_sub.cxx:31:16
  30:     0x79832064e891 - <unknown>
  31:     0x79832064e95e - exit
  32:     0x5f7945fca947 - _Z13handle_siginti
                               at /home/user/workspace/src/zenoh-cpp/examples/universal/z_sub.cxx:64:13
  33:     0x79832064c1d0 - <unknown>
  34:     0x7983206f0733 - clock_nanosleep
  35:     0x7983206fc827 - nanosleep
  36:     0x5f7945fcfaa3 - _ZNSt11this_thread9sleep_forIlSt5ratioILl1ELl1EEEEvRKNSt6chrono8durationIT_T0_EE
                               at /usr/include/c++/14.2.1/bits/this_thread_sleep.h:80:20
  37:     0x5f7945fcb12f - _Z5_mainiPPc
                               at /home/user/workspace/src/zenoh-cpp/examples/universal/z_sub.cxx:120:36
  38:     0x5f7945fcb3ac - main
                               at /home/user/workspace/src/zenoh-cpp/examples/universal/z_sub.cxx:128:14
  39:     0x798320634e08 - <unknown>
  40:     0x798320634ecc - __libc_start_main
  41:     0x5f7945fca645 - _start
  42:                0x0 - <unknown>
Error: nu::shell::coredump_error

System Info

DenisBiryukov91 commented 2 months ago

Maybe instead of atexit, we could use a static variable that will call all drop logic in its destructor, at least this guaranteed by C++ standard to work correctly. No apparently it is the same as calling function registered with atexit.

yellowhatter commented 1 month ago

There are some (a lot of) system APIs that should not be called in signal handlers, so it is really a bad idea to drop big objects in signal handlers. Moreover, signal handler can be executed concurrently, so thread safety also comes into place. We need to handle signals in a safe way, I can take this task

YuanYuYuan commented 1 month ago

There are some (a lot of) system APIs that should not be called in signal handlers, so it is really a bad idea to drop big objects in signal handlers. Moreover, signal handler can be executed concurrently, so thread safety also comes into place. We need to handle signals in a safe way, I can take this task

Yes. I agree with your point. Dropping objects within signal handlers isn't a good pattern. Rust has put much effort into addressing this safely within the normal scope.