eBay / NuRaft

C++ implementation of Raft core logic as a replication library
Apache License 2.0
1.01k stars 240 forks

compiling nuraft for the raspberry pi #243

Closed faithware closed 2 years ago

faithware commented 3 years ago

I am trying to compile NuRaft for the Raspberry Pi.

root ~/NuRaft/build >g++ -v
Using built-in specs.
COLLECT_GCC=g++
COLLECT_LTO_WRAPPER=/usr/lib/gcc/arm-linux-gnueabihf/8/lto-wrapper
Target: arm-linux-gnueabihf
Configured with: ../src/configure -v --with-pkgversion='Raspbian 8.3.0-6+rpi1' --with-bugurl=file:///usr/share/doc/gcc-8/README.Bugs --enable-languages=c,ada,c++,go,d,fortran,objc,obj-c++ --prefix=/usr --with-gcc-major-version-only --program-suffix=-8 --program-prefix=arm-linux-gnueabihf- --enable-shared --enable-linker-build-id --libexecdir=/usr/lib --without-included-gettext --enable-threads=posix --libdir=/usr/lib --enable-nls --enable-bootstrap --enable-clocale=gnu --enable-libstdcxx-debug --enable-libstdcxx-time=yes --with-default-libstdcxx-abi=new --enable-gnu-unique-object --disable-libitm --disable-libquadmath --disable-libquadmath-support --enable-plugin --with-system-zlib --with-target-system-zlib --enable-objc-gc=auto --enable-multiarch --disable-sjlj-exceptions --with-arch=armv6 --with-fpu=vfp --with-float=hard --disable-werror --enable-checking=release --build=arm-linux-gnueabihf --host=arm-linux-gnueabihf --target=arm-linux-gnueabihf
Thread model: posix
gcc version 8.3.0 (Raspbian 8.3.0-6+rpi1)

Please find the build log attached. The build fails both when I compile on the Raspberry Pi and when I cross-compile.

build.log

greensky00 commented 3 years ago

Hi @faithware

Seems to me the error is caused by the duplicate definition of ulong:

error: reference to ‘ulong’ is ambiguous

Does this not happen while building the library itself?

$ make static_lib

If so, we can remedy it by replacing ulong with nuraft::ulong in the example code.
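A minimal sketch of why qualifying the name helps. It assumes the clash is between a typedef from the C library and one from the library namespace, both pulled into the same scope. The namespaces `platform` and `mylib` and the function `qualified_ulong_size` are hypothetical stand-ins for illustration, not NuRaft or libc code.

```cpp
#include <cstddef>
#include <cstdint>

// Stand-in for a typedef the C library may provide (assumption for illustration).
namespace platform { typedef unsigned long ulong; }

// Stand-in for a library typedef that is meant to be 64 bits wide everywhere.
namespace mylib { typedef std::uint64_t ulong; }

// With both namespaces opened by using-directives, an unqualified `ulong`
// is ambiguous whenever the two typedefs name different types, which is the
// case on targets where `unsigned long` is 32 bits wide:
//
//   using namespace platform;
//   using namespace mylib;
//   ulong term = 0;   // error there: reference to 'ulong' is ambiguous
//
// A qualified name never depends on what else is visible in the scope.
inline std::size_t qualified_ulong_size() {
    mylib::ulong term = 0;
    return sizeof(term);
}

static_assert(sizeof(mylib::ulong) == 8, "the qualified typedef is always 64-bit");
```

Where both typedefs resolve to the same underlying type, the unqualified name compiles without complaint, which would explain why such a clash can go unnoticed on 64-bit Linux builds.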

faithware commented 3 years ago

Hi @greensky00, thanks for the reply. I did that, and when I run the tests I get:

root ~/NuRaft/build >./runtests.sh
[ PASS ] buffer basic test (1024) (179 us)
[ PASS ] buffer basic test (32768) (105 us)
[ PASS ] buffer basic test (65536) (98 us)
[ PASS ] buffer serializer test (1) (233 us)
[ PASS ] buffer serializer test (0) (69 us)
5 tests passed out of 5 (2.1 ms)
[ PASS ] srv_config test (237 us)
[ PASS ] cluster_config test (53 us)
[ PASS ] snapshot test (46 us)
[ PASS ] snapshot_sync_req test (1) (3.2 ms)
[ PASS ] snapshot_sync_req test (0) (2.3 ms)
[ PASS ] snapshot_sync_req zero buffer test (1) (45 us)
[ PASS ] snapshot_sync_req zero buffer test (0) (44 us)
[ PASS ] log_entry test (88 us)
[ PASS ] custom_notification_msg test (1) (8 us)
[ PASS ] custom_notification_msg test (0) (10 us)
[ PASS ] out_of_log_msg test (5 us)
11 tests passed out of 11 (8.6 ms)
[ PASS ] timer basic test (318.3 ms)
[ PASS ] timer cancel test (902.4 ms)
2 tests passed out of 2 (1.2 s)
[ PASS ] strfmt basic test (41 us)
1 tests passed out of 1 (737 us)
0 tests passed out of 0 (5 us)
[ .... ] make group test
=== TEST MESSAGE (BEGIN) ===
[06:29:39.869 522] [tid 0cee] [FATL] [logger.cc:634, flushAllLoggers()] Segmentation fault
[06:29:39.870 115] [tid 0cee] [ERRO] [logger.cc:634, flushAllLoggers()] === Critical info (given by user): 0 bytes ===
[06:29:39.870 492] [tid 0cee] [ERRO] [logger.cc:634, flushAllLoggers()] will not explore other threads (disabled by user)
[06:29:39.091 093] [tid 0cee] [ERRO] [logger.cc:634, flushAllLoggers()]

Thread 0cee (23702) (crashed here)

0 0x000000000007fce4 in SimpleLoggerMgr::logStackBacktrace(unsigned int) at /root/NuRaft/examples/logger.cc:390

1 0x000000000008015c in SimpleLoggerMgr::handleSegFault(int) at /root/NuRaft/examples/logger.cc:428

2 0x00000000b6bf2120 in __default_sa_restorer() at ??:0

[SEG FAULT] Flushed all logs safely.
./runtests.sh: line 9: 23702 Segmentation fault ./tests/raft_server_test --abort-on-failure

greensky00 commented 3 years ago

I have no clue from what you shared. We haven't tested it in an ARM environment. If possible, can you share the stack trace and the cause of the crash using GDB? Thanks.

faithware commented 3 years ago

OK, nothing shows up with gdb, but running the echo_server example shows the following:

2021-08-04T07:27:37.498_181+00:00 [e451] [====] Start logger: ./srv1.log (3199187316 MB per file, up to 32 files) [logger.cc:963, start()]
2021-08-04T07:27:38.963_792+00:00 [e451] [INFO] Raft ASIO listener initiated, UNSECURED [asio_service.cxx:705, asio_rpc_listener()]
2021-08-04T07:27:38.019_764+00:00 [e451] [INFO] parameters: timeout 200 - 400, heartbeat 100, leadership expiry 2000, max batch 100, backoff 50, snapshot distance 5, log sync stop gap 99999, reserved logs 5, client timeout 3000, auto for>
2021-08-04T07:27:38.026_176+00:00 [e451] [INFO] new timeout range: 200 -- 400 [raft_server.cxx:320, update_rand_timeout()]
2021-08-04T07:27:38.042_893+00:00 [e451] [INFO] === INIT RAFT SERVER === commit index 0 term 0 election timer allowed log store start 1, end 0 config log idx 0, prev log idx 0 [raft_server.cxx:144, raft_server()]
2021-08-04T07:27:38.046_733+00:00 [e451] [INFO] peer 1: DC ID 0, localhost:10000, voting member, 1 my id: 1, voting_member num peers: 0 [raft_server.cxx:237, raft_server()]
2021-08-04T07:27:38.050_199+00:00 [e451] [INFO] global manager does not exist. will use local thread for commit and append [raft_server.cxx:255, start_server()]
2021-08-04T07:27:38.059_858+00:00 [e451] [INFO] wait for HB, for 50 + [200, 400] ms [raft_server.cxx:280, start_server()]
2021-08-04T07:27:38.072_506+00:00 [17fc] [INFO] bg append_entries thread initiated [handle_append_entries.cxx:47, append_entries_in_bg()]
2021-08-04T07:27:38.616_798+00:00 [cfd1] [WARN] Election timeout, initiate leader election [handle_timeout.cxx:286, handle_election_timeout()]
2021-08-04T07:27:38.631_049+00:00 [cfd1] [INFO] [PRIORITY] decay, target 1 -> 1, mine 1 [handle_priority.cxx:211, decay_target_priority()]
2021-08-04T07:27:38.718_844+00:00 [cfd1] [FATL] Segmentation fault [logger.cc:634, flushAllLoggers()]
2021-08-04T07:27:38.800_300+00:00 [cfd1] [ERRO] === Critical info (given by user): 0 bytes === [logger.cc:634, flushAllLoggers()]
2021-08-04T07:27:39.878_920+00:00 [cfd1] [ERRO] will not explore other threads (disabled by user) [logger.cc:634, flushAllLoggers()]
2021-08-04T07:27:39.813_967+00:00 [cfd1] [ERRO] Thread cfd1 (250) (crashed here)

0 0x0000000000021b6c in () at �^qQ

1 0x0000000000022048 in () at �^qQ

2 0x0000000000022418 in () at �^qQ

3 0x00000000b69ca510 in __default_sa_restorer() at �^qQ

    [logger.cc:634, flushAllLoggers()]

greensky00 commented 3 years ago

Well, I can't help you with such a lack of info about the crash.

From the log, seems to me the crash happened somewhere among these lines https://github.com/eBay/NuRaft/blob/7aa8eb100f3996b1d14f569c23b9298ee107944c/src/handle_timeout.cxx#L292-L297 but those lines are very simple logic with std::map, not likely to cause a crash.

Are you sure your cross-compile is correct?

greensky00 commented 3 years ago

Also, this line does not look normal. Why is the log file size that big?

2021-08-04T07:27:37.498_181+00:00 [e451] [====] Start logger: ./srv1.log (3199187316 MB per file, up to 32 files) [logger.cc:963, start()]

faithware commented 3 years ago

> Also, this line seems not normal, why the log file size is that big?
>
> 2021-08-04T07:27:37.498_181+00:00 [e451] [====] Start logger: ./srv1.log (3199187316 MB per file, up to 32 files) [logger.cc:963, start()]

$ du -hs srv1.log
19.0K srv1.log

The file is not that big. I successfully launched remote debugging for the echo server and set a breakpoint at logger.cc, line 633. Here is the output:


(gdb) c
Continuing.
[Switching to Thread 1881.1895]

Thread 3 "echo_server" hit Breakpoint 3, SimpleLoggerMgr::flushAllLoggers (this=0x652820, level=0, msg=...) at /home/iotblocks/Documents/LADE/buildroot/output/build/libnuraft-v1.3.0/examples/logger.cc:633
633 if (!msg.empty()) {
(gdb) bt
#0  SimpleLoggerMgr::flushAllLoggers (this=0x652820, level=0, msg=...) at /home/iotblocks/Documents/LADE/buildroot/output/build/libnuraft-v1.3.0/examples/logger.cc:633
#1  0x004a7dd4 in SimpleLoggerMgr::flushAllLoggers (this=0x652820) at /home/iotblocks/Documents/LADE/buildroot/output/build/libnuraft-v1.3.0/examples/logger.h:369
#2  0x004a27b8 in SimpleLoggerMgr::flushWorker () at /home/iotblocks/Documents/LADE/buildroot/output/build/libnuraft-v1.3.0/examples/logger.cc:495
#3  0x004b2d8c in std::__invoke_impl<void, void (*)()> (__f=@0x652acc: 0x4a2760 <SimpleLoggerMgr::flushWorker()>) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/bits/invoke.h:60
#4  0x004b2ce4 in std::invoke<void (*)()> (fn=@0x652acc: 0x4a2760 <SimpleLoggerMgr::flushWorker()>) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/bits/invoke.h:95
#5  0x004b2c58 in std::thread::_Invoker<std::tuple<void (*)()> >::_M_invoke<0u> (this=0x652acc) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/thread:264
#6  0x004b2c14 in std::thread::_Invoker<std::tuple<void (*)()> >::operator() (this=0x652acc) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/thread:271
#7  0x004b2bec in std::thread::_State_impl<std::thread::_Invoker<std::tuple<void (*)()> > >::_M_run (this=0x652ac8) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/thread:215
#8  0xb6c6fc54 in ?? () from target:/usr/lib/libstdc++.so.6
#9  0xb6b0adf8 in start_thread () from target:/lib/libpthread.so.0
#10 0xb6a933a8 in ?? () from target:/lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) n

Thread 2 "nuraft_w_0" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1881.1897]
0xb6a3766e in strlen () from target:/lib/libc.so.6
(gdb) bt
#0  0xb6a3766e in strlen () from target:/lib/libc.so.6
#1  0xb6a14a88 in ?? () from target:/lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

greensky00 commented 3 years ago

What I meant was this part: 3199187316 MB per file. It should be 32 MB by default, and the example code does not change this value. More investigation is needed into why this variable held a garbage value.

Also, logger.cc:633 is just a seg-fault handler (which will be visited AFTER crash), your breakpoint needs to be on handle_timeout.cxx:292.

faithware commented 3 years ago

I replaced all ulong variables in the source code with nuraft::ulong. Could this cause the issue?

greensky00 commented 3 years ago

Not sure, but I don't believe so. If the compiled binary is correct, I guess the problem is related to big/little-endian or 32-bit/64-bit differences, as we've tested and deployed it on 64-bit little-endian machines only. Your Raspberry Pi is little-endian and 32-bit, right?
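One way to answer the endianness and word-size question on the target itself is a small probe like the sketch below. The helper names `is_little_endian` and `describe_target` are made up for illustration and are not part of NuRaft.

```cpp
#include <cstdint>
#include <cstring>
#include <string>

// Returns true when the least significant byte of an integer is stored first.
inline bool is_little_endian() {
    const std::uint32_t probe = 1;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);  // read the lowest-addressed byte
    return first_byte == 1;
}

// Summarises the properties that differ between 32-bit and 64-bit targets.
inline std::string describe_target() {
    return std::string(is_little_endian() ? "little" : "big") + "-endian, " +
           std::to_string(sizeof(void*) * 8) + "-bit pointers, " +
           std::to_string(sizeof(long) * 8) + "-bit long";
}
```

Printing `describe_target()` from a test program built with the same toolchain shows whether `long` is narrower than the 64-bit integers the library uses internally.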

faithware commented 3 years ago

Thanks @greensky00 for your reply. It is a Cortex-A72, which supports both big- and little-endian. https://developer.arm.com/documentation/100095/0002/programmers-model/memory-model

faithware commented 3 years ago

Hi again, I successfully started a well configured debug session.


Thread 2 "nuraft_w_3" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 482.488]
strlen () at ../sysdeps/arm/armv6t2/strlen.S:126
126     ldrd    data1a, data1b, [src]
(gdb) bt
#0  strlen () at ../sysdeps/arm/armv6t2/strlen.S:126
#1  0xb6bf0a88 in __vfprintf_internal (s=0xb3f92a20, s@entry=0xb3f92a18, format=format@entry=0x34b9cc "[ELECTION TIMEOUT] current role: %s, log last term %lu, state term %lu, target p %d, my p %d, %s, %s", ap=..., ap@entry=..., 
    mode_flags=mode_flags@entry=3019456432) at vfprintf-internal.c:1647
#2  0xb6c012e8 in __vsnprintf_internal (string=0xb3f92b4c "[ELECTION TIMEOUT] current role: follower, log last term 3455436, state term 0, target p 0, my p 0, (null), ", maxlen=<optimized out>, 
    format=0x34b9cc "[ELECTION TIMEOUT] current role: %s, log last term %lu, state term %lu, target p %d, my p %d, %s, %s", args=..., mode_flags=mode_flags@entry=0) at vsnprintf.c:114
#3  0xb6c01318 in ___vsnprintf (string=<optimized out>, maxlen=<optimized out>, format=<optimized out>, args=...) at vsnprintf.c:124
#4  0x00189d88 in msg_if_given (format=0x34b9cc "[ELECTION TIMEOUT] current role: %s, log last term %lu, state term %lu, target p %d, my p %d, %s, %s") at /home/iotblocks/Documents/LADE/NuRaft/src/tracer.hxx:36
#5  0x0018cb10 in nuraft::raft_server::handle_election_timeout (this=0x419848) at /home/iotblocks/Documents/LADE/NuRaft/src/handle_timeout.cxx:299
#6  0x00091b3c in std::__invoke_impl<void, void (nuraft::raft_server::*&)(), nuraft::raft_server*&> (__f=@0x419cf0: (void (nuraft::raft_server::*)(nuraft::raft_server * const)) 0x18bc9c <nuraft::raft_server::handle_election_timeout()>, 
    __t=@0x419cf8: 0x419848) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/bits/invoke.h:73
#7  0x0008f750 in std::__invoke<void (nuraft::raft_server::*&)(), nuraft::raft_server*&> (__fn=@0x419cf0: (void (nuraft::raft_server::*)(nuraft::raft_server * const)) 0x18bc9c <nuraft::raft_server::handle_election_timeout()>)
    at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/bits/invoke.h:95
#8  0x0008d120 in std::_Bind<void (nuraft::raft_server::*(nuraft::raft_server*))()>::__call<void, , 0u>(std::tuple<>&&, std::_Index_tuple<0u>) (this=0x419cf0, __args=...)
    at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/functional:416
#9  0x0008b6cc in std::_Bind<void (nuraft::raft_server::*(nuraft::raft_server*))()>::operator()<, void>() (this=0x419cf0)
    at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/functional:499
#10 0x00089930 in std::__invoke_impl<void, std::_Bind<void (nuraft::raft_server::*(nuraft::raft_server*))()>&>(std::__invoke_other, std::_Bind<void (nuraft::raft_server::*(nuraft::raft_server*))()>&) (__f=...)
    at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/bits/invoke.h:60
#11 0x00087430 in std::__invoke_r<void, std::_Bind<void (nuraft::raft_server::*(nuraft::raft_server*))()>&>(std::_Bind<void (nuraft::raft_server::*(nuraft::raft_server*))()>&) (__fn=...)
    at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/bits/invoke.h:153
#12 0x00083ba4 in std::_Function_handler<void (), std::_Bind<void (nuraft::raft_server::*(nuraft::raft_server*))()> >::_M_invoke(std::_Any_data const&) (__functor=...)
    at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/bits/std_function.h:291
#13 0x000804e8 in std::function<void ()>::operator()() const (this=0x415a44) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/bits/std_function.h:622
#14 0x0007da24 in nuraft::timer_task<void>::exec (this=0x415a24) at /home/iotblocks/Documents/LADE/NuRaft/include/libnuraft/timer_task.hxx:62
#15 0x000a9e58 in nuraft::delayed_task::execute (this=0x415a24) at /home/iotblocks/Documents/LADE/NuRaft/include/libnuraft/delayed_task.hxx:54
#16 0x000a74b0 in _timer_handler_ (task=..., err=...) at /home/iotblocks/Documents/LADE/NuRaft/src/asio_service.cxx:1504
#17 0x00115df0 in std::__invoke_impl<void, void (*&)(std::shared_ptr<nuraft::delayed_task>&, std::error_code), std::shared_ptr<nuraft::delayed_task>&, std::error_code const&> (
    __f=@0xb3f93620: 0xa7464 <_timer_handler_(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/bits/invoke.h:60
#18 0x00111fe4 in std::__invoke<void (*&)(std::shared_ptr<nuraft::delayed_task>&, std::error_code), std::shared_ptr<nuraft::delayed_task>&, std::error_code const&> (
    __fn=@0xb3f93620: 0xa7464 <_timer_handler_(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/bits/invoke.h:95
#19 0x0010e9e0 in std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>::__call<void, std::error_code const&, 0u, 1u>(std::tuple<std::error_code const&>&&, std::_Index_tuple<0u, 1u>) (this=0xb3f93620, __args=...) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/functional:416
#20 0x0010c238 in std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>::operator()<std::error_code const&, void>(std::error_code const&) (
    this=0xb3f93620) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/functional:499
--Type <RET> for more, q to quit, c to continue without paging--c
#21 0x00109458 in asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>::operator()() (this=0xb3f93620) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/sysroot/usr/include/asio/detail/bind_handler.hpp:64
#22 0x0010436c in asio::asio_handler_invoke<asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code> >(asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>&, ...) (function=...) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/sysroot/usr/include/asio/handler_invoke_hook.hpp:68
#23 0x000fdd00 in asio_handler_invoke_helpers::invoke<asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>, std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)> >(asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>&, std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>&) (function=..., context=...) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/sysroot/usr/include/asio/detail/handler_invoke_helpers.hpp:37
#24 0x000fa0e8 in asio::detail::asio_handler_invoke<asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>, std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>(asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>&, asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>*) (function=..., this_handler=0xb3f93620) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/sysroot/usr/include/asio/detail/bind_handler.hpp:105
#25 0x000f6534 in asio_handler_invoke_helpers::invoke<asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>, asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code> >(asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>&, asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>&) (function=..., context=...) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/sysroot/usr/include/asio/detail/handler_invoke_helpers.hpp:37
#26 0x000f2ee8 in asio::detail::io_object_executor<asio::executor>::dispatch<asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>, std::allocator<void> >(asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>&&, std::allocator<void> const&) const (this=0xb3f93698, f=..., a=...) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/sysroot/usr/include/asio/detail/io_object_executor.hpp:116
#27 0x000ef1a4 in asio::detail::handler_work<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, asio::detail::io_object_executor<asio::executor>, asio::detail::io_object_executor<asio::executor> >::complete<asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code> >(asio::detail::binder1<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, std::error_code>&, std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>&) (this=0xb3f93690, function=..., handler=...) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/sysroot/usr/include/asio/detail/handler_work.hpp:71
#28 0x000ea10c in asio::detail::wait_handler<std::_Bind<void (*(std::shared_ptr<nuraft::delayed_task>, std::_Placeholder<1>))(std::shared_ptr<nuraft::delayed_task>&, std::error_code)>, asio::detail::io_object_executor<asio::executor> >::do_complete(void*, asio::detail::scheduler_operation*, std::error_code const&, unsigned int) (owner=0x3f52a8, base=0x4163c8) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/sysroot/usr/include/asio/detail/wait_handler.hpp:72
#29 0x000abf78 in asio::detail::scheduler_operation::complete (this=0x4163c8, owner=0x3f52a8, ec=..., bytes_transferred=0) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/sysroot/usr/include/asio/detail/scheduler_operation.hpp:39
#30 0x000b1724 in asio::detail::scheduler::do_run_one (this=0x3f52a8, lock=..., this_thread=..., ec=...) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/sysroot/usr/include/asio/detail/impl/scheduler.ipp:446
#31 0x000b0e08 in asio::detail::scheduler::run (this=0x3f52a8, ec=...) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/sysroot/usr/include/asio/detail/impl/scheduler.ipp:199
#32 0x000b1d9c in asio::io_context::run (this=0x3f50f0) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/sysroot/usr/include/asio/impl/io_context.ipp:62
#33 0x000a7c98 in nuraft::asio_service_impl::worker_entry (this=0x3f50f0) at /home/iotblocks/Documents/LADE/NuRaft/src/asio_service.cxx:1607
#34 0x0013bc0c in std::__invoke_impl<void, void (nuraft::asio_service_impl::*&)(), nuraft::asio_service_impl*&> (__f=@0x419584: (void (nuraft::asio_service_impl::*)(nuraft::asio_service_impl * const)) 0xa7af0 <nuraft::asio_service_impl::worker_entry()>, __t=@0x41958c: 0x3f50f0) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/bits/invoke.h:73
#35 0x0013bb30 in std::__invoke<void (nuraft::asio_service_impl::*&)(), nuraft::asio_service_impl*&> (__fn=@0x419584: (void (nuraft::asio_service_impl::*)(nuraft::asio_service_impl * const)) 0xa7af0 <nuraft::asio_service_impl::worker_entry()>) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/bits/invoke.h:95
#36 0x0013b944 in std::_Bind<void (nuraft::asio_service_impl::*(nuraft::asio_service_impl*))()>::__call<void, , 0u>(std::tuple<>&&, std::_Index_tuple<0u>) (this=0x419584, __args=...) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/functional:416
#37 0x0013b644 in std::_Bind<void (nuraft::asio_service_impl::*(nuraft::asio_service_impl*))()>::operator()<, void>() (this=0x419584) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/functional:499
#38 0x0013b058 in std::__invoke_impl<void, std::_Bind<void (nuraft::asio_service_impl::*(nuraft::asio_service_impl*))()>>(std::__invoke_other, std::_Bind<void (nuraft::asio_service_impl::*(nuraft::asio_service_impl*))()>&&) (__f=...) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/bits/invoke.h:60
#39 0x0013a708 in std::__invoke<std::_Bind<void (nuraft::asio_service_impl::*(nuraft::asio_service_impl*))()>>(std::_Bind<void (nuraft::asio_service_impl::*(nuraft::asio_service_impl*))()>&&) (__fn=...) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/bits/invoke.h:95
#40 0x00139e5c in std::thread::_Invoker<std::tuple<std::_Bind<void (nuraft::asio_service_impl::*(nuraft::asio_service_impl*))()> > >::_M_invoke<0u>(std::_Index_tuple<0u>) (this=0x419584) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/thread:264
#41 0x00139084 in std::thread::_Invoker<std::tuple<std::_Bind<void (nuraft::asio_service_impl::*(nuraft::asio_service_impl*))()> > >::operator()() (this=0x419584) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/thread:271
#42 0x00137634 in std::thread::_State_impl<std::thread::_Invoker<std::tuple<std::_Bind<void (nuraft::asio_service_impl::*(nuraft::asio_service_impl*))()> > > >::_M_run() (this=0x419580) at /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/include/c++/10.3.0/thread:215
#43 0xb6e5a098 in std::execute_native_thread_routine (__p=<optimized out>) at ../../../../../libstdc++-v3/src/c++11/thread.cc:80
#44 0xb6ce7e1c in start_thread (arg=0xb3f93fb0) at pthread_create.c:463
#45 0xb6c6f3a8 in ?? () at ../sysdeps/unix/sysv/linux/arm/clone.S:73 from /home/iotblocks/Documents/LADE/buildroot/output/host/arm-buildroot-linux-gnueabihf/sysroot/lib/libc.so.6
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

greensky00 commented 3 years ago

Thanks @faithware, that was helpful.

Apparently, this log looks odd:

[ELECTION TIMEOUT] current role: follower, log last term 3455436, state term 0, target p 0, my p 0, (null), 
        p_in( "[ELECTION TIMEOUT] current role: %s, log last term %lu, "
              "state term %lu, target p %d, my p %d, %s, %s",
              srv_role_to_string(role_).c_str(), last_log_term, state_term,
              target_priority_, my_priority_,
              (hb_alive_) ? "hb alive" : "hb dead",
              (pre_vote_.done_) ? "pre-vote done" : "pre-vote NOT done");

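If the above things are all correct (last_log_term is supposed to be 1 at the beginning, and sizeof(last_log_term) is 8), then %lu in the log statement is probably shifting the argument offsets and causing this problem: on a 32-bit target %lu reads a 4-byte value, while each of these arguments occupies 8 bytes, so the following %s conversions pick up the wrong data. Can you replace %lu in the above log with %llu as follows and check whether the code prints the correct log?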

        p_in( "[ELECTION TIMEOUT] current role: %s, log last term %llu, "
              "state term %llu, target p %d, my p %d, %s, %s",
              srv_role_to_string(role_).c_str(), last_log_term, state_term,
              target_priority_, my_priority_,
              (hb_alive_) ? "hb alive" : "hb dead",
              (pre_vote_.done_) ? "pre-vote done" : "pre-vote NOT done");

If this is the case, the entire code needs to be re-visited to support 32-bit machines.
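As a sketch of a width-independent alternative, the standard `PRIu64` macro from `<cinttypes>` expands to the right conversion for `std::uint64_t` on both 32-bit and 64-bit targets. The helper `format_terms` below is hypothetical and only mirrors the two term fields of the log line; it is not NuRaft code.

```cpp
#include <cassert>
#include <cinttypes>
#include <cstdint>
#include <cstdio>
#include <string>

// Formats two 64-bit values without assuming the width of `unsigned long`.
inline std::string format_terms(std::uint64_t last_log_term,
                                std::uint64_t state_term) {
    char buf[96];
    // PRIu64 becomes "lu" or "llu" (or similar) depending on the platform,
    // so the conversion always matches the size of the argument passed.
    const int n = std::snprintf(buf, sizeof(buf),
                                "log last term %" PRIu64 ", state term %" PRIu64,
                                last_log_term, state_term);
    // The longest possible output is well under the buffer size.
    assert(n >= 0 && static_cast<std::size_t>(n) < sizeof(buf));
    return std::string(buf);
}
```

Casting each argument to `unsigned long long` and keeping `%llu` is an equivalent option when the format string should stay free of macros.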

faithware commented 3 years ago
Thread 2 "nuraft_w_0" hit Breakpoint 1, nuraft::raft_server::handle_election_timeout (this=0x419848) at /home/iotblocks/Documents/LADE/NuRaft/src/handle_timeout.cxx:307
307         if (pre_vote_.term_ != state_term) {
(gdb) p last_log_term
$7 = 0

You are right, changing to %llu fixed the issue. And yes the size is 8. So I need to correct the formatting around all the logs. Cheers!
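For finding the remaining mismatches, one option is to let the compiler check them. GCC and Clang can validate printf-style arguments for user-defined functions that carry the `format` attribute, and then report size mismatches through `-Wformat` (part of `-Wall`). The sketch below uses a hypothetical `log_format` function and `PRINTF_LIKE` macro; whether and how NuRaft's own logging entry points can be annotated this way is an assumption.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>

// Expands to the GCC/Clang format attribute, and to nothing elsewhere.
#if defined(__GNUC__) || defined(__clang__)
#define PRINTF_LIKE(fmt_idx, first_arg) \
    __attribute__((format(printf, fmt_idx, first_arg)))
#else
#define PRINTF_LIKE(fmt_idx, first_arg)
#endif

// Hypothetical logging helper: argument 1 is the format string and the
// variadic arguments start at position 2.
std::string log_format(const char* fmt, ...) PRINTF_LIKE(1, 2);

std::string log_format(const char* fmt, ...) {
    char buf[256];
    va_list ap;
    va_start(ap, fmt);
    std::vsnprintf(buf, sizeof(buf), fmt, ap);  // truncates if too long
    va_end(ap);
    return std::string(buf);
}
```

With the attribute in place, a call that passes a 64-bit value for `%lu` should produce a `-Wformat` warning when compiled for a 32-bit target, so the mismatches surface at build time instead of as a crash.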