eosnetworkfoundation / mandel

Obsolete. Use https://github.com/AntelopeIO/leap instead.

Crash on exit with "corrupted size vs. prev_size" #796

Closed heifner closed 1 year ago

heifner commented 2 years ago

Version: main (3.2.x). Also reported in 3.1.x, 2.0.x, and 2.1.x, although this generic message may point to different underlying issues over time.

info  2022-08-09T14:00:57.388 net-1     net_plugin.cpp:1016           _close               ] ["xxx:9876 - e1a715e" - 2 1.1.1.1:9876] closing
info  2022-08-09T14:00:57.389 nodeos    net_plugin.cpp:3809           plugin_shutdown      ] exit shutdown
CHAINBASE: Writing "state" database file, this could take a moment...
              1% complete...
              5% complete...
              8% complete...
              12% complete...
              15% complete...
              18% complete...
              22% complete...
              26% complete...
              29% complete...
              32% complete...
              35% complete...
              39% complete...
              42% complete...
              46% complete...
              49% complete...
              53% complete...
              56% complete...
              59% complete...
              62% complete...
              65% complete...
              69% complete...
              72% complete...
              76% complete...
              80% complete...
              85% complete...
              89% complete...
              93% complete...
              97% complete...
           Syncing buffers...
           Complete
corrupted size vs. prev_size
[1]    545250 abort (core dumped)  /usr/bin/nodeos --config-dir /etc/nodeos -d /var/lib/nodeos

Thread dump:

Reading symbols from /usr/bin/nodeos...
(No debugging symbols found in /usr/bin/nodeos)
[New LWP 545250]
[New LWP 545251]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/bin/nodeos --config-dir /etc/nodeos -d /var/lib/nodeos'.
Program terminated with signal SIGABRT, Aborted.
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
[Current thread is 1 (Thread 0x7f7fd1401840 (LWP 545250))]
(gdb) thread apply all where
Thread 2 (Thread 0x7f7fd1400700 (LWP 545251)):
#0  futex_wait_cancelable (private=0, expected=0, futex_word=0x5555b3238b28) at ../sysdeps/nptl/futex-internal.h:186
#1  __pthread_cond_wait_common (abstime=0x0, clockid=0, mutex=0x5555b3238ac8, cond=0x5555b3238b00) at pthread_cond_wait.c:508
#2  __pthread_cond_wait (cond=0x5555b3238b00, mutex=0x5555b3238ac8) at pthread_cond_wait.c:638
#3  0x00005555af6441d3 in boost::asio::detail::scheduler::do_run_one(boost::asio::detail::conditionally_enabled_mutex::scoped_lock&, boost::asio::detail::scheduler_thread_info&, boost::system::error_code const&) ()
#4  0x00005555af643e11 in boost::asio::detail::scheduler::run(boost::system::error_code&) ()
#5  0x00005555af643bde in void* std::__1::__thread_proxy<std::__1::tuple<std::__1::unique_ptr<std::__1::__thread_struct, std::__1::default_delete<std::__1::__thread_struct> >, appbase::application_impl::application_impl()::{lambda()#1}> >(void*) ()
#6  0x00007f7fd173eea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
#7  0x00007f7fd1503def in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Thread 1 (Thread 0x7f7fd1401840 (LWP 545250)):
#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f7fd142b537 in __GI_abort () at abort.c:79
#2  0x00007f7fd1484768 in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f7fd1592e2d "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007f7fd148ba5a in malloc_printerr (str=str@entry=0x7f7fd1591020 "corrupted size vs. prev_size") at malloc.c:5347
#4  0x00007f7fd148c7a6 in unlink_chunk (p=p@entry=0x7f7d44001170, av=0x7f7d44000020) at malloc.c:1454
#5  0x00007f7fd148c8f7 in malloc_consolidate (av=av@entry=0x7f7d44000020) at malloc.c:4502
#6  0x00007f7fd148d0c0 in _int_free (av=0x7f7d44000020, p=0x7f7d4400b5f0, have_lock=<optimized out>) at malloc.c:4400
#7  0x00005555afcb159c in eosio::http_plugin_impl::make_app_thread_url_handler(int, std::__1::function<void (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::function<void (int, std::__1::optional<fc::variant>)>)>, std::__1::shared_ptr<eosio::http_plugin_impl>)::{lambda(std::__1::shared_ptr<eosio::detail::abstract_conn>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::function<void (int, std::__1::optional<fc::variant>)>)#1}::operator()(std::__1::shared_ptr<eosio::detail::abstract_conn>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::function<void (int, std::__1::optional<fc::variant>)>) const::{lambda()#1}::~shared_ptr() ()
#8  0x00005555afcb1d25 in boost::asio::detail::executor_op<boost::asio::detail::work_dispatcher<boost::asio::executor_binder<eosio::http_plugin_impl::make_app_thread_url_handler(int, std::__1::function<void (std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::function<void (int, std::__1::optional<fc::variant>)>)>, std::__1::shared_ptr<eosio::http_plugin_impl>)::{lambda(std::__1::shared_ptr<eosio::detail::abstract_conn>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::function<void (int, std::__1::optional<fc::variant>)>)#1}::operator()(std::__1::shared_ptr<eosio::detail::abstract_conn>, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >, std::__1::function<void (int, std::__1::optional<fc::variant>)>) const::{lambda()#1}, appbase::execution_priority_queue::executor> >, std::__1::allocator<void>, boost::asio::detail::scheduler_operation>::do_complete(void*, boost::asio::detail::scheduler_operation*, boost::system::error_code const&, unsigned long) ()
#9  0x00005555af646975 in boost::asio::detail::scheduler::shutdown() ()
#10 0x00005555af6492b9 in std::__1::__shared_ptr_emplace<boost::asio::io_context, std::__1::allocator<boost::asio::io_context> >::__on_zero_shared() ()
#11 0x00005555af63f8a4 in appbase::application::exec() ()
#12 0x00005555af632f40 in main ()
(gdb) quit

Maybe related: https://github.com/EOSIO/eos/issues/8450

At a quick glance, it looks like an iterator into the http plugin's url_handlers is still in use after url_handlers.clear() is called in http_plugin::plugin_shutdown. See how handle_http_request uses an iterator into url_handlers.
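To illustrate the suspected pattern (not a confirmed root cause), here is a minimal sketch of a handler map whose queued work outlives a clear() at shutdown. Everything in it is a hypothetical stand-in: toy_http_server, post_request, shutdown, and drain are invented for this example and are not the actual http_plugin code. It shows one possible mitigation, copying the handler into the queued task instead of holding an iterator into the map.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-ins for illustration only; not the real http_plugin types.
using url_handler   = std::function<std::string(const std::string& body)>;
using deferred_task = std::function<std::string()>;

struct toy_http_server {
    std::map<std::string, url_handler> url_handlers;
    std::vector<deferred_task> queue;  // stands in for work posted to the app thread

    // Unsafe variant (not implemented here): capturing the iterator `it` in the
    // queued task and calling it->second(body) later is undefined behavior once
    // url_handlers.clear() has run, because clear() invalidates all iterators.
    //
    // Safer variant: copy the handler so the queued task owns its own callable.
    bool post_request(const std::string& url, std::string body) {
        auto it = url_handlers.find(url);
        if (it == url_handlers.end()) return false;
        url_handler handler = it->second;  // copy the handler, do not keep `it`
        queue.push_back([handler = std::move(handler), body = std::move(body)] {
            return handler(body);
        });
        return true;
    }

    // Mirrors the clear() on shutdown described above.
    void shutdown() { url_handlers.clear(); }

    // Runs whatever was still queued, as a scheduler might during teardown.
    std::vector<std::string> drain() {
        std::vector<std::string> results;
        for (auto& task : queue) results.push_back(task());
        queue.clear();
        return results;
    }
};
```

With this shape, a request queued before shutdown() can still run afterwards without touching the cleared map. Whether copying handlers, rejecting new requests during shutdown, or draining the queue first is the right fix for nodeos is left open here.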

Note that an http rewrite is currently in progress: https://github.com/eosnetworkfoundation/mandel/pull/675. Any fix should be checked against that rewrite and applied there as well if appropriate. I think it is also worth fixing in 3.1, which will not have #675.

matthewdarwin commented 2 years ago

We regularly run "get info" HTTP requests against nodeos every few seconds for monitoring. Those requests keep running even while nodeos is shutting down.

spoonincode commented 1 year ago

Will track this in the linked leap issue now.