Icinga / icinga2

The core of our monitoring platform with a powerful configuration language and REST API.
https://icinga.com/docs/icinga2/latest
GNU General Public License v2.0
2k stars 574 forks source link

Icinga2 api deadlock #9916

Open ymartin-ovh opened 10 months ago

ymartin-ovh commented 10 months ago

Describe the bug

We experience since 2.14 with icingadb (we don't have this with icinga-ido-pgsql), icinga2 api deadlock. The issue is visible only on Icinga master zone on node 1 (which handle all config changes). Second master is not impacted by this.

To Reproduce

We don't know how to reproduce this but I managed to take a backtrace of all threads. backtrace-dbg.log

Expected behavior

We expect a working Icinga2 api.

Your Environment

Include as many relevant details about the environment you experienced the problem in

ymartin-ovh commented 10 months ago

I think that icinga::StatusHandler::HandleRequest is related to calls we have on status API (icinga2 exporter and monitoring). I don't understand why threads are stuck on lock whereas callers give up after 5s (our configuration).

Regards

ymartin-ovh commented 10 months ago

Spotted deadlock between icingadb operations and add/remove comment via API:

icingadb thread =>

#0  __lll_lock_wait (futex=futex@entry=0x7f6c122c9c10, private=0) at lowlevellock.c:52
#1  0x00007f6ec3b938d1 in __GI___pthread_mutex_lock (mutex=0x7f6c122c9c10) at ../nptl/pthread_mutex_lock.c:115
#2  0x00005629bfeb22fa in __gthread_mutex_lock (__mutex=0x7f6c122c9c10) at /usr/include/x86_64-linux-gnu/c++/10/bits/gthr-default.h:749
#3  __gthread_recursive_mutex_lock (__mutex=0x7f6c122c9c10) at /usr/include/x86_64-linux-gnu/c++/10/bits/gthr-default.h:811
#4  std::recursive_mutex::lock (this=0x7f6c122c9c10) at /usr/include/c++/10/mutex:106
#5  icinga::ObjectLock::Lock (this=0x7f6c1aabdf00) at ./lib/base/objectlock.cpp:32
#6  0x00005629c0230a21 in icinga::Service::GetSeverity (this=this@entry=0x7f6c122c9c00) at ./lib/icinga/service.cpp:116
#7  0x00005629c0520600 in icinga::IcingaDB::SerializeState (this=<optimized out>, checkable=...) at ./lib/icingadb/icingadb-objects.cpp:2601
#8  0x00005629c0523605 in icinga::IcingaDB::UpdateState (this=0x7f6e92096000, checkable=..., mode=icinga::IcingaDB::Full) at ./lib/icingadb/icingadb-objects.cpp:1139
#9  0x00005629c0527721 in icinga::IcingaDB::SendAddedComment (this=0x7f6e92096000, comment=...) at ./lib/icingadb/icingadb-objects.cpp:2065
#10 0x00005629c0527880 in icinga::IcingaDB::CommentAddedHandler (comment=...) at ./lib/icingadb/icingadb-objects.cpp:2813
#11 0x00005629c02ffa39 in boost::function1<void, boost::intrusive_ptr<icinga::Comment> const&>::operator() (a0=..., this=<optimized out>) at /usr/include/boost/function/function_template.hpp:763
#12 boost::signals2::detail::call_with_tuple_args<boost::signals2::detail::void_type>::m_invoke<boost::function<void (boost::intrusive_ptr<icinga::Comment> const&)>, 0u, boost::intrusive_ptr<icinga::Comment> co>
    at /usr/include/boost/signals2/detail/variadic_slot_invoker.hpp:105
#13 boost::signals2::detail::call_with_tuple_args<boost::signals2::detail::void_type>::operator()<boost::function<void (boost::intrusive_ptr<icinga::Comment> const&)>, boost::intrusive_ptr<icinga::Comment> cons>
    func=..., this=<optimized out>) at /usr/include/boost/signals2/detail/variadic_slot_invoker.hpp:90
#14 boost::signals2::detail::variadic_slot_invoker<boost::signals2::detail::void_type, boost::intrusive_ptr<icinga::Comment> const&>::operator()<boost::shared_ptr<boost::signals2::detail::connection_body<std::p>
    at /usr/include/boost/signals2/detail/variadic_slot_invoker.hpp:133
#15 boost::signals2::detail::slot_call_iterator_t<boost::signals2::detail::variadic_slot_invoker<boost::signals2::detail::void_type, boost::intrusive_ptr<icinga::Comment> const&>, std::_List_iterator<boost::sha>
    at /usr/include/boost/signals2/detail/slot_call_iterator.hpp:110
#16 boost::iterators::iterator_core_access::dereference<boost::signals2::detail::slot_call_iterator_t<boost::signals2::detail::variadic_slot_invoker<boost::signals2::detail::void_type, boost::intrusive_ptr<icin>
    at /usr/include/boost/iterator/iterator_facade.hpp:550
#17 boost::iterators::detail::iterator_facade_base<boost::signals2::detail::slot_call_iterator_t<boost::signals2::detail::variadic_slot_invoker<boost::signals2::detail::void_type, boost::intrusive_ptr<icinga::C>
    at /usr/include/boost/iterator/iterator_facade.hpp:656
#18 boost::signals2::optional_last_value<void>::operator()<boost::signals2::detail::slot_call_iterator_t<boost::signals2::detail::variadic_slot_invoker<boost::signals2::detail::void_type, boost::intrusive_ptr<i>
ive_ptr<icinga::Comment> const&), boost::function<void (boost::intrusive_ptr<icinga::Comment> const&)> >, boost::signals2::mutex> >) const (last=..., this=<optimized out>, last=..., first=...) at /usr/include/b>
#19 boost::signals2::detail::combiner_invoker<void>::operator()<boost::signals2::optional_last_value<void>, boost::signals2::detail::slot_call_iterator_t<boost::signals2::detail::variadic_slot_invoker<boost::si>
#20 boost::signals2::detail::signal_impl<void (boost::intrusive_ptr<icinga::Comment> const&), boost::signals2::optional_last_value<void>, int, std::less<int>, boost::function<void (boost::intrusive_ptr<icinga::>
#21 0x00005629c0237d05 in boost::signals2::signal<void (boost::intrusive_ptr<icinga::Comment> const&), boost::signals2::optional_last_value<void>, int, std::less<int>, boost::function<void (boost::intrusive_ptr>
#22 icinga::Comment::Start (this=0x7f6c11baf600, runtimeCreated=<optimized out>) at ./lib/icinga/comment.cpp:101
#23 0x00005629bfebfc4c in icinga::ConfigObject::Activate (this=this@entry=0x7f6c11baf600, runtimeCreated=runtimeCreated@entry=true, cookie=...) at ./lib/base/configobject.cpp:368
#24 0x00005629bffe3d79 in icinga::ConfigItem::ActivateItems (newItems=std::vector of length 1, capacity 1 = {...}, runtimeCreated=runtimeCreated@entry=true, mainConfigActivation=mainConfigActivation@entry=false>
#25 0x00005629c0027516 in icinga::ConfigObjectUtility::CreateObject (type=..., fullName=..., config=..., errors=..., diagnosticInformation=..., cookie=...) at ./lib/remote/configobjectutility.cpp:248
#26 0x00005629c01c1713 in icinga::Comment::AddComment (checkable=..., entryType=<optimized out>, author=..., text=..., persistent=<optimized out>, expireTime=6.9219723785798194e-310, sticky=false, id=..., origi>
#27 0x00005629c01c216d in icinga::ApiActions::AddComment (object=..., params=...) at ./lib/icinga/apiactions.cpp:321
#28 0x00005629c027f90a in std::__invoke_impl<boost::intrusive_ptr<icinga::Dictionary>, boost::intrusive_ptr<icinga::Dictionary> (*&)(boost::intrusive_ptr<icinga::ConfigObject> const&, boost::intrusive_ptr<icing>
#29 std::__invoke_r<icinga::Value, boost::intrusive_ptr<icinga::Dictionary> (*&)(boost::intrusive_ptr<icinga::ConfigObject> const&, boost::intrusive_ptr<icinga::Dictionary> const&), boost::intrusive_ptr<icinga:>
#30 std::_Function_handler<icinga::Value (boost::intrusive_ptr<icinga::ConfigObject> const&, boost::intrusive_ptr<icinga::Dictionary> const&), boost::intrusive_ptr<icinga::Dictionary> (*)(boost::intrusive_ptr<i>
#31 0x00005629c004590e in std::function<icinga::Value (boost::intrusive_ptr<icinga::ConfigObject> const&, boost::intrusive_ptr<icinga::Dictionary> const&)>::operator()(boost::intrusive_ptr<icinga::ConfigObject>>
#32 icinga::ApiAction::Invoke (params=..., target=..., this=<optimized out>) at ./lib/remote/apiaction.cpp:14
#33 icinga::ActionsHandler::HandleRequest (this=<optimized out>, stream=..., user=..., request=..., url=..., response=..., params=..., yc=..., server=...) at ./lib/remote/actionshandler.cpp:93
#34 0x00005629c003ba04 in icinga::HttpHandler::ProcessRequest (stream=..., user=..., request=..., response=..., yc=..., server=...) at ./lib/remote/httphandler.cpp:111
#35 0x00005629c0071781 in ProcessRequest (yc=..., hasStartedStreaming=@0x7f6ce83447b1: false, server=..., response=..., authenticatedUser=..., request=..., stream=...) at ./lib/remote/httpserverconnection.cpp:4>
#36 icinga::HttpServerConnection::ProcessMessages (this=0x7f6ce8344720, yc=...) at ./lib/remote/httpserverconnection.cpp:571
#37 0x00005629c0072b20 in operator() (__closure=0x7f6c1aac1680, __closure=0x7f6c1aac1680, yc=...) at ./lib/remote/httpserverconnection.cpp:66
#38 operator() (this=0x7f6c1aac1680, yc=...) at ./lib/base/io-engine.hpp:106
#39 0x00005629c0072c6f in boost::asio::detail::coro_entry_point<boost::asio::executor_binder<void (*)(), boost::asio::io_context::strand>, icinga::IoEngine::SpawnCoroutine<boost::asio::io_context::strand, icing>
#40 boost::coroutines::detail::push_coroutine_object<boost::coroutines::pull_coroutine<void>, void, boost::asio::detail::coro_entry_point<boost::asio::executor_binder<void (*)(), boost::asio::io_context::strand>
#41 boost::coroutines::detail::trampoline_push_void<boost::coroutines::detail::push_coroutine_object<boost::coroutines::pull_coroutine<void>, void, boost::asio::detail::coro_entry_point<boost::asio::executor_bi>
#42 0x00007f6ec368619f in make_fcontext () from /lib/x86_64-linux-gnu/libboost_context.so.1.74.0
#43 0x00005629c093ca88 in vtable for boost::coroutines::detail::push_coroutine_object<boost::coroutines::pull_coroutine<void>, void, boost::asio::detail::coro_entry_point<boost::asio::executor_binder<void (*)()>
#44 0x0000000000000026 in ?? ()
#45 0x0000000000000000 in ?? ()

=> (gdb) print(this->GetName()) (on frame 6)

$20 = {static NPos = 18446744073709551615, m_Data = "cmsorderprivateusdev-002b-gra-vps.static.ovh.net!check_webovhstatic_website_dir"}

addcomment thread =>

#0  __lll_lock_wait (
    futex=futex@entry=0x5629c09746a0 <icinga::ConfigItem::ActivateItems(std::vector<boost::intrusive_ptr<icinga::ConfigItem>, std::allocator<boost::intrusive_ptr<icinga::ConfigItem> > > const&, bool, bool, bool, icinga::Value const&)::mtx>, private=0) at lowlevellock.c:52
#1  0x00007f6ec3b93843 in __GI___pthread_mutex_lock (
    mutex=0x5629c09746a0 <icinga::ConfigItem::ActivateItems(std::vector<boost::intrusive_ptr<icinga::ConfigItem>, std::allocator<boost::intrusive_ptr<icinga::ConfigItem> > > const&, bool, bool, bool, icinga::Value const&)::mtx>) at ../nptl/pthread_mutex_lock.c:80
#2  0x00005629bffe383d in __gthread_mutex_lock (
    __mutex=0x5629c09746a0 <icinga::ConfigItem::ActivateItems(std::vector<boost::intrusive_ptr<icinga::ConfigItem>, std::allocator<boost::intrusive_ptr<icinga::ConfigItem> > > const&, bool, bool, bool, icinga::Value const&)::mtx>) at /usr/include/x86_64-linux-gnu/c++/10/bits/gthr-default.h:749
#3  std::mutex::lock (
    this=0x5629c09746a0 <icinga::ConfigItem::ActivateItems(std::vector<boost::intrusive_ptr<icinga::ConfigItem>, std::allocator<boost::intrusive_ptr<icinga::ConfigItem> > > const&, bool, bool, bool, icinga::Value const&)::mtx>) at /usr/include/c++/10/bits/std_mutex.h:100
#4  std::unique_lock<std::mutex>::lock (this=0x7f6c36c6b820) at /usr/include/c++/10/bits/unique_lock.h:138
#5  std::unique_lock<std::mutex>::unique_lock (__m=..., this=0x7f6c36c6b820) at /usr/include/c++/10/bits/unique_lock.h:68
#6  icinga::ConfigItem::ActivateItems (newItems=std::vector of length 1, capacity 1 = {...}, runtimeCreated=runtimeCreated@entry=true, mainConfigActivation=mainConfigActivation@entry=false, 
    withModAttrs=withModAttrs@entry=false, cookie=...) at ./lib/config/configitem.cpp:683
#7  0x00005629c0027516 in icinga::ConfigObjectUtility::CreateObject (type=..., fullName=..., config=..., errors=..., diagnosticInformation=..., cookie=...) at ./lib/remote/configobjectutility.cpp:248
#8  0x00005629c01c1713 in icinga::Comment::AddComment (checkable=..., entryType=<optimized out>, author=..., text=..., persistent=<optimized out>, expireTime=6.9219956746066321e-310, sticky=true, id=..., 
    origin=...) at ./lib/icinga/comment.cpp:170
#9  0x00005629c022c0a3 in icinga::ApiActions::AcknowledgeProblem (object=..., params=...) at ./lib/icinga/apiactions.cpp:265
#10 0x00005629c027f90a in std::__invoke_impl<boost::intrusive_ptr<icinga::Dictionary>, boost::intrusive_ptr<icinga::Dictionary> (*&)(boost::intrusive_ptr<icinga::ConfigObject> const&, boost::intrusive_ptr<icinga::Dictionary> const&), boost::intrusive_ptr<icinga::ConfigObject> const&, boost::intrusive_ptr<icinga::Dictionary> const&> (__f=<optimized out>) at /usr/include/c++/10/bits/invoke.h:59
#11 std::__invoke_r<icinga::Value, boost::intrusive_ptr<icinga::Dictionary> (*&)(boost::intrusive_ptr<icinga::ConfigObject> const&, boost::intrusive_ptr<icinga::Dictionary> const&), boost::intrusive_ptr<icinga::ConfigObject> const&, boost::intrusive_ptr<icinga::Dictionary> const&> (__fn=<optimized out>, __fn=<optimized out>) at /usr/include/c++/10/bits/invoke.h:113
#12 std::_Function_handler<icinga::Value (boost::intrusive_ptr<icinga::ConfigObject> const&, boost::intrusive_ptr<icinga::Dictionary> const&), boost::intrusive_ptr<icinga::Dictionary> (*)(boost::intrusive_ptr<icinga::ConfigObject> const&, boost::intrusive_ptr<icinga::Dictionary> const&)>::_M_invoke(std::_Any_data const&, boost::intrusive_ptr<icinga::ConfigObject> const&, boost::intrusive_ptr<icinga::Dictionary> const&) (__functor=..., __args#0=..., __args#1=...) at /usr/include/c++/10/bits/std_function.h:292
#13 0x00005629c004590e in std::function<icinga::Value (boost::intrusive_ptr<icinga::ConfigObject> const&, boost::intrusive_ptr<icinga::Dictionary> const&)>::operator()(boost::intrusive_ptr<icinga::ConfigObject> const&, boost::intrusive_ptr<icinga::Dictionary> const&) const (__args#1=..., __args#0=..., this=<optimized out>) at /usr/include/c++/10/bits/std_function.h:622
#14 icinga::ApiAction::Invoke (params=..., target=..., this=<optimized out>) at ./lib/remote/apiaction.cpp:14
#15 icinga::ActionsHandler::HandleRequest (this=<optimized out>, stream=..., user=..., request=..., url=..., response=..., params=..., yc=..., server=...) at ./lib/remote/actionshandler.cpp:93
--Type <RET> for more, q to quit, c to continue without paging--
#16 0x00005629c003ba04 in icinga::HttpHandler::ProcessRequest (stream=..., user=..., request=..., response=..., yc=..., server=...) at ./lib/remote/httphandler.cpp:111
#17 0x00005629c0071781 in ProcessRequest (yc=..., hasStartedStreaming=@0x7f6ce3efdd31: false, server=..., response=..., authenticatedUser=..., request=..., stream=...)
    at ./lib/remote/httpserverconnection.cpp:456
#18 icinga::HttpServerConnection::ProcessMessages (this=0x7f6ce3efdca0, yc=...) at ./lib/remote/httpserverconnection.cpp:571
#19 0x00005629c0072b20 in operator() (__closure=0x7f6c36c6dd00, __closure=0x7f6c36c6dd00, yc=...) at ./lib/remote/httpserverconnection.cpp:66
#20 operator() (this=0x7f6c36c6dd00, yc=...) at ./lib/base/io-engine.hpp:106
#21 0x00005629c0072c6f in boost::asio::detail::coro_entry_point<boost::asio::executor_binder<void (*)(), boost::asio::io_context::strand>, icinga::IoEngine::SpawnCoroutine<boost::asio::io_context::strand, icinga::HttpServerConnection::Start()::<lambda(boost::asio::yield_context)> >::<lambda(boost::asio::yield_context)> >::operator() (ca=..., this=<optimized out>) at /usr/include/boost/asio/impl/spawn.hpp:355
#22 boost::coroutines::detail::push_coroutine_object<boost::coroutines::pull_coroutine<void>, void, boost::asio::detail::coro_entry_point<boost::asio::executor_binder<void (*)(), boost::asio::io_context::strand>, icinga::IoEngine::SpawnCoroutine<boost::asio::io_context::strand, icinga::HttpServerConnection::Start()::<lambda(boost::asio::yield_context)> >::<lambda(boost::asio::yield_context)> >&, boost::coroutines::basic_standard_stack_allocator<boost::coroutines::stack_traits> >::run (this=0x7f6c36c6e060) at /usr/include/boost/coroutine/detail/push_coroutine_object.hpp:302
#23 boost::coroutines::detail::trampoline_push_void<boost::coroutines::detail::push_coroutine_object<boost::coroutines::pull_coroutine<void>, void, boost::asio::detail::coro_entry_point<boost::asio::executor_binder<void (*)(), boost::asio::io_context::strand>, icinga::IoEngine::SpawnCoroutine<boost::asio::io_context::strand, icinga::HttpServerConnection::Start()::<lambda(boost::asio::yield_context)> >::<lambda(boost::asio::yield_context)> >&, boost::coroutines::basic_standard_stack_allocator<boost::coroutines::stack_traits> > >(boost::context::detail::transfer_t) (t=...)
    at /usr/include/boost/coroutine/detail/trampoline_push.hpp:70
#24 0x00007f6ec368619f in make_fcontext () from /lib/x86_64-linux-gnu/libboost_context.so.1.74.0
#25 0x00005629c093ca88 in vtable for boost::coroutines::detail::push_coroutine_object<boost::coroutines::pull_coroutine<void>, void, boost::asio::detail::coro_entry_point<boost::asio::executor_binder<void (*)(), boost::asio::io_context::strand>, icinga::IoEngine::SpawnCoroutine<boost::asio::io_context::strand, icinga::HttpServerConnection::Start()::{lambda(boost::asio::basic_yield_context<boost::asio::executor_binder<void (*)(), boost::asio::executor> >)#1}>(boost::asio::io_context::strand&, icinga::HttpServerConnection::Start()::{lambda(boost::asio::basic_yield_context<boost::asio::executor_binder<void (*)(), boost::asio::executor> >)#1})::{lambda(boost::asio::basic_yield_context<boost::asio::executor_binder<void (*)(), boost::asio::executor> >)#1}>&, boost::coroutines::basic_standard_stack_allocator<boost::coroutines::stack_traits> > ()
#26 0x0000000000000026 in ?? ()
#27 0x0000000000000000 in ?? ()

=> info args on frame 7:

type = @0x7f6c36c6c160: {px = 0x5629c0947168 <vtable for std::basic_ofstream<char, std::char_traits<char> >+24>}
fullName = @0x7f6c36c6cb90: {static NPos = 18446744073709551615, m_Data = "cmsorderprivateusdev-002b-gra-vps.static.ovh.net!check_webovhstatic_website_dir!3345cdff-f2b0-476c-827e-2326f20e40b9"}
config = @0x7f6c36c6bfb0: {static NPos = 18446744073709551615, m_Data = "\000\000\000\000\000\000\000\000\000J\364\300n\177\000\000\220!a\347l\177\000\000\240!a\347l\177\000\000\240"}
errors = @0x7f6c36c6c360: {px = 0x7f6c00000357}
diagnosticInformation = @0x7f6c36c6c780: {px = 0x0}
cookie = @0x5629c09729a0: {m_Value = {which_ = 0, storage_ = {<boost::detail::aligned_storage::aligned_storage_imp<32, 8>> = {data_ = {buf = '\000' <repeats 31 times>, align_ = {<No data fields>}}}, 
      static size = <optimized out>, static alignment = <optimized out>}}}

The first thread on this report took the static std::mutex in activateitems and wait for object lock whereas the second done the opposite.

ymartin-ovh commented 10 months ago

Checking with Icinga2 object lock debug:

On icingadb thread:

(gdb) print(this->m_LockOwner)
$21 = {static _S_min_alignment = 8, static _S_alignment = 8, _M_i = {_M_thread = 140112867874560}, static is_always_lock_free = <optimized out>}

140112867874560 => 7F6E91B7D700 is thread 57 (the one with addcomment)

ymartin-ovh commented 10 months ago

Hello @Al2Klimov / @julianbrost

Did you encounter this ? I suspect an issue with the order of those locks:

Regards

ymartin-ovh commented 8 months ago

Hello

I tested your proposal and I still experience deadlock on Icinga2 API with this. Have the issue twice last friday and today.

Details added https://github.com/Icinga/icinga2/pull/9927#issuecomment-1891899238

Regards