dragonflydb / dragonfly

A modern replacement for Redis and Memcached
https://www.dragonflydb.io/

Invalid memory access #2193

Closed chakaz closed 1 week ago

chakaz commented 9 months ago

I ran Dragonfly (debug build) and created many connections to it (almost 30k). At some point I got an error from UB sanitizer (see below).

Notes:

/home/shahar/dragonfly/src/facade/dragonfly_connection.cc:405:74: runtime error: member call on address 0x7f428578ac30 which does not point to an object of type 'Connection'
0x7f428578ac30: note: object is of type 'util::Connection'
 00 00 00 00  68 7f 52 72 91 55 00 00  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  80 c0 86 74
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'util::Connection'
    #0 0x55917112fef4 in operator() /home/shahar/dragonfly/src/facade/dragonfly_connection.cc:405
    #1 0x559171150d0a in __invoke_impl<void, facade::Connection::HandleRequests()::<lambda(int32_t)>&, unsigned int> /usr/include/c++/11/bits/invoke.h:61
    #2 0x55917114fd92 in __invoke_r<void, facade::Connection::HandleRequests()::<lambda(int32_t)>&, unsigned int> /usr/include/c++/11/bits/invoke.h:111
    #3 0x55917114f01e in _M_invoke /usr/include/c++/11/bits/std_function.h:290
    #4 0x559171160af1 in std::function<void (unsigned int)>::operator()(unsigned int) const /usr/include/c++/11/bits/std_function.h:590
    #5 0x559171860e8a in operator() /home/shahar/dragonfly/helio/util/fibers/uring_socket.cc:361
    #6 0x55917186710c in invoke<util::fb2::UringSocket::PollEvent(uint32_t, std::function<void(unsigned int)>)::<lambda(util::fb2::detail::FiberInterface*, util::fb2::UringProactor::IoResult, uint32_t)>&, util::fb2::detail::FiberInterface*, int, unsigned int> /home/shahar/dragonfly/helio/base/function2.hpp:171
    #7 0x559171866994 in invoke /home/shahar/dragonfly/helio/base/function2.hpp:530
    #8 0x5591718498e6 in decltype(auto) fu2::abi_400::detail::type_erasure::tables::vtable<fu2::abi_400::detail::property<false, false, void (util::fb2::detail::FiberInterface*, int, unsigned int)> >::invoke<0ul, fu2::abi_400::detail::type_erasure::data_accessor*, unsigned long const&, util::fb2::detail::FiberInterface*, int, unsigned int>(fu2::abi_400::detail::type_erasure::data_accessor*&&, unsigned long const&, util::fb2::detail::FiberInterface*&&, int&&, unsigned int&&) const /home/shahar/dragonfly/helio/base/function2.hpp:887
    #9 0x559171849b21 in decltype(auto) fu2::abi_400::detail::type_erasure::erasure<true, fu2::abi_400::detail::config<true, false, fu2::capacity_fixed<16ul, 8ul> >, fu2::abi_400::detail::property<false, false, void (util::fb2::detail::FiberInterface*, int, unsigned int)> >::invoke<0ul, fu2::abi_400::detail::type_erasure::erasure<true, fu2::abi_400::detail::config<true, false, fu2::capacity_fixed<16ul, 8ul> >, fu2::abi_400::detail::property<false, false, void (util::fb2::detail::FiberInterface*, int, unsigned int)> >&, util::fb2::detail::FiberInterface*, int, unsigned int>(fu2::abi_400::detail::type_erasure::erasure<true, fu2::abi_400::detail::config<true, false, fu2::capacity_fixed<16ul, 8ul> >, fu2::abi_400::detail::property<false, false, void (util::fb2::detail::FiberInterface*, int, unsigned int)> >&, util::fb2::detail::FiberInterface*&&, int&&, unsigned int&&) /home/shahar/dragonfly/helio/base/function2.hpp:1088
    #10 0x559171849bfd in fu2::abi_400::detail::type_erasure::invocation_table::operator_impl<0ul, fu2::abi_400::detail::function<fu2::abi_400::detail::config<true, false, fu2::capacity_fixed<16ul, 8ul> >, fu2::abi_400::detail::property<false, false, void (util::fb2::detail::FiberInterface*, int, unsigned int)> >, void (util::fb2::detail::FiberInterface*, int, unsigned int)>::operator()(util::fb2::detail::FiberInterface*, int, unsigned int) /home/shahar/dragonfly/helio/base/function2.hpp:689
    #11 0x55917182c8e4 in util::fb2::UringProactor::DispatchCqe(util::fb2::detail::FiberInterface*, io_uring_cqe const&) /home/shahar/dragonfly/helio/util/fibers/uring_proactor.cc:200
    #12 0x55917183796b in util::fb2::UringProactor::MainLoop(util::fb2::detail::Scheduler*) /home/shahar/dragonfly/helio/util/fibers/uring_proactor.cc:536
    #13 0x55917169fc23 in util::fb2::ProactorDispatcher::Run(util::fb2::detail::Scheduler*) /home/shahar/dragonfly/helio/util/fibers/proactor_base.cc:296
    #14 0x5591716e0bcc in Run /home/shahar/dragonfly/helio/util/fibers/detail/scheduler.cc:391
    #15 0x5591716df8ea in operator() /home/shahar/dragonfly/helio/util/fibers/detail/scheduler.cc:371
    #16 0x5591716f0eee in __invoke_impl<boost::context::fiber, util::fb2::detail::(anonymous namespace)::DispatcherImpl::DispatcherImpl(const boost::context::preallocated&, boost::context::fixedsize_stack&&, util::fb2::detail::Scheduler*)::<lambda(boost::context::fiber&&)>&, boost::context::fiber> /usr/include/c++/11/bits/invoke.h:61
    #17 0x5591716f0ca5 in __invoke<util::fb2::detail::(anonymous namespace)::DispatcherImpl::DispatcherImpl(const boost::context::preallocated&, boost::context::fixedsize_stack&&, util::fb2::detail::Scheduler*)::<lambda(boost::context::fiber&&)>&, boost::context::fiber> /usr/include/c++/11/bits/invoke.h:97
    #18 0x5591716f09cc in invoke<util::fb2::detail::(anonymous namespace)::DispatcherImpl::DispatcherImpl(const boost::context::preallocated&, boost::context::fixedsize_stack&&, util::fb2::detail::Scheduler*)::<lambda(boost::context::fiber&&)>&, boost::context::fiber> /usr/include/c++/11/functional:98
    #19 0x5591716f05a2 in run /usr/include/boost/context/fiber_fcontext.hpp:143
    #20 0x5591716efed5 in fiber_entry<boost::context::detail::fiber_record<boost::context::fiber, boost::context::basic_fixedsize_stack<boost::context::stack_traits>&, util::fb2::detail::(anonymous namespace)::DispatcherImpl::DispatcherImpl(const boost::context::preallocated&, boost::context::fixedsize_stack&&, util::fb2::detail::Scheduler*)::<lambda(boost::context::fiber&&)> > > /usr/include/boost/context/fiber_fcontext.hpp:80
    #21 0x7f42b47f624e in make_fcontext (/lib/x86_64-linux-gnu/libboost_context.so.1.74.0+0x124e)

E20231119 22:09:33.547080 2580624 dragonfly_connection.cc:768] Unexpected event 8192
E20231119 22:09:33.547159 2580624 dragonfly_connection.cc:768] Unexpected event 8192

FWIW, line 405 in dragonfly_connection.cc is

        socket_->RegisterOnErrorCb([this](int32_t mask) { this->OnBreakCb(mask); });

So, to my understanding, this callback is somehow invoked after the address pointed to by this has changed, or while the object is being destructed (oddly, the object is a util::Connection but not a Connection :thinking:).

romange commented 9 months ago

The callback expects this->OnBreakCb(mask) to be invoked on a facade::Connection*, i.e. the Dragonfly connection. Instead it is invoked on a util::Connection, i.e. AFTER the facade::~Connection destructor has finished, most likely during the destruction of util::Connection.
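To illustrate why the sanitizer reports the object as the base type, here is a minimal sketch of how an object's dynamic type changes during destruction. BaseConn and DerivedConn are hypothetical stand-ins for util::Connection and facade::Connection, not the real classes; the sketch only shows the language rule, not Dragonfly's code.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for util::Connection.
struct BaseConn {
  virtual ~BaseConn() {
    // The derived part is already destroyed here, so virtual dispatch
    // resolves to BaseConn::Name().
    if (log) log->push_back(Name());
  }
  virtual std::string Name() const { return "BaseConn"; }
  std::vector<std::string>* log = nullptr;
};

// Hypothetical stand-in for facade::Connection.
struct DerivedConn : BaseConn {
  ~DerivedConn() override {
    // Still a DerivedConn while its own destructor body runs.
    if (log) log->push_back(Name());
  }
  std::string Name() const override { return "DerivedConn"; }
};

// Records which Name() override each destructor sees, in order.
std::vector<std::string> DestructionOrder() {
  std::vector<std::string> log;
  {
    DerivedConn c;
    c.log = &log;
  }  // c is destroyed here: ~DerivedConn runs first, then ~BaseConn.
  return log;
}
```

Calling DestructionOrder() records the derived name first and the base name second. A callback that captured this as a DerivedConn* and fires once the base destructor has started therefore sees an object whose vptr already points at the base class, which matches the "object is of type 'util::Connection'" note in the report.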

romange commented 9 months ago

After inspecting it further: "dragonfly_connection.cc:768] Unexpected event 8192" means that the event from the io_uring socket arrived after the connection context was destroyed, proving without doubt that event notifications can arrive after they have been cancelled (i.e. after CancelPoll has been called and has finished running). This means we must redesign the socket polling interface so that the socket can ignore events delivered for a callback that has already been unregistered. In other words, we should use some sort of refcounting mechanism.
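One possible shape for such a refcounting mechanism is sketched below. It is only an illustration under stated assumptions: PollRegistration, Register, Cancel and MakeCompletion are hypothetical names, not helio's API, and the sketch assumes all calls happen on a single proactor thread. The idea is that each pending completion shares ownership of a small state block, so a late event finds an emptied callback instead of a dangling this.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical registration object; not part of helio.
class PollRegistration {
 public:
  using Callback = std::function<void(uint32_t)>;

 private:
  // Shared between the registration and every in-flight completion.
  struct State {
    Callback cb;
  };

 public:
  void Register(Callback cb) {
    state_ = std::make_shared<State>();
    state_->cb = std::move(cb);
  }

  // Clears the callback. Completions created earlier keep the State alive
  // through their own reference, but find no callback to run.
  void Cancel() {
    if (state_) {
      state_->cb = nullptr;
      state_.reset();
    }
  }

  // Models what a pending kernel completion would own.
  std::function<void(uint32_t)> MakeCompletion() const {
    std::shared_ptr<State> st = state_;
    return [st](uint32_t mask) {
      if (!st) return;
      Callback cb = st->cb;  // copy, so the callback may cancel itself
      if (cb) cb(mask);
    };
  }

 private:
  std::shared_ptr<State> state_;
};

// One event is delivered while registered, one arrives after Cancel().
int StaleEventDemo() {
  int calls = 0;
  PollRegistration reg;
  reg.Register([&calls](uint32_t) { ++calls; });

  auto live = reg.MakeCompletion();
  live(0x2000);  // delivered

  auto stale = reg.MakeCompletion();
  reg.Cancel();
  stale(0x2000);  // ignored: the callback was unregistered

  return calls;
}
```

In this sketch StaleEventDemo() counts exactly one delivered event. The owner (the connection) would call Cancel() from its destructor before any of its members are torn down, so a late notification becomes a no-op rather than a call on a partially destroyed object.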

romange commented 8 months ago

8192 (0x2000) is the EPOLLRDHUP event (EPOLLHUP is 0x10)
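For reference, the mask can be decoded against the Linux definitions in <sys/epoll.h>, where 0x2000 is EPOLLRDHUP and EPOLLHUP is 0x10. The helper below is a small sketch; DescribeEvents is a hypothetical name and is not part of Dragonfly or helio.

```cpp
#include <sys/epoll.h>

#include <cassert>
#include <cstdint>
#include <string>

// The value seen in the log line, checked against the system header.
static_assert(EPOLLRDHUP == 0x2000, "8192 is expected to be EPOLLRDHUP");
static_assert(EPOLLHUP == 0x10, "EPOLLHUP is expected to be 16");

// Hypothetical helper: renders a poll mask as a '|'-separated list of names.
std::string DescribeEvents(uint32_t mask) {
  std::string out;
  auto add = [&](uint32_t bit, const char* name) {
    if (mask & bit) {
      if (!out.empty()) out += '|';
      out += name;
    }
  };
  add(EPOLLIN, "EPOLLIN");
  add(EPOLLOUT, "EPOLLOUT");
  add(EPOLLERR, "EPOLLERR");
  add(EPOLLHUP, "EPOLLHUP");
  add(EPOLLRDHUP, "EPOLLRDHUP");
  return out.empty() ? "none" : out;
}
```

With these definitions, DescribeEvents(8192) yields "EPOLLRDHUP", i.e. the peer closed its end of the connection or shut down the writing half.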

romange commented 8 months ago

@chakaz do you remember whether you ran both the client and the server locally? What was the client?

chakaz commented 8 months ago

Definitely locally. The client was a Python async library. I created many connections (I think I also issued a PING after each connection).

romange commented 8 months ago

Do you still have this code? It could be interesting to adapt it into a regression test.

chakaz commented 8 months ago

Yes, it's here: https://github.com/dragonflydb/dragonfly/pull/2219/files (was never merged)

IIRC you should run it with --action=connections --keys=30000 (or maybe 25k?)