dragonflydb / dragonfly

A modern replacement for Redis and Memcached
https://www.dragonflydb.io/

failure on test_network_disconnect_during_migration #3321

Closed. kostasrim closed this issue 2 weeks ago.

kostasrim commented 2 months ago

https://github.com/dragonflydb/dragonfly/actions/runs/9938955158/job/27452529631#step:6:885

Trace:


cpu time 2.116718053817749 batches 1040 commands 104000
30001➜ virtual method called
30001➜ virtual method called
30001➜ 12:20:15.750623 21588 init.cc:85] Terminate handler called without exception
30001➜ *** Check failure stack trace: ***
30001➜ 12:20:15.751226 21589 init.cc:85] Terminate handler called without exception
30001➜ *** Check failure stack trace: ***
30001➜ *** SIGABRT received at time=1721046015 on cpu 0 ***
30001➜ : 345] RAW: Signal 6 raised at PC=0xea8c70ab1d78 while already in AbslFailureSignalHandler()
30001➜    @     0xea8c70ab1d78  (unknown)  raise
30001➜    @     0xba822268a0cc        480  absl::lts_20240116::AbslFailureSignalHandler()
30001➜    @     0xea8c712d88f8       4960  (unknown)
30001➜    @     0xea8c70a9eaac        304  abort
30001➜    @     0xba822263bc2c        336  google::DumpStackTraceAndExit()
30001➜    @     0xba822262f92c        192  google::LogMessage::Fail()
30001➜    @     0xba822263624c         16  google::LogMessage::SendToLog()
30001➜    @     0xba822262f330        208  google::LogMessage::Flush()
30001➜    @     0xba8222630c6c         80  google::LogMessageFatal::~LogMessageFatal()
30001➜    @     0xba8221ea9694         16  MainInitGuard::MainInitGuard()::{lambda()#1}::operator()()
30001➜    @     0xba8221ea9784        352  MainInitGuard::MainInitGuard()::{lambda()#1}::_FUN()
30001➜    @     0xea8c70d5f20c         16  (unknown)
30001➜    @     0xea8c70d5f270         16  std::terminate()
30001➜    @     0xea8c70d60290         16  __cxa_pure_virtual
30001➜    @     0xba822260c404         16  io::AsyncSink::AsyncWrite()
30001➜    @     0xba8222180804        128  dfly::JournalStreamer::Write()
30001➜    @     0xba822218098c        192  std::_Function_handler<>::_M_invoke()
30001➜    @     0xba822216c170        224  dfly::journal::JournalSlice::AddLogRecord()
30001➜    @     0xba822216ad64        336  dfly::journal::Journal::RecordEntry()
30001➜    @     0xba8222078034        272  boost::context::detail::fiber_entry<>()
------------------------------ Captured log call -------------------------------

BorysTheDev commented 3 weeks ago

@adiholden @romange It looks like this is not a cluster issue. During shutdown there is a race condition: JournalSlice::AddLogRecord() can be preempted, the journal can be removed in the meantime, and when execution resumes the journal has already been destroyed.

romange commented 3 weeks ago

@BorysTheDev are you saying that UnregisterOnChange has been called? But isn't that done under the cb_mu_ mutex?

BorysTheDev commented 3 weeks ago

@romange ServerFamily::Shutdown() calls journal_->Close() while JournalSlice::AddLogRecord() is still inside callback(); this is possible because the callback can preempt.

BorysTheDev commented 2 weeks ago

It looks like I was wrong about the cause: journal_->Close() doesn't destroy the JournalSlice.

BorysTheDev commented 2 weeks ago

It is currently not reproducible, so @adiholden and I decided to close it.