fbnzk / leveldb

Automatically exported from code.google.com/p/leveldb
BSD 3-Clause "New" or "Revised" License
0 stars 0 forks source link

LevelDB gets stuck (no progress on iterators/writes that are waiting for lock/conditional variable) #239

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?
1. Have one thread making batch write (50 writes, up to 25 deletes).
2. Have another thread creating DB iterator and iterating over all keys and 
values continuously
3. Have another thread that sleeps for up to 60 sec, creates a DB iterator and 
iterates over all keys for a final time and then raise a SIGSEGV to kill the 
process

What is the expected output? What do you see instead?
- LevelDB should work fine
- Write thread got stuck and the process never left after running 1 hour (all 
other runs exit after at most 60 seconds)

What version of the product are you using? On what operating system?
- Live version (as of 06/11/2014), on Linux (Ubuntu)

Please provide any additional information below.
- Basically, it seems that one of the threads iterating over the DB holds the 
mutex and, for some reason, it got stuck (Thread 2). That prevented compaction 
work, which blocked writer and the other iterator.
- OpenCompactionWork (db_impl.cc:762) gets stuck waiting for mutex_.Lock()
- NewInternalIterator (db_impl.cc:1039) gets stuck waiting for mutex_.Lock()
- Writer thread is waiting on bg_cv_.Wait() (db_impl.cc:1321), which is only 
signaled by the compaction thread after acquiring the lock. Writer thread held 
the mutex_ originally, but bg_cv_.Wait() should have released it while waiting

- Please find thread dump below:

Thread 8 (Thread 0x7f4db01ff700 (LWP 26305)):
#0  0x00007f4db13f2123 in poll () from /usr/grte/v3/lib64/libc.so.6
#1  0x000000000075c5b9 in ExitTimeoutWatcher(void*) () at thread/thread.cc:1234
#2  0x00007f4db1b9f14e in start_thread () from 
/usr/grte/v3/lib64/libpthread.so.0
#3  0x00007f4db131169d in clone () from /usr/grte/v3/lib64/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 7 (Thread 0x7f4db01f5700 (LWP 26306)):
#0  0x00007f4db13f2123 in poll () from /usr/grte/v3/lib64/libc.so.6
#1  0x000000000075c9f8 in ThreadLivenessWatcher(void*) () at 
thread/thread.cc:1133
#2  0x00007f4db1b9f14e in start_thread () from 
/usr/grte/v3/lib64/libpthread.so.0
#3  0x00007f4db131169d in clone () from /usr/grte/v3/lib64/libc.so.6
#4  0x0000000000000000 in ?? ()

Thread 6 (Thread 0x7f4db01eb700 (LWP 26307)):
#0  0x00007f4db1ba6f7d in nanosleep () from /usr/grte/v3/lib64/libpthread.so.0
#1  0x000000000075bd06 in ReleaseMemoryToOS() () at thread/thread.cc:2158
#2  0x00000000007432a3 in _FunctionResultCallback_0_0<false, void>::Run() () at 
./base/callback-specializations.h:278
#3  0x000000000075a5d8 in ClosureThread::Run() () at thread/thread.cc:1354
#4  0x00000000007593c2 in Thread::ThreadBody(void*) () at thread/thread.cc:1061
#5  0x00007f4db1b9f14e in start_thread () from 
/usr/grte/v3/lib64/libpthread.so.0
#6  0x00007f4db131169d in clone () from /usr/grte/v3/lib64/libc.so.6
#7  0x0000000000000000 in ?? ()

Thread 5 (Thread 0x7f4dafeea700 (LWP 26308)):
#0  0x00000000009ca4f8 in base::internal::sys_futex(int*, int, int, 
base::internal::kernel_timespec*) ()
    at ./base/linux_syscall_support.h:3053
#1  0x00000000009ca275 in base::internal::PerThreadSem::Wait(timespec const*) 
() at base/per-thread-sem.cc:189
#2  0x00000000009e9755 in Mutex::DecrementSynchSem(PerThreadSynch*, timespec*) 
() at base/synchronization.cc:622
#3  0x00000000009e856d in CondVar::WaitGeneric(void*, timespec*, void 
(*)(void*), void (*)(void*)) () at base/synchronization.cc:2655
#4  0x00000000009e88a0 in CondVar::Wait(Mutex*) () at 
base/synchronization.cc:2684
#5  0x0000000000761e19 in threadlogger::LoggingThread::Run() () at 
thread/logger.cc:184
#6  0x00000000007593c2 in Thread::ThreadBody(void*) () at thread/thread.cc:1061
#7  0x00007f4db1b9f14e in start_thread () from 
/usr/grte/v3/lib64/libpthread.so.0
#8  0x00007f4db131169d in clone () from /usr/grte/v3/lib64/libc.so.6
#9  0x0000000000000000 in ?? ()

Thread 4 (Thread 0x7f4da164f700 (LWP 26309)):
#0  0x00000000009ca4f8 in base::internal::sys_futex(int*, int, int, 
base::internal::kernel_timespec*) ()
    at ./base/linux_syscall_support.h:3053
#1  0x00000000009ca275 in base::internal::PerThreadSem::Wait(timespec const*) 
() at base/per-thread-sem.cc:189
#2  0x00000000009e9755 in Mutex::DecrementSynchSem(PerThreadSynch*, timespec*) 
() at base/synchronization.cc:622
#3  0x00000000009e32cb in Mutex::Block(PerThreadSynch*) () at 
base/synchronization.cc:1193
#4  0x00000000009e6b02 in Mutex::LockSlowLoop(SynchWaitParams*, int) () at 
base/synchronization.cc:2051
#5  0x00000000009e4e48 in Mutex::LockSlowWithTimeout(MuHowS const*, Condition 
const*, timespec*, int) () at base/synchronization.cc:1947
#6  0x00000000009e473e in Mutex::LockSlow(MuHowS const*, Condition const*, int) 
() at base/synchronization.cc:1895
#7  0x00000000009e4600 in Mutex::Lock() () at base/synchronization.cc:1638
#8  0x00000000004192d0 in 
leveldb::DBImpl::OpenCompactionOutputFile(leveldb::DBImpl::CompactionState*) () 
at storage/leveldb/db/db_impl.cc:762
#9  0x0000000000418b03 in 
leveldb::DBImpl::DoCompactionWork(leveldb::DBImpl::CompactionState*) () at 
storage/leveldb/db/db_impl.cc:954
#10 0x000000000041832d in leveldb::DBImpl::BackgroundCompaction() () at 
storage/leveldb/db/db_impl.cc:706
#11 0x0000000000417d3d in leveldb::DBImpl::BackgroundCall() () at 
storage/leveldb/db/db_impl.cc:644
#12 0x0000000000417c85 in leveldb::DBImpl::BGWork(void*) () at 
storage/leveldb/db/db_impl.cc:633
#13 0x0000000000449e41 in leveldb::(anonymous namespace)::PosixEnv::BGThread() 
() at storage/leveldb/util/env_posix.cc:567
#14 0x0000000000449075 in leveldb::(anonymous 
namespace)::PosixEnv::BGThreadWrapper(void*) () at 
storage/leveldb/util/env_posix.cc:506
#15 0x00007f4db1b9f14e in start_thread () from 
/usr/grte/v3/lib64/libpthread.so.0
#16 0x00007f4db131169d in clone () from /usr/grte/v3/lib64/libc.so.6
#17 0x0000000000000000 in ?? ()

Thread 3 (Thread 0x7f4da0e4e700 (LWP 26310)):
#0  0x00007f4db1ba5b00 in sem_wait () from /usr/grte/v3/lib64/libpthread.so.0
#1  0x0000000000467c5b in 
thread::PthreadDomain::BlockCurrent(base::scheduling::Schedulable*, timespec 
const*) ()
    at thread/fiber/pthread-domain.cc:236
#2  0x00000000009da6f9 in base::scheduling::Downcalls::UserSchedule(bool, 
timespec const*) () at base/scheduling/downcalls.cc:487
#3  0x00000000009ca83c in base::scheduling::Downcalls::Wait(timespec const*) () 
at ./base/scheduling/downcalls.h:260
#4  0x00000000009ca1e4 in base::internal::PerThreadSem::Wait(timespec const*) 
() at base/per-thread-sem.cc:179
#5  0x00000000009e9755 in Mutex::DecrementSynchSem(PerThreadSynch*, timespec*) 
() at base/synchronization.cc:622
#6  0x00000000009e32cb in Mutex::Block(PerThreadSynch*) () at 
base/synchronization.cc:1193
#7  0x00000000009e6b02 in Mutex::LockSlowLoop(SynchWaitParams*, int) () at 
base/synchronization.cc:2051
#8  0x00000000009e4e48 in Mutex::LockSlowWithTimeout(MuHowS const*, Condition 
const*, timespec*, int) () at base/synchronization.cc:1947
#9  0x00000000009e473e in Mutex::LockSlow(MuHowS const*, Condition const*, int) 
() at base/synchronization.cc:1895
#10 0x00000000009e4600 in Mutex::Lock() () at base/synchronization.cc:1638
#11 0x0000000000419adb in 
leveldb::DBImpl::NewInternalIterator(leveldb::ReadOptions const&, unsigned 
long*, unsigned int*) ()
    at storage/leveldb/db/db_impl.cc:1039
#12 0x000000000041a0a7 in leveldb::DBImpl::NewIterator(leveldb::ReadOptions 
const&) () at storage/leveldb/db/db_impl.cc:1127
#13 0x0000000000413a6a in cumulus::LevelDbWriteTool::Check(int, int, int*) () 
at cumulus/leveldb/leveldb_write_tool.cc:120
#14 0x0000000000411c0b in cumulus::LevelDbWriteTool::KillThread() () at 
cumulus/leveldb/leveldb_write_tool.cc:230
#15 0x000000000041295d in _MemberResultCallback_0_0<true, void, 
cumulus::LevelDbWriteTool, void>::Run() ()
    at ./base/callback-specializations.h:191
#16 0x000000000045993b in thread::Fiber::Body(Closure*) () at 
thread/fiber/fiber.cc:311
#17 0x000000000045c7c5 in _MemberResultCallback_1_0<true, void, thread::Fiber, 
Closure*, void>::Run() ()
    at ./base/callback-specializations.h:483
#18 0x000000000046822a in thread::PthreadDomainThread::Run() () at 
thread/fiber/pthread-domain.cc:125
#19 0x00000000007593c2 in Thread::ThreadBody(void*) () at thread/thread.cc:1061
#20 0x00007f4db1b9f14e in start_thread () from 
/usr/grte/v3/lib64/libpthread.so.0
#21 0x00007f4db131169d in clone () from /usr/grte/v3/lib64/libc.so.6
#22 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f4d8c063700 (LWP 26311)):
#0  0x000000000044e30d in leveldb::crc32c::Extend(unsigned int, char const*, 
unsigned long) () at storage/leveldb/util/crc32c.cc:316
#1  0x00000000004396b4 in leveldb::crc32c::Value(char const*, unsigned long) () 
at ./storage/leveldb/util/crc32c.h:21
#2  0x0000000000442206 in leveldb::ReadBlock(leveldb::RandomAccessFile*, 
leveldb::ReadOptions const&, leveldb::BlockHandle const&, 
leveldb::BlockContents*) () at storage/leveldb/table/format.cc:93
#3  0x000000000043f821 in leveldb::Table::BlockReader(void*, 
leveldb::ReadOptions const&, leveldb::Slice const&) ()
    at storage/leveldb/table/table.cc:189
#4  0x000000000043cbeb in leveldb::(anonymous 
namespace)::TwoLevelIterator::InitDataBlock() ()
    at storage/leveldb/table/two_level_iterator.cc:165
#5  0x000000000043cd5b in leveldb::(anonymous 
namespace)::TwoLevelIterator::SkipEmptyDataBlocksForward() ()
    at storage/leveldb/table/two_level_iterator.cc:133
#6  0x000000000043c706 in leveldb::(anonymous 
namespace)::TwoLevelIterator::Next() () at 
storage/leveldb/table/two_level_iterator.cc:115
#7  0x000000000043d265 in leveldb::IteratorWrapper::Next() () at 
./storage/leveldb/table/iterator_wrapper.h:42
#8  0x0000000000440864 in leveldb::(anonymous 
namespace)::MergingIterator::Next() () at storage/leveldb/table/merger.cc:81
#9  0x000000000043758e in leveldb::(anonymous 
namespace)::DBIter::FindNextUserEntry(bool, std::string*) ()
    at storage/leveldb/db/db_iter.cc:199
#10 0x0000000000436afa in leveldb::(anonymous namespace)::DBIter::Next() () at 
storage/leveldb/db/db_iter.cc:170
#11 0x0000000000413c6e in cumulus::LevelDbWriteTool::Check(int, int, int*) () 
at cumulus/leveldb/leveldb_write_tool.cc:124
#12 0x0000000000411e13 in cumulus::LevelDbWriteTool::KeepChecking(int, int) () 
at cumulus/leveldb/leveldb_write_tool.cc:198
#13 0x00000000004126d7 in _MemberResultCallback_2_0<true, void, 
cumulus::LevelDbWriteTool, int, int, void>::Run() ()
    at ./base/callback-specializations.h:781
#14 0x000000000045993b in thread::Fiber::Body(Closure*) () at 
thread/fiber/fiber.cc:311
#15 0x000000000045c7c5 in _MemberResultCallback_1_0<true, void, thread::Fiber, 
Closure*, void>::Run() ()
    at ./base/callback-specializations.h:483
#16 0x000000000046822a in thread::PthreadDomainThread::Run() () at 
thread/fiber/pthread-domain.cc:125
#17 0x00000000007593c2 in Thread::ThreadBody(void*) () at thread/thread.cc:1061
#18 0x00007f4db1b9f14e in start_thread () from 
/usr/grte/v3/lib64/libpthread.so.0
#19 0x00007f4db131169d in clone () from /usr/grte/v3/lib64/libc.so.6
#20 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f4db240ba00 (LWP 26304)):
#0  0x00000000009ca4f8 in base::internal::sys_futex(int*, int, int, 
base::internal::kernel_timespec*) ()
    at ./base/linux_syscall_support.h:3053
#1  0x00000000009ca275 in base::internal::PerThreadSem::Wait(timespec const*) 
() at base/per-thread-sem.cc:189
#2  0x00000000009e9755 in Mutex::DecrementSynchSem(PerThreadSynch*, timespec*) 
() at base/synchronization.cc:622
---Type <return> to continue, or q <return> to quit---
#3  0x00000000009e856d in CondVar::WaitGeneric(void*, timespec*, void 
(*)(void*), void (*)(void*)) () at base/synchronization.cc:2655
#4  0x00000000009e88a0 in CondVar::Wait(Mutex*) () at 
base/synchronization.cc:2684
#5  0x000000000041c029 in leveldb::port::CondVar::Wait() () at 
./storage/leveldb/port/port_google3.h:61
#6  0x000000000041ab51 in leveldb::DBImpl::MakeRoomForWrite(bool) () at 
storage/leveldb/db/db_impl.cc:1321
#7  0x000000000041a5c5 in leveldb::DBImpl::Write(leveldb::WriteOptions const&, 
leveldb::WriteBatch*) () at storage/leveldb/db/db_impl.cc:1178
#8  0x0000000000412136 in cumulus::LevelDbWriteTool::WriteKeysBatch(int) () at 
cumulus/leveldb/leveldb_write_tool.cc:82
#9  0x000000000041166b in main () at cumulus/leveldb/leveldb_write_tool.cc:263

Original issue reported on code.google.com by r...@google.com on 11 Jun 2014 at 9:47

GoogleCodeExporter commented 9 years ago
You don't happen to have a standalone reproduction case, do you?

Original comment by dgrogan@chromium.org on 17 Jun 2014 at 11:04