facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.
http://rocksdb.org
GNU General Public License v2.0
27.85k stars 6.2k forks source link

Attempt fix rare failure in DBBlockCacheTypeTest.Uncache #12748

Closed pdillinger closed 3 weeks ago

pdillinger commented 3 weeks ago

Summary: I haven't been able to reproduce the failure, seen in https://github.com/facebook/rocksdb/actions/runs/9420830905/job/25953696902?pr=12734

[ RUN      ] DBBlockCacheTypeTestInstance/DBBlockCacheTypeTest.Uncache/2
db/db_block_cache_test.cc:1415: Failure
Expected equality of these values:
  cache->GetOccupancyCount()
    Which is: 37
  kBaselineCount + kNumDataBlocks + meta_blocks_per_file
    Which is: 15
Google Test trace:
db/db_block_cache_test.cc:1346: ua=10000
db/db_block_cache_test.cc:1344: partitioned=1
db/db_block_cache_test.cc:1418: Failure
...

But it's consistent with a SuperVersion reference sticking around beyond the CompactRange, as I can reproduce the result with a dangling Iterator. Like some other tests have had trouble with periodic stats popping up randomly, I suspect that could be the explanation in this case.

Test Plan: Watch for similar future failures

facebook-github-bot commented 3 weeks ago

@pdillinger has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

facebook-github-bot commented 3 weeks ago

@pdillinger merged this pull request in facebook/rocksdb@68112b3beb885c9ec8bc410e15b05e7e27e3c9ee.

ajkr commented 3 weeks ago

@pdillinger Try this patch - it repros:

diff --git a/db/db_impl/db_impl_compaction_flush.cc b/db/db_impl/db_impl_compaction_flush.cc
index f2ce88a86..b1f06011c 100644
--- a/db/db_impl/db_impl_compaction_flush.cc
+++ b/db/db_impl/db_impl_compaction_flush.cc
@@ -3432,6 +3432,8 @@ void DBImpl::BackgroundCallCompaction(PrepickedCompaction* prepicked_compaction,
     if (job_context.HaveSomethingToClean() ||
         job_context.HaveSomethingToDelete() || !log_buffer.IsEmpty()) {
       mutex_.Unlock();
+      bg_cv_.SignalAll();
+      sleep(1);
       // Have to flush the info logs before bg_compaction_scheduled_--
       // because if bg_flush_scheduled_ becomes 0 and the lock is
       // released, the deconstructor of DB can kick in and destroy all the
jaykorean commented 3 weeks ago

Some recent failures for reference

https://github.com/facebook/rocksdb/actions/runs/9507540946/job/26207184986

https://github.com/facebook/rocksdb/actions/runs/9507540946/job/26207185803