facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.
http://rocksdb.org
GNU General Public License v2.0

Some checkpoints cannot be opened with `kAbsoluteConsistency` WAL recovery mode #12670

Open andlr opened 3 months ago

andlr commented 3 months ago

Expected behavior

The database can be opened from a checkpoint with `wal_recovery_mode=kAbsoluteConsistency`.

Actual behavior

Due to a few data races, the active WAL file sometimes gets copied in an inconsistent state. Opening the database then fails with one of these errors when `wal_recovery_mode=kAbsoluteConsistency`:

Steps to reproduce the behavior

Initially I wrote this heavy and flaky test, which sometimes reproduces this issue:


```cpp
TEST_F(CheckpointTest, WalCorruption) {
  Options options = CurrentOptions();
  options.wal_recovery_mode = WALRecoveryMode::kAbsoluteConsistency;

  Reopen(options);

  const auto threads_num = 32;
  const auto checkpoints_to_create = 200;
  std::atomic<int> thread_num(0);
  std::vector<port::Thread> threads;
  port::RWMutex mutex;
  bool finished = false;

  std::function<void()> write_func = [&]() {
    int a = thread_num.fetch_add(1);
    bool stop_worker = false;

    while (!stop_worker) {
      for (auto i = 0; i < 10000; ++i) {
        std::string key = "foo" + std::to_string(a) + "_" + std::to_string(i);
        ASSERT_OK(Put(key, "bar"));
      }

      mutex.ReadLock();
      stop_worker = finished;
      mutex.ReadUnlock();
    }
  };

  for (auto i = 0; i < threads_num; ++i) {
    threads.emplace_back(write_func);
  }

  std::vector<std::string> snapshot_names;
  for (auto i = 0; i < checkpoints_to_create; ++i) {
    const auto snapshot_name =
        test::PerThreadDBPath(env_, "snap_" + std::to_string(i));
    std::unique_ptr<Checkpoint> checkpoint;
    Checkpoint* checkpoint_ptr;
    ASSERT_OK(Checkpoint::Create(db_, &checkpoint_ptr));
    checkpoint.reset(checkpoint_ptr);

    ASSERT_OK(checkpoint->CreateCheckpoint(snapshot_name));
    snapshot_names.push_back(snapshot_name);
  }

  mutex.WriteLock();
  finished = true;
  mutex.WriteUnlock();

  for (auto& t : threads) {
    t.join();
  }

  Close();

  options.skip_stats_update_on_db_open = true;
  options.skip_checking_sst_file_sizes_on_db_open = true;
  options.max_open_files = 10;

  for (const auto& snapshot_name : snapshot_names) {
    DB* snapshot_db = nullptr;
    ASSERT_OK(DB::Open(options, snapshot_name, &snapshot_db));
    ASSERT_OK(snapshot_db->Close());
    delete snapshot_db;
  }
}
```

But I've also written more precise unit tests using sync points, so I'll include them in my PR with a suggested fix.

Conditions to reproduce are:

This happens because the size of the active WAL file is captured at an arbitrary moment: