facebook / rocksdb

A library that provides an embeddable, persistent key-value store for fast storage.
http://rocksdb.org
GNU General Public License v2.0

Rocksdb secondary instance error shows "Corruption: MANIFEST record referencing unknown column family The file may be corrupted." when calling TryCatchUpWithPrimary #12821

Open chloeyin opened 6 days ago

chloeyin commented 6 days ago


Although I'm using a Golang wrapper for RocksDB, I think the issue is in RocksDB itself, since I can also reproduce it with a C# wrapper. I'm not sure whether this behavior is by design.

Expected behavior

No error happens

Actual behavior

Corruption: MANIFEST record referencing unknown column family The file may be corrupted

Steps to reproduce the behavior

All steps use default options, except for setting create_if_missing and create_missing_column_families.

  1. Open a RocksDB database with some column families (create if missing).
  2. Open a secondary instance on the same database, with the same column families.
  3. Create a new column family through the primary instance.
  4. Call TryCatchUpWithPrimary on the secondary ---> this step succeeds.
  5. Close the primary instance and open it again.
  6. Call TryCatchUpWithPrimary again on the same secondary instance opened in step 2.
  7. The call fails with: Corruption: MANIFEST record referencing unknown column family The file may be corrupted.
func TestSecondary(t *testing.T) {
    opts := grocksdb.NewDefaultOptions()
    opts.SetCreateIfMissing(true)
    opts.SetCreateIfMissingColumnFamilies(true)
    db, _, err := grocksdb.OpenDbColumnFamilies(opts, "/tmp/abc", []string{"cf1", "default"}, []*grocksdb.Options{opts, opts})
    if err != nil {
        t.Fatal(err.Error())
    }

    secondary, _, err := grocksdb.OpenDbAsSecondaryColumnFamilies(opts, "/tmp/abc", "/tmp/abc-2", []string{"cf1", "default"}, []*grocksdb.Options{opts, opts})
    if err != nil {
        t.Fatal(err.Error())
    }

    if _, err := db.CreateColumnFamily(opts, "cf2"); err != nil {
        t.Fatal(err.Error())
    }

    if err := secondary.TryCatchUpWithPrimary(); err != nil {
        t.Fatal(err.Error())
    }

    db.Close()

    if err := secondary.TryCatchUpWithPrimary(); err != nil {
        t.Fatal(err.Error())
    }

    db, _, err = grocksdb.OpenDbColumnFamilies(opts, "/tmp/abc", []string{"cf1", "cf2", "default"}, []*grocksdb.Options{opts, opts, opts})
    if err != nil {
        t.Fatal(err.Error())
    }

    if err := secondary.TryCatchUpWithPrimary(); err != nil {
        t.Fatal(err.Error())
    }
}

While I'm still reading the code, are there any suggestions about this error? I know a secondary instance cannot catch up on newly created column families, but I did not expect it to throw this error. Thanks in advance!

ajkr commented 5 days ago

That is a good point - thanks for the clear repro. I think we should try to make the secondary consistently ignore the column family it doesn't know about, even when the primary reopens or the MANIFEST rolls over.

chloeyin commented 3 days ago

I have a C++ reproduction:

#include <rocksdb/db.h>
#include <iostream>
#include <vector>

#define rlog [](rocksdb::Status s) {std::cout << __FILE__ << ":" << __LINE__ << " " << s.ToString() << std::endl;}

int main() {
  rocksdb::DB* db;
  rocksdb::Options db_options;
  db_options.create_if_missing = true;
  db_options.create_missing_column_families = true;

  rocksdb::Status s;

  rocksdb::ColumnFamilyOptions cf_opts;

  std::vector<rocksdb::ColumnFamilyDescriptor> cfs = {{"cf1", cf_opts}, {"default", cf_opts}};

  std::vector<rocksdb::ColumnFamilyHandle*> cf_handles;

  // Open new db with some columns
  s = rocksdb::DB::Open(db_options, "/tmp/testdb", cfs, &cf_handles, &db);
  rlog(s);

  // Open a secondary db
  rocksdb::DB* db2;
  std::vector<rocksdb::ColumnFamilyHandle*> handles_secondary;
  s = rocksdb::DB::OpenAsSecondary(db_options, "/tmp/testdb", "/tmp/testdb-2", cfs, &handles_secondary, &db2);
  rlog(s);

  // Create a new column family
  rocksdb::ColumnFamilyHandle* new_handle;
  s = db->CreateColumnFamily(cf_opts, "cf2", &new_handle);
  rlog(s);

  // Catch up
  s = db2->TryCatchUpWithPrimary();
  rlog(s);

  // Release db
  for(auto handle: cf_handles) {
    s = db->DestroyColumnFamilyHandle(handle);
  }
  db->DestroyColumnFamilyHandle(new_handle);
  delete db;
  rlog(s);

  // Catch up
  s = db2->TryCatchUpWithPrimary();
  rlog(s);

  // Open primary again
  rocksdb::DB* db_again;
  std::vector<rocksdb::ColumnFamilyDescriptor> cfs_again = {{"cf1", cf_opts}, {"cf2", cf_opts}, {"default", cf_opts}};
  std::vector<rocksdb::ColumnFamilyHandle*> cf_handles_again;
  s = rocksdb::DB::Open(db_options, "/tmp/testdb", cfs_again, &cf_handles_again, &db_again);
  rlog(s);

  // Catch up primary  --> error here
  s = db2->TryCatchUpWithPrimary();
  rlog(s);
}
g++ main.cpp -g -lrocksdb -o testcpp
main.cpp:23 OK
main.cpp:29 OK
main.cpp:34 OK
main.cpp:38 OK
main.cpp:46 OK
main.cpp:50 OK
main.cpp:57 OK
main.cpp:61 Corruption: MANIFEST record referencing unknown column family  The file /tmp/testdb/MANIFEST-000011 may be corrupted.

I built RocksDB in both release and debug mode, and both return this error >_<.

My RocksDB version: 8.8.1

chloeyin commented 3 days ago

After some reading, I found a version edit with has_last_sequence_=true and has_log_number_=true for the newly created column family (I'm not sure what this version edit is used for), and this is the version edit that makes the catch-up fail. https://github.com/facebook/rocksdb/blob/986b8b9f20893dec811d8ecdb97b6a47f20d322d/db/version_edit.cc#L530-L536

This function fails, which in turn fails the catch-up. https://github.com/facebook/rocksdb/blob/986b8b9f20893dec811d8ecdb97b6a47f20d322d/db/version_edit_handler.cc#L355-L374

(gdb) info locals
__PRETTY_FUNCTION__ = "void rocksdb::VersionEditHandler::CheckColumnFamilyId(const rocksdb::VersionEdit&, bool*, bool*) const"
in_not_found = false
in_builders = false
(gdb) p edit
$36 = (const rocksdb::VersionEdit &) @0x7fffffffb9c0: {max_level_ = 0, db_id_ = "", comparator_ = "", log_number_ = 4, prev_log_number_ = 0, next_file_number_ = 0, max_column_family_ = 0, min_log_number_to_keep_ = 0, last_sequence_ = 0, has_db_id_ = false,
  has_comparator_ = false, has_log_number_ = true, has_prev_log_number_ = false, has_next_file_number_ = false, has_max_column_family_ = false, has_min_log_number_to_keep_ = false, has_last_sequence_ = true, has_persist_user_defined_timestamps_ = false,
  compact_cursors_ = std::vector of length 0, capacity 0, deleted_files_ = std::set with 0 elements, new_files_ = std::vector of length 0, capacity 0, blob_file_additions_ = std::vector of length 0, capacity 0,
  blob_file_garbages_ = std::vector of length 0, capacity 0, wal_additions_ = std::vector of length 0, capacity 0, wal_deletion_ = {static kEmpty = 0, number_ = 0}, column_family_ = 2, is_column_family_drop_ = false, is_column_family_add_ = false,
  column_family_name_ = "", is_in_atomic_group_ = false, remaining_entries_ = 0, full_history_ts_low_ = "", persist_user_defined_timestamps_ = true}
(gdb) bt
#0  rocksdb::VersionEditHandler::CheckColumnFamilyId (this=this@entry=0x555555655420, edit=..., cf_in_not_found=cf_in_not_found@entry=0x7fffffffb6fe, cf_in_builders=cf_in_builders@entry=0x7fffffffb6ff) at db/version_edit_handler.cc:372
#1  0x00007ffff7a09627 in rocksdb::VersionEditHandler::OnNonCfOperation (this=0x555555655420, edit=..., cfd=0x7fffffffb970) at db/version_edit_handler.cc:296
#2  0x00007ffff7a0bca1 in rocksdb::VersionEditHandler::ApplyVersionEdit (this=0x555555655420, edit=..., cfd=0x7fffffffb970) at db/version_edit_handler.cc:213
#3  0x00007ffff7a0ca92 in rocksdb::ManifestTailer::ApplyVersionEdit (this=0x555555655420, edit=..., cfd=0x7fffffffb970) at db/version_edit_handler.cc:984
#4  0x00007ffff7a06826 in rocksdb::VersionEditHandlerBase::Iterate (this=0x555555655420, reader=..., log_read_status=log_read_status@entry=0x5555556372f0) at db/version_edit_handler.cc:61
#5  0x00007ffff7a2205a in rocksdb::ReactiveVersionSet::ReadAndApply (this=this@entry=0x5555555ef050, mu=mu@entry=0x55555564ba80, manifest_reader=manifest_reader@entry=0x55555564cdd0, manifest_read_status=0x5555556372f0, cfds_changed=cfds_changed@entry=0x7fffffffbc50)
    at /usr/include/c++/11/bits/unique_ptr.h:173
#6  0x00007ffff7929ba6 in rocksdb::DBImplSecondary::TryCatchUpWithPrimary (this=0x55555564b300) at /usr/include/c++/11/bits/unique_ptr.h:173
#7  0x000055555555886c in main () at main.cpp:60
(gdb)

So even though the secondary instance can ignore the newly created column family when replaying the MANIFEST (https://github.com/facebook/rocksdb/blob/986b8b9f20893dec811d8ecdb97b6a47f20d322d/db/version_edit_handler.cc#L994-L1008), it can still fail on the version edit mentioned above, at this point: https://github.com/facebook/rocksdb/blob/986b8b9f20893dec811d8ecdb97b6a47f20d322d/db/version_edit_handler.cc#L292-L306
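
The interaction between those two code paths can be illustrated with a small toy model. This is not RocksDB code: `ToyEdit`, `ToyTailer`, `ReplayScenario`, and the `remember_ignored` switch are hypothetical names invented for this sketch. It assumes, based only on the two linked snippets and the gdb output, that the secondary skips the creation record of an unknown column family without remembering its ID, so a later record carrying that ID fails the ID check. With `remember_ignored` set to true, the model shows roughly what "consistently ignore the column family it doesn't know about" could look like.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Toy model only: these types are hypothetical and are NOT RocksDB classes.
struct ToyEdit {
  uint32_t cf_id;       // column family the record applies to
  bool is_cf_add;       // true for a "create column family" record
  std::string cf_name;  // only meaningful when is_cf_add is true
};

class ToyTailer {
 public:
  // known_ids: column families the secondary opened.
  // remember_ignored: hypothetical switch modelling the tolerant behavior.
  ToyTailer(std::set<uint32_t> known_ids, bool remember_ignored)
      : known_(std::move(known_ids)), remember_ignored_(remember_ignored) {}

  // Returns "OK" or a corruption message, imitating a Status string.
  std::string Apply(const ToyEdit& e) {
    if (e.is_cf_add) {
      assert(!e.cf_name.empty());  // an add record must carry a name
      // Unknown additions are skipped while catching up; optionally
      // remember the ID so later records for it can be skipped as well.
      if (known_.count(e.cf_id) == 0 && remember_ignored_) {
        ignored_.insert(e.cf_id);
      }
      return "OK";
    }
    // Any other record must refer to a known or deliberately ignored ID.
    if (known_.count(e.cf_id) != 0 || ignored_.count(e.cf_id) != 0) {
      return "OK";
    }
    return "Corruption: MANIFEST record referencing unknown column family";
  }

  // Applies the records in order and stops at the first failure.
  std::string Replay(const std::vector<ToyEdit>& edits) {
    for (const ToyEdit& e : edits) {
      std::string s = Apply(e);
      if (s != "OK") return s;
    }
    return "OK";
  }

 private:
  std::set<uint32_t> known_;
  std::set<uint32_t> ignored_;
  bool remember_ignored_;
};

// IDs 0 and 1 stand for "default" and "cf1"; ID 2 stands for "cf2".
std::string ReplayScenario(bool remember_ignored) {
  ToyTailer tailer({0, 1}, remember_ignored);
  std::vector<ToyEdit> edits = {
      {2, true, "cf2"},  // the primary creates cf2
      {2, false, ""},    // a later per-column-family record for cf2
  };
  return tailer.Replay(edits);
}
```

In this model, `ReplayScenario(false)` returns the corruption message and `ReplayScenario(true)` returns `"OK"`. Whether the real fix should track ignored IDs this way or handle the record differently is a design question for the maintainers.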

Looks like this is not something I can bypass >_<

But why does it only happen when I close the database? Does that flush some new data to the MANIFEST?