chloeyin opened 6 days ago
That is a good point - thanks for the clear repro. I think we should try to make the secondary consistently ignore the column family it doesn't know about, even when the primary reopens or the MANIFEST rolls over.
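Here is a rough sketch of what "consistently ignore" could mean. It is a self-contained C++ model, not RocksDB code: `ModelEdit`, `Outcome`, and `IgnoringReplayer` are hypothetical names. The sketch assumes that the secondary remembers the id of every column family whose add record it chose to ignore, and then skips all later records for that id instead of failing. It also assumes that after a MANIFEST rollover each column family is announced again by an add record before any other record for it.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <utility>

// Hypothetical model of a MANIFEST record; not RocksDB's VersionEdit.
struct ModelEdit {
  uint32_t cf_id;   // column family the record refers to
  bool is_cf_add;   // "create column family" record
  bool is_cf_drop;  // "drop column family" record
};

enum class Outcome { kApplied, kSkipped, kCorruption };

// Hypothetical replayer that ignores unopened column families consistently.
class IgnoringReplayer {
 public:
  explicit IgnoringReplayer(std::set<uint32_t> opened)
      : opened_(std::move(opened)) {}

  Outcome Apply(const ModelEdit& edit) {
    const uint32_t id = edit.cf_id;
    assert(!(edit.is_cf_add && edit.is_cf_drop));  // a record is one or the other
    if (opened_.count(id) > 0) {
      return Outcome::kApplied;  // CF the secondary opened: apply as usual
    }
    if (edit.is_cf_add) {
      ignored_.insert(id);  // remember the id instead of forgetting it
      return Outcome::kSkipped;
    }
    if (ignored_.count(id) > 0) {
      if (edit.is_cf_drop) {
        ignored_.erase(id);  // the CF is gone; stop tracking it
      }
      return Outcome::kSkipped;  // e.g. a log-number edit for an ignored CF
    }
    // An id never announced by an add record is still treated as corruption.
    return Outcome::kCorruption;
  }

 private:
  std::set<uint32_t> opened_;   // ids the secondary was opened with
  std::set<uint32_t> ignored_;  // ids whose add record was seen but ignored
};
```

Records for an id that was never announced still report corruption, so this design keeps a sanity check for genuinely bad records while tolerating column families the secondary simply did not open.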
I have a C++ reproduction:
```cpp
#include <rocksdb/db.h>

#include <iostream>
#include <vector>

// Log the source line and status of each step (__LINE__ expands at each call site)
#define rlog [](rocksdb::Status s) {std::cout << __FILE__ << ":" << __LINE__ << " " << s.ToString() << std::endl;}

int main() {
  rocksdb::DB* db;
  rocksdb::Options db_options;
  db_options.create_if_missing = true;
  db_options.create_missing_column_families = true;
  rocksdb::Status s;
  rocksdb::ColumnFamilyOptions cf_opts;
  std::vector<rocksdb::ColumnFamilyDescriptor> cfs = {{"cf1", cf_opts}, {"default", cf_opts}};
  std::vector<rocksdb::ColumnFamilyHandle*> cf_handles;

  // Open a new primary DB with some column families
  s = rocksdb::DB::Open(db_options, "/tmp/testdb", cfs, &cf_handles, &db);
  rlog(s);

  // Open a secondary instance that knows only "cf1" and "default"
  rocksdb::DB* db2;
  std::vector<rocksdb::ColumnFamilyHandle*> handles_secondary;
  s = rocksdb::DB::OpenAsSecondary(db_options, "/tmp/testdb", "/tmp/testdb-2", cfs, &handles_secondary, &db2);
  rlog(s);

  // Create a new column family "cf2" on the primary
  rocksdb::ColumnFamilyHandle* new_handle;
  s = db->CreateColumnFamily(cf_opts, "cf2", &new_handle);
  rlog(s);

  // Catch up: OK, the secondary ignores "cf2"
  s = db2->TryCatchUpWithPrimary();
  rlog(s);

  // Close the primary (destroy all handles first)
  for (auto handle : cf_handles) {
    s = db->DestroyColumnFamilyHandle(handle);
  }
  db->DestroyColumnFamilyHandle(new_handle);
  delete db;
  rlog(s);

  // Catch up: still OK
  s = db2->TryCatchUpWithPrimary();
  rlog(s);

  // Reopen the primary with all three column families
  rocksdb::DB* db_again;
  std::vector<rocksdb::ColumnFamilyDescriptor> cfs_again = {{"cf1", cf_opts}, {"cf2", cf_opts}, {"default", cf_opts}};
  std::vector<rocksdb::ColumnFamilyHandle*> cf_handles_again;
  s = rocksdb::DB::Open(db_options, "/tmp/testdb", cfs_again, &cf_handles_again, &db_again);
  rlog(s);

  // Catch up with the reopened primary --> error here
  s = db2->TryCatchUpWithPrimary();
  rlog(s);

  // Clean up both instances
  for (auto handle : handles_secondary) {
    db2->DestroyColumnFamilyHandle(handle);
  }
  delete db2;
  for (auto handle : cf_handles_again) {
    db_again->DestroyColumnFamilyHandle(handle);
  }
  delete db_again;
  return 0;
}
```
```shell
g++ main.cpp -g -lrocksdb -o testcpp
```

```
main.cpp:23 OK
main.cpp:29 OK
main.cpp:34 OK
main.cpp:38 OK
main.cpp:46 OK
main.cpp:50 OK
main.cpp:57 OK
main.cpp:61 Corruption: MANIFEST record referencing unknown column family The file /tmp/testdb/MANIFEST-000011 may be corrupted.
```
I built RocksDB in both release and debug mode, and both return this error >_<.
My RocksDB version: 8.8.1
From reading the code, I found a version edit for the newly created column family that has `has_last_sequence_=true` and `has_log_number_=true` (I'm not sure what this version edit is used for), and it is this edit that makes the catch-up fail:
https://github.com/facebook/rocksdb/blob/986b8b9f20893dec811d8ecdb97b6a47f20d322d/db/version_edit.cc#L530-L536
The check in this function fails, which in turn fails the catch-up: https://github.com/facebook/rocksdb/blob/986b8b9f20893dec811d8ecdb97b6a47f20d322d/db/version_edit_handler.cc#L355-L374
```
(gdb) info locals
__PRETTY_FUNCTION__ = "void rocksdb::VersionEditHandler::CheckColumnFamilyId(const rocksdb::VersionEdit&, bool*, bool*) const"
in_not_found = false
in_builders = false
(gdb) p edit
$36 = (const rocksdb::VersionEdit &) @0x7fffffffb9c0: {max_level_ = 0, db_id_ = "", comparator_ = "", log_number_ = 4, prev_log_number_ = 0, next_file_number_ = 0, max_column_family_ = 0, min_log_number_to_keep_ = 0, last_sequence_ = 0, has_db_id_ = false,
has_comparator_ = false, has_log_number_ = true, has_prev_log_number_ = false, has_next_file_number_ = false, has_max_column_family_ = false, has_min_log_number_to_keep_ = false, has_last_sequence_ = true, has_persist_user_defined_timestamps_ = false,
compact_cursors_ = std::vector of length 0, capacity 0, deleted_files_ = std::set with 0 elements, new_files_ = std::vector of length 0, capacity 0, blob_file_additions_ = std::vector of length 0, capacity 0,
blob_file_garbages_ = std::vector of length 0, capacity 0, wal_additions_ = std::vector of length 0, capacity 0, wal_deletion_ = {static kEmpty = 0, number_ = 0}, column_family_ = 2, is_column_family_drop_ = false, is_column_family_add_ = false,
column_family_name_ = "", is_in_atomic_group_ = false, remaining_entries_ = 0, full_history_ts_low_ = "", persist_user_defined_timestamps_ = true}
(gdb) bt
#0 rocksdb::VersionEditHandler::CheckColumnFamilyId (this=this@entry=0x555555655420, edit=..., cf_in_not_found=cf_in_not_found@entry=0x7fffffffb6fe, cf_in_builders=cf_in_builders@entry=0x7fffffffb6ff) at db/version_edit_handler.cc:372
#1 0x00007ffff7a09627 in rocksdb::VersionEditHandler::OnNonCfOperation (this=0x555555655420, edit=..., cfd=0x7fffffffb970) at db/version_edit_handler.cc:296
#2 0x00007ffff7a0bca1 in rocksdb::VersionEditHandler::ApplyVersionEdit (this=0x555555655420, edit=..., cfd=0x7fffffffb970) at db/version_edit_handler.cc:213
#3 0x00007ffff7a0ca92 in rocksdb::ManifestTailer::ApplyVersionEdit (this=0x555555655420, edit=..., cfd=0x7fffffffb970) at db/version_edit_handler.cc:984
#4 0x00007ffff7a06826 in rocksdb::VersionEditHandlerBase::Iterate (this=0x555555655420, reader=..., log_read_status=log_read_status@entry=0x5555556372f0) at db/version_edit_handler.cc:61
#5 0x00007ffff7a2205a in rocksdb::ReactiveVersionSet::ReadAndApply (this=this@entry=0x5555555ef050, mu=mu@entry=0x55555564ba80, manifest_reader=manifest_reader@entry=0x55555564cdd0, manifest_read_status=0x5555556372f0, cfds_changed=cfds_changed@entry=0x7fffffffbc50)
at /usr/include/c++/11/bits/unique_ptr.h:173
#6 0x00007ffff7929ba6 in rocksdb::DBImplSecondary::TryCatchUpWithPrimary (this=0x55555564b300) at /usr/include/c++/11/bits/unique_ptr.h:173
#7 0x000055555555886c in main () at main.cpp:60
(gdb)
```
So even though the secondary instance can ignore the newly created column family when replaying the MANIFEST (https://github.com/facebook/rocksdb/blob/986b8b9f20893dec811d8ecdb97b6a47f20d322d/db/version_edit_handler.cc#L994-L1008), it can still fail here on the version edit mentioned above: https://github.com/facebook/rocksdb/blob/986b8b9f20893dec811d8ecdb97b6a47f20d322d/db/version_edit_handler.cc#L292-L306
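To make these two steps concrete, here is a simplified, self-contained C++ model. It is not RocksDB code: `ModelEdit`, `ModelTailer`, and their members are hypothetical stand-ins. The model assumes that during catch-up an add record for a column family the secondary did not open is dropped without recording the id anywhere. A later non-add record for the same id (like the `column_family_ = 2` edit in the gdb output) is then in neither set, which matches `in_not_found = false` and `in_builders = false`, and is reported as corruption.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

// Hypothetical model of a MANIFEST record; not RocksDB's VersionEdit.
struct ModelEdit {
  uint32_t cf_id;  // column family the record refers to
  bool is_cf_add;  // "create column family" record
};

// Simplified model of the secondary's catch-up replay (not RocksDB code).
struct ModelTailer {
  std::set<uint32_t> builders;   // CFs the secondary was opened with
  std::set<uint32_t> not_found;  // CFs known to exist but not opened (empty here)

  // Same idea as CheckColumnFamilyId: the id must be in one of the two sets.
  bool CheckColumnFamilyId(uint32_t cf_id) const {
    const bool in_not_found = not_found.count(cf_id) > 0;
    const bool in_builders = builders.count(cf_id) > 0;
    assert(!(in_not_found && in_builders));  // never both
    return in_not_found || in_builders;
  }

  // Returns an empty string on success, otherwise the error message.
  std::string Apply(const ModelEdit& edit) const {
    if (edit.is_cf_add) {
      // In this model, add records are simply skipped during catch-up, so an
      // id the secondary did not open ends up in neither set (assumption).
      return "";
    }
    if (!CheckColumnFamilyId(edit.cf_id)) {
      return "Corruption: MANIFEST record referencing unknown column family";
    }
    return "";
  }
};
```

With `builders = {0, 1}` ("default" and "cf1"), replaying the add record for id 2 succeeds, but the following log-number edit for id 2 returns the corruption message.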
It looks like this is not something I can work around >_<
But why does it only happen once I close (and reopen) the primary? Does closing flush some new data to the MANIFEST?
Although I'm using a Go wrapper for RocksDB, I think this is related to RocksDB itself: I can reproduce it with the C# wrapper as well. I'm not sure whether this is by design.
Expected behavior
No error occurs.
Actual behavior
Corruption: MANIFEST record referencing unknown column family The file may be corrupted
Steps to reproduce the behavior
For all the steps I use default options (except for setting `create_if_missing`).
TryCatchUpWithPrimary
While I keep reading the code, are there any suggestions about this error? I know the secondary cannot catch up with newly created column families, but I did not expect it to return this error. Thanks in advance!