Closed cppdev-123 closed 1 year ago
This is the function that collects the metadata:
// LevelDBHandle, LevelDBIterator and LevelDBSnapshot are just unique_ptr over leveldb::* types
void collect_metadata(LevelDBHandle& ldb, std::vector<IdbDatabase>& dbs) {
    dbs.clear();
    const leveldb::Slice global_meta_prefix{ "\0\0\0\0", 4 };
    LevelDBSnapshot snapshot{ ldb->GetSnapshot(), LevelDBSnapshotDeleter{ *ldb } };
    leveldb::ReadOptions opts;
    opts.fill_cache = false;
    opts.snapshot = snapshot.get();
    //opts.verify_checksums = true;
    auto it = LevelDBIterator{ ldb->NewIterator(opts) };
    size_t rec_idx = 0;
    for (it->SeekToFirst(); it->Valid(); it->Next()) {
        auto key = it->key();
        auto value = it->value();
        if (!key.starts_with(global_meta_prefix)) {
            // 2-byte zero prefix, shorter than the 4-byte global meta prefix
            const leveldb::Slice two_0_prefix{ "\0\0", 2 };
            if (key.starts_with(two_0_prefix))
                printf("!!! a record starts with 2 zeros but not 4 !!!\n");
            continue;
        }
        key.remove_prefix(global_meta_prefix.size());
        GlobalMetaKey global_meta_key;
        if (!global_meta_key.read_from_buff(key)) {
            printf("!!! failed to read global meta record !!!\n");
            continue;
        }
        rec_idx += 1;
        printf("[%zu] got a meta record with type: %d\n", rec_idx, (int)global_meta_key.type);
        if (global_meta_key.type == GlobalMetaKey::Type::database_name) {
            GlobalMetaDatabaseNameValue database_id;
            if (!database_id.read_from_buff(value)) {
                printf("!!! failed to read database id from global meta database name !!!\n");
                continue;
            }
            IdbDatabase db{ .id = database_id.db_id, .name = utf16_be_to_utf8(global_meta_key.db_name_be),
                            .domain = utf16_be_to_utf8(global_meta_key.db_origin_be) };
            dbs.emplace_back(std::move(db));
        }
        else if (global_meta_key.type == GlobalMetaKey::Type::database_free_list) {
            printf("*** free database id: %d ***\n", (int)global_meta_key.free_db_id);
            continue;
        }
        else if (global_meta_key.type == GlobalMetaKey::Type::max_database_id) {
            GlobalMetaMaxDBIdValue max_db_id;
            if (!max_db_id.read_from_buff(value)) {
                printf("!!! failed to read max database id from global meta !!!\n");
                continue;
            }
            printf("*** max database id: %d ***\n", (int)max_db_id.db_id);
        }
        else if (global_meta_key.type == GlobalMetaKey::Type::schema_version) {
            GlobalMetaSchemaValue schema;
            if (!schema.read_from_buff(value)) {
                printf("!!! failed to read schema version from global meta !!!\n");
                continue;
            }
            printf("*** schema version: %d ***\n", (int)schema.version);
        }
    }
}
I don't get any errors, but I only get a few meta records with type 201 (0xc9). I opened the leveldb files with a hex editor, looked at the beginning of the file, and found some database names with their domains encoded in UTF-16 BE, which I already get. The rest of the database names aren't there, so how does this script find them?
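For reference, decoding the UTF-16 BE names seen in the hex editor can be sketched as below. The `utf16_be_to_utf8` helper used in the function above is not shown in this thread, so the name `utf16_be_to_utf8_sketch`, its `std::string_view` parameter, and the choice to replace unpaired surrogates with U+FFFD are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>

// Append one Unicode code point to `out` as UTF-8.
static void append_utf8(std::string& out, std::uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: decode big-endian UTF-16 bytes to UTF-8.
// Unpaired surrogates become U+FFFD; a trailing odd byte is ignored.
std::string utf16_be_to_utf8_sketch(std::string_view bytes) {
    std::string out;
    // Read the 16-bit big-endian code unit that starts at byte offset i.
    auto unit = [&](std::size_t i) {
        return static_cast<std::uint32_t>(
            (static_cast<unsigned char>(bytes[i]) << 8) |
            static_cast<unsigned char>(bytes[i + 1]));
    };
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2) {
        std::uint32_t cp = unit(i);
        if (cp >= 0xD800 && cp <= 0xDBFF) {            // high surrogate
            std::uint32_t lo = (i + 3 < bytes.size()) ? unit(i + 2) : 0;
            if (lo >= 0xDC00 && lo <= 0xDFFF) {        // valid pair
                cp = 0x10000 + ((cp - 0xD800) << 10) + (lo - 0xDC00);
                i += 2;
            } else {
                cp = 0xFFFD;
            }
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {     // stray low surrogate
            cp = 0xFFFD;
        }
        append_utf8(out, cp);
    }
    return out;
}
```

Comparing its output against the strings visible in the hex editor is a quick way to confirm that a record really is a database name or origin.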
Hi,
Sorry for the slight delay in replying.
I suspect what is happening (assuming you're using the "official" implementation of leveldb) is that the code is working as expected and the raw records you can see in a hex editor are actually deleted or outdated, so they are not iterated over. Our code doesn't handle the database in quite the same way as the official code because the target audience is forensics: it recovers all of the outdated and deleted records and leaves it to the caller to sort by sequence number and omit them if required.
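The "sort by sequence number and omit" step could look roughly like the sketch below. `RawRecord` and its fields are hypothetical and are not this library's actual types; the sketch only assumes each recovered record carries a key, a sequence number, and a flag saying whether it is a deletion marker.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical shape of a recovered record; field names are assumptions.
struct RawRecord {
    std::string key;
    std::string value;
    std::uint64_t seq = 0;   // leveldb sequence number
    bool deleted = false;    // true if this record is a deletion marker
};

// Keep only the newest record per key (highest sequence number),
// then drop keys whose newest record is a deletion.
std::vector<RawRecord> live_records(const std::vector<RawRecord>& all) {
    std::map<std::string, RawRecord> newest;
    for (const auto& rec : all) {
        auto [it, inserted] = newest.try_emplace(rec.key, rec);
        if (!inserted && rec.seq > it->second.seq)
            it->second = rec;
    }
    std::vector<RawRecord> out;
    for (auto& [key, rec] : newest) {
        if (!rec.deleted)
            out.push_back(std::move(rec));
    }
    return out;
}
```

Applied to everything a forensic reader recovers, this should give roughly the view an ordinary leveldb iterator presents, which is why records visible in a hex editor can be absent from normal iteration.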
I hope that helps?
It was a problem in the comparator. I wonder why the bytewise comparator didn't do the job.
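One possible reason, offered as a guess rather than something confirmed in this thread: leveldb stores and searches keys in the order defined by the database's comparator, and Chromium's IndexedDB uses its own comparator (named `idb_cmp1`, as far as I know) rather than the bytewise one. Reading with a different ordering can make lookups and iteration skip records that are physically present. The toy below is not the IndexedDB comparator. `custom_less` is an invented stand-in that merely disagrees with bytewise order, to show how a search goes wrong when the data was sorted under one ordering and is read under another.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using Less = bool (*)(const std::string&, const std::string&);

// Invented ordering: shorter keys first, ties broken bytewise.
bool custom_less(const std::string& a, const std::string& b) {
    if (a.size() != b.size()) return a.size() < b.size();
    return a < b;
}

// Plain bytewise ordering.
bool bytewise_less(const std::string& a, const std::string& b) {
    return a < b;
}

// Hand-written binary search that trusts `less` to match the order of `keys`.
bool found_by_binary_search(const std::vector<std::string>& keys,
                            const std::string& target, Less less) {
    std::size_t lo = 0, hi = keys.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (less(keys[mid], target))
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo < keys.size() && !less(target, keys[lo]);
}
```

With keys stored in `custom_less` order, for example `{"b", "z", "aa", "ab", "abc"}`, searching for `"aa"` succeeds with `custom_less` but fails with `bytewise_less`, even though the key is present.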
Sorry if this is not related to this library, but I'm having problems doing the same for IndexedDB in C++: I'm not getting all the global meta records, especially those with type 201 (0xc9), which corresponds to database names and origins. However, I do get all the database meta records, and the global meta max database id record has the correct max database id.
Do I need some special flags when opening the database? I tested with this Python script and it lists all the databases with their names, but it seems to use a custom implementation of leveldb.