Congyuwang / RocksDict

Python fast on-disk dictionary / RocksDB & SpeeDB Python binding
https://congyuwang.github.io/RocksDict/rocksdict.html
MIT License

One writer, many readers #110

Closed by WhattaSkilL 6 months ago

WhattaSkilL commented 6 months ago

Hello! I'm trying to use this library as a KV store in a FastAPI app. The gunicorn workers connect with AccessType.read_only(False), and the primary connects with AccessType.read_write(). Writer:

opt = Options(raw_mode=True)
opt.set_max_background_jobs(4)
opt.set_write_buffer_size(1024 * 1024 * 256)
opt.create_if_missing(True)
opt.set_keep_log_file_num(1)
opt.set_db_log_dir(logs_path)
opt.set_optimize_filters_for_hits(True)
opt.optimize_for_point_lookup(1024)
opt.set_max_open_files(1000)
opt.set_wal_dir(wal_path)
opt.set_wal_size_limit_mb(100)
opt.set_wal_ttl_seconds(180)
opt.set_max_total_wal_size(67108864)
opt.set_wal_recovery_mode(DBRecoveryMode.absolute_consistency())
db = Rdict(data_path, options=opt, access_type=AccessType.read_write())

Reader:

opt = Options(raw_mode=True)
opt.set_max_background_jobs(4)
opt.set_write_buffer_size(1024 * 1024 * 256)
opt.create_if_missing(True)
opt.set_keep_log_file_num(1)
opt.set_db_log_dir(logs_path)
opt.set_optimize_filters_for_hits(True)
opt.optimize_for_point_lookup(1024)
opt.set_max_open_files(1000)
opt.set_wal_dir(wal_path)
opt.set_wal_size_limit_mb(100)
opt.set_wal_ttl_seconds(180)
opt.set_max_total_wal_size(67108864)
opt.set_wal_recovery_mode(DBRecoveryMode.absolute_consistency())
db = Rdict(data_path, options=opt, access_type=AccessType.read_only(False))

When I change values, the readers don't see the changes until they re-open the database. For example, in the write process:

db[bytes.fromhex("abcd12")] = (1).to_bytes(1, "little")
db.flush()
db.flush_wal()

Read process:

int.from_bytes(db[bytes.fromhex("abcd12")], "little") # not found
db = Rdict(data_path, options=opt, access_type=AccessType.read_only(False))
int.from_bytes(db[bytes.fromhex("abcd12")], "little") # found: 1

What am I doing wrong?

Congyuwang commented 6 months ago

Hi, I found the following in the RocksDB documentation:

RocksDB database can be opened in read-write mode (aka. Primary Instance) or can be opened in read-only mode. RocksDB supports two variations of read-only mode:

Read-only Instance - Opens the database in read-only mode. When the Read-only instance is created, it gets a static read-only view of the Primary Instance’s database contents.

Secondary Instance - Opens the database in read-only mode. Supports extra ability to dynamically catch-up with the Primary instance (through a manual call by the user, based on their delay/frequency requirements).

https://github.com/facebook/rocksdb/wiki/Read-only-and-Secondary-instances

So, the read only mode gets a static view. You should use the secondary instance.

Congyuwang commented 6 months ago

Example from rocksdb doc:

const std::string kDbPath = "/tmp/rocksdbtest";
...
// Assume we have already opened a regular RocksDB instance db_primary
// whose database directory is kDbPath.
assert(db_primary);

Options options;
options.max_open_files = -1;

// Secondary instance needs its own directory to store info logs (LOG)
const std::string kSecondaryPath = "/tmp/rocksdb_secondary/";
DB* db_secondary = nullptr;

Status s = DB::OpenAsSecondary(options, kDbPath, kSecondaryPath, &db_secondary);
assert(!s.ok() || db_secondary);

// Let secondary **try** to catch up with primary
s = db_secondary->TryCatchUpWithPrimary();
assert(s.ok());

// Read operations
std::string value;
s = db_secondary->Get(ReadOptions(), "foo", &value);
...

For Python code, use:

opt = Options(raw_mode=True)
opt.set_max_open_files(-1)
db = Rdict(db_path, options=opt, access_type=AccessType.secondary(some_path_for_secondary_log))

# need to catch up at some frequency manually by calling:
db.try_catch_up_with_primary()
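
Since each catch-up call has a cost, one option is to throttle it so that a busy worker does not call it on every request. The following is only a minimal sketch: `ThrottledCatchUp`, `maybe_catch_up`, and the `min_interval` parameter are hypothetical names, not part of rocksdict. The only thing assumed about `db` is that it has the `try_catch_up_with_primary()` method shown above.

```python
import time


class ThrottledCatchUp:
    """Call db.try_catch_up_with_primary() at most once per min_interval seconds.

    Hypothetical helper, not part of rocksdict. `db` is any object that
    exposes try_catch_up_with_primary(), e.g. an Rdict opened as a secondary.
    """

    def __init__(self, db, min_interval=1.0, clock=time.monotonic):
        self._db = db
        self._min_interval = min_interval
        self._clock = clock
        self._last = None  # time of the last catch-up, None if never run

    def maybe_catch_up(self):
        """Catch up if the interval has elapsed; return True if a call was made."""
        now = self._clock()
        if self._last is not None and now - self._last < self._min_interval:
            return False
        self._db.try_catch_up_with_primary()
        self._last = now
        return True
```

A reader could then call `maybe_catch_up()` just before a lookup and accept that values may be up to `min_interval` seconds stale.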

WhattaSkilL commented 6 months ago

Yeah, thank you! I saw that, but I didn't understand this part:

If the writes on Primary Instance does not have WAL enabled (WriteOptions.disableWAL == true), the Read-only/Secondary Instances will not have visibility of data residing in Primary’s memtables – resulting in partial view of the database.

So I tried to enable the WAL and to work out why it wasn't working well. try_catch_up_with_primary is 7 times cheaper than re-opening in terms of CPU, but it still isn't free. When I re-open, I first check the modification time of the RocksDB directory. Is there a better way to detect changes in the primary?
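
For reference, the modification-time check mentioned above can be sketched with the standard library alone. `newest_mtime_ns` and `ChangeWatcher` are hypothetical names, not part of rocksdict. The sketch also looks at the files inside each directory, on the assumption that appending to an existing file updates that file's mtime but usually not the directory's. It can still miss writes that the primary has buffered, or that land within the filesystem's timestamp granularity.

```python
import os


def newest_mtime_ns(*dirs):
    """Return the newest mtime (in ns) among the given directories and their direct entries."""
    newest = 0
    for d in dirs:
        newest = max(newest, os.stat(d).st_mtime_ns)
        with os.scandir(d) as entries:
            for entry in entries:
                try:
                    newest = max(newest, entry.stat().st_mtime_ns)
                except FileNotFoundError:
                    # A file may be deleted between scandir() and stat().
                    pass
    return newest


class ChangeWatcher:
    """Remember the newest mtime seen so far and report when it moves."""

    def __init__(self, *dirs):
        self._dirs = dirs
        self._seen = newest_mtime_ns(*dirs)

    def changed(self):
        """Return True once per observed change in the watched directories."""
        current = newest_mtime_ns(*self._dirs)
        if current != self._seen:
            self._seen = current
            return True
        return False
```

Here the watcher would be built from the data and WAL directories (data_path and wal_path in the options above), and the reader would re-open or catch up only when `changed()` returns True.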

Congyuwang commented 6 months ago

In RocksDB, part of the data has already been written to SST files, while the rest has not been flushed yet: it sits in the primary's memtable and is also recorded in the WAL. The secondary or read-only instance is another process, so it has no access to the primary's memtable, which exists only in the primary's memory. Instead, it loads its view from the SST files plus the WAL files. If the WAL is not enabled, that view can be more partial than if it is enabled.
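
The explanation above can be illustrated with a toy model in plain Python. This is only a sketch of the visibility rule, not how RocksDB is implemented: `ToyPrimary` and `secondary_view` are made-up names, a dict stands in for the SST files, and a list stands in for the WAL.

```python
class ToyPrimary:
    """Toy stand-in for a primary instance (not real RocksDB behavior in detail)."""

    def __init__(self, wal_enabled=True):
        self.wal_enabled = wal_enabled
        self.memtable = {}  # lives only in the primary's memory
        self.wal = []       # stands in for WAL files on disk
        self.sst = {}       # stands in for SST files on disk

    def put(self, key, value):
        if self.wal_enabled:
            self.wal.append((key, value))
        self.memtable[key] = value

    def flush(self):
        # Move memtable contents to "disk"; the logged entries are no longer needed.
        self.sst.update(self.memtable)
        self.memtable.clear()
        self.wal.clear()


def secondary_view(primary):
    """What another process could reconstruct: SST contents plus replayed WAL."""
    view = dict(primary.sst)
    for key, value in primary.wal:
        view[key] = value
    return view


with_wal = ToyPrimary(wal_enabled=True)
with_wal.put(b"k", b"1")
print(secondary_view(with_wal))  # unflushed write is visible through the WAL

no_wal = ToyPrimary(wal_enabled=False)
no_wal.put(b"k", b"1")
print(secondary_view(no_wal))    # unflushed write is invisible until flush()
```

In this model, a write made with the WAL disabled becomes visible to the other process only after `flush()` moves it into the SST stand-in.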