Congyuwang / RocksDict

Python fast on-disk dictionary / RocksDB & SpeeDB Python binding
https://congyuwang.github.io/RocksDict/rocksdict.html
MIT License

Code refactor db multithreaded #101

Closed. Congyuwang closed this 9 months ago

Congyuwang commented 9 months ago

Code refactor: replace DB<Singlethreaded> with DB<Multithreaded> to simplify code.

GodTamIt commented 9 months ago

@Congyuwang do you happen to have any numbers for single vs. multi-threading before and after this change?

Congyuwang commented 9 months ago

@GodTamIt Hi. I initially tried releasing the GIL when calling RocksDB APIs, but performance was usually worse (tested with 4 threads on an M2 MacBook Air):

rocksdict v0.3.19:

Gen rand bytes...
Benchmarking Rdict Put...
Put performance: 1048576 items in 3.0940439701080322 seconds
Put performance multi-thread: 1048576 items in 3.132617950439453 seconds
Benchmarking Rdict Iterator...
Iterator performance: 1048576 items in 0.44229578971862793 seconds
Iterator performance multi-thread: 4194304 items in 1.3302440643310547 seconds
Benchmarking Rdict Get...
Get performance: 1048576 items in 2.7354938983917236 seconds
Get performance multi-thread: 1048576 items in 2.5576398372650146 seconds

rocksdict branch:allow-threads

Gen rand bytes...
Benchmarking Rdict Put...
Put performance: 1048576 items in 3.0238168239593506 seconds
Put performance multi-thread: 1048576 items in 4.2085816860198975 seconds
Benchmarking Rdict Iterator...
Iterator performance: 1048576 items in 0.5370612144470215 seconds
Iterator performance multi-thread: 4194304 items in 12.342989921569824 seconds
Benchmarking Rdict Get...
Get performance: 1048576 items in 2.7959401607513428 seconds
Get performance multi-thread: 1048576 items in 3.1259820461273193 seconds
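
To make the comparison easier to read, the multi-thread timings from the two logs above can be turned into slowdown ratios. This is only arithmetic on the numbers already posted; the names `RESULTS`, `throughput`, and `slowdown` are made up for illustration.

```python
# Multi-thread timings copied from the two benchmark logs above: (items, seconds).
RESULTS = {
    "v0.3.19": {
        "put": (1048576, 3.132617950439453),
        "iterator": (4194304, 1.3302440643310547),
        "get": (1048576, 2.5576398372650146),
    },
    "allow-threads": {
        "put": (1048576, 4.2085816860198975),
        "iterator": (4194304, 12.342989921569824),
        "get": (1048576, 3.1259820461273193),
    },
}


def throughput(items, seconds):
    """Items processed per second."""
    return items / seconds


def slowdown(op):
    """How many times slower the allow-threads multi-thread run is than v0.3.19."""
    base = throughput(*RESULTS["v0.3.19"][op])
    new = throughput(*RESULTS["allow-threads"][op])
    return base / new


for op in ("put", "iterator", "get"):
    print(f"{op}: {slowdown(op):.2f}x slower on allow-threads (multi-thread)")
```

The iterator case stands out: the multi-thread run on allow-threads is roughly 9x slower than on v0.3.19, while put and get are slower by well under 2x.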

If you wish to benchmark it on other platforms or hardware, you can check out the allow-threads branch. The benchmark code is in the test/ directory: test/bench_rdict.py.
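
For readers who just want the shape of a single- vs multi-thread put comparison, here is a minimal sketch. It is not the actual test/bench_rdict.py; `bench_put` is a hypothetical helper, and a plain Python dict stands in for the store so the snippet runs without rocksdict installed. Passing an opened Rdict in place of the dict is an assumption about how you might adapt it.

```python
import os
import threading
import time


def bench_put(store, items, num_threads=1):
    """Write all (key, value) pairs into `store`, split across threads.

    `store` is any object supporting `store[key] = value`.
    Returns the elapsed wall-clock time in seconds.
    """
    # Give each thread an interleaved slice of the items.
    chunks = [items[i::num_threads] for i in range(num_threads)]

    def worker(chunk):
        for key, value in chunk:
            store[key] = value

    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


if __name__ == "__main__":
    # Random keys and values, similar in spirit to "Gen rand bytes..." above.
    items = [(os.urandom(16), os.urandom(64)) for _ in range(100_000)]
    for n in (1, 4):
        store = {}  # stand-in; swap in your own store object here
        elapsed = bench_put(store, items, num_threads=n)
        print(f"Put with {n} thread(s): {len(items)} items in {elapsed} seconds")
```

With a plain dict the GIL serializes the writes, so the 4-thread run is not expected to be faster; the point is only the measurement structure.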

I guess it's just lock contention, so I reverted those changes. I did keep the DB<Multithreaded> struct instead of RefCell<DB<Singlethreaded>> for cleaner code. So there is not much change overall, apart from some code cleanup.