Congyuwang / RocksDict

Python fast on-disk dictionary / RocksDB & SpeeDB Python binding
https://congyuwang.github.io/RocksDict/rocksdict.html
MIT License

Scalability/Performance #41

Closed: spillz closed this issue 1 year ago

spillz commented 1 year ago

This isn't really an issue per se, but I wasn't sure where else to ask. Perhaps these could be part of a FAQ.

  1. Key limit. What is the scalability like in terms of the number of keys? E.g., if I add 100M keys to a RocksDict, will it fall over?
  2. Network shareable. If the dict folder is stored on a network share, can it be safely read by multiple concurrent users? What about writes?
  3. Cold boot. Is there a significant performance penalty with opening the database or fetching a key for the first time as it grows in size? I see this with semidbm even on a moderate size db of ~300MB and 5M keys but so far so good with rocksdict.
  4. Duplicate handling. For example, if I am storing long pathnames in my values and there are many duplicates, do I need to manually assign (unique_id, path) pairs in the DB (and look them up on retrieval) to avoid storing those duplicate strings, or is this taken care of by the lib (or by Python itself)?
  5. Raw mode. I would imagine there might be some storage efficiency gained by using raw mode over pickle. True?
  6. Writes. I could be wrong, but write speed does seem slightly slower than semidbm on a moderately large db. Even if true, it's probably worth the tradeoff for the read speed in most applications. What should I expect here, and does it get worse with size?
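On question 4, a minimal sketch of the manual interning approach the question describes. This uses plain Python dicts as stand-ins for the store, and every name here (`intern_path`, `put_record`, etc.) is hypothetical — none of it is part of the RocksDict API; in a real setup the three dicts could be separate RocksDict instances or key prefixes:

```python
import pickle

# Stand-ins for three key spaces in the on-disk store.
path_to_id = {}   # path string -> small integer id
id_to_path = {}   # integer id -> path string
records = {}      # record key -> pickled value referencing ids, not paths

def intern_path(path: str) -> int:
    """Store each distinct path once and return its compact id."""
    if path not in path_to_id:
        new_id = len(path_to_id)
        path_to_id[path] = new_id
        id_to_path[new_id] = path
    return path_to_id[path]

def put_record(key: str, path: str, payload: dict) -> None:
    # Store the small id instead of the (possibly long, duplicated) path.
    records[key] = pickle.dumps({"path_id": intern_path(path), **payload})

def get_record(key: str) -> dict:
    value = pickle.loads(records[key])
    value["path"] = id_to_path[value.pop("path_id")]
    return value

# Two records sharing one long path store the path bytes only once.
put_record("a", "/very/long/shared/path", {"size": 1})
put_record("b", "/very/long/shared/path", {"size": 2})
```

Note that pickling values independently gives no cross-value deduplication for free: each pickled record carries its own copy of any string it contains, which is why manual interning like this can pay off when duplication is heavy.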
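On question 5, one concrete (if small) source of the difference is pickle's framing: even a bytes object round-tripped through `pickle.dumps` carries opcode and header bytes on top of the payload, whereas raw mode can store the bytes as-is. A quick stdlib-only illustration, independent of RocksDict itself:

```python
import pickle

raw = b"/some/path/to/a/file"   # 20 bytes, storable as-is in a raw store
pickled = pickle.dumps(raw)     # same payload plus pickle framing bytes

# The pickled form is strictly larger; the exact overhead depends on the
# pickle protocol and the object type being serialized.
overhead = len(pickled) - len(raw)
print(len(raw), len(pickled), overhead)
```

The per-value overhead is a handful of bytes for simple types, so the gain matters most for many small values; for rich Python objects, pickle is doing real work that raw mode simply can't replace.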

Many thanks for making the lib!