Closed dan-da closed 8 months ago
I'm excited about this because it can reduce the number of locks, and because we switch to a well-tested implementation! If you have the chance, I would love to see one or two benchmarks comparing performance.
Number one priority for me is to preserve the atomicity that we have on DB writes. But from what I can see that is unchanged by these changes.
Yes, I've started a 2nd pass that is aimed at removing unneeded locks and simplification. For example, do we need a read cache, when leveldb has one? We can pass a leveldb cache object of any size we choose in the options when creating DB. For any remaining locks, can they be RwLock instead of Mutex? Part of the 2nd pass will be writing some tests for concurrency and speed. As I understand it, DbtVec is essentially a read/write cache for a db-backed vec, whereas DatabaseVec is a db-backed vec with no app-level cache. I'd like to benchmark these against eachother to quantify the differences. And also benchmark the C++ leveldb against the rust-crate (on-disk db). Concurrency (atomicity) tests should help ensure we don't have regressions on that front.
I made some benchmarks for the database. Below are numbers for rs-leveldb (C++) using the database get/put/delete APIs directly, and trying out the different read/write options. I also tried changing some DB creation options, but they didn't seem to have much effect.
Running benches/database.rs (/home/danda/dev/neptune/twenty-first/target/release/deps/database-58fb112e1804b0f1)
Timer precision: 27 ns
database fastest │ slowest │ median │ mean │ samples │ iters
╰─ leveldb │ │ │ │ │
├─ read_100_entries │ │ │ │ │
│ ╰─ gets │ │ │ │ │
│ ├─ get_each_entry_1_time │ │ │ │ │
│ │ ├─ fill_cache │ │ │ │ │
│ │ │ ├─ no_verify_checksums │ │ │ │ │
│ │ │ │ ╰─ get 35.04 µs │ 97.35 µs │ 35.19 µs │ 35.73 µs │ 1000 │ 1000
│ │ │ ╰─ verify_checksums │ │ │ │ │
│ │ │ ╰─ get 35.04 µs │ 194.3 µs │ 35.22 µs │ 37.89 µs │ 1000 │ 1000
│ │ ╰─ no_fill_cache │ │ │ │ │
│ │ ├─ no_verify_checksums │ │ │ │ │
│ │ │ ╰─ get 47.19 µs │ 190.2 µs │ 76.9 µs │ 89.28 µs │ 1000 │ 1000
│ │ ╰─ verify_checksums │ │ │ │ │
│ │ ╰─ get 51.21 µs │ 199.4 µs │ 82.18 µs │ 95.32 µs │ 1000 │ 1000
│ ╰─ get_each_entry_20_times │ │ │ │ │
│ ├─ fill_cache │ │ │ │ │
│ │ ├─ no_verify_checksums │ │ │ │ │
│ │ │ ╰─ get 674.9 µs │ 2.226 ms │ 677.1 µs │ 738.7 µs │ 1000 │ 1000
│ │ ╰─ verify_checksums │ │ │ │ │
│ │ ╰─ get 674.9 µs │ 3.15 ms │ 677.1 µs │ 746.8 µs │ 1000 │ 1000
│ ╰─ no_fill_cache │ │ │ │ │
│ ├─ no_verify_checksums │ │ │ │ │
│ │ ╰─ get 674.9 µs │ 3.083 ms │ 677.8 µs │ 734.6 µs │ 1000 │ 1000
│ ╰─ verify_checksums │ │ │ │ │
│ ╰─ get 675.8 µs │ 3.255 ms │ 677.8 µs │ 727 µs │ 1000 │ 1000
╰─ write_100_entries │ │ │ │ │
├─ deletes │ │ │ │ │
│ ├─ no_sync_on_write │ │ │ │ │
│ │ ├─ batch_delete 819.5 µs │ 2.236 ms │ 1.223 ms │ 1.27 ms │ 1000 │ 1000
│ │ ├─ batch_delete_write 777.1 µs │ 53.41 ms │ 1.144 ms │ 1.264 ms │ 1000 │ 1000
│ │ ╰─ delete 267.9 µs │ 2.364 ms │ 292.9 µs │ 348.8 µs │ 1000 │ 1000
│ ╰─ sync_on_write │ │ │ │ │
│ ├─ batch_delete 781.1 µs │ 12.06 ms │ 1.207 ms │ 1.274 ms │ 1000 │ 1000
│ ├─ batch_delete_write 823.2 µs │ 2.277 ms │ 1.229 ms │ 1.271 ms │ 1000 │ 1000
│ ╰─ delete 86.67 ms │ 193 ms │ 124.7 ms │ 124.1 ms │ 1000 │ 1000
╰─ puts │ │ │ │ │
├─ no_sync_on_write │ │ │ │ │
│ ├─ batch_put 130.1 µs │ 1.066 s │ 51.12 ms │ 62.91 ms │ 1000 │ 1000
│ ├─ batch_put_write 46.9 µs │ 2.199 ms │ 90.38 µs │ 102.5 µs │ 1000 │ 1000
│ ╰─ put 306.2 µs │ 1.581 ms │ 327.8 µs │ 337.3 µs │ 1000 │ 1000
╰─ sync_on_write │ │ │ │ │
├─ batch_put 1.313 ms │ 323.3 ms │ 67.23 ms │ 69.12 ms │ 1000 │ 1000
├─ batch_put_write 957.7 µs │ 57.72 ms │ 2.2 ms │ 2.698 ms │ 1000 │ 1000
╰─ put 82.35 ms │ 921.1 ms │ 126.9 ms │ 134.8 ms │ 1000 │ 1000
Observations:
It would be interesting to compare these with rusty_leveldb, but I would need to merge to master and port the API changes. Not a big deal, but still I'm not sure if I will find time for that... low priority.
A more useful thing will be similar tests using DbtVec
. With those as a baseline, we can see how changes affect performance.
Here are some bench results for DbtVec:
db_dbtvec fastest │ slowest │ median │ mean │ samples │ iters
├─ read_100_entries │ │ │ │ │
│ ├─ get_each_entry_1_time │ │ │ │ │
│ │ ├─ get_many_persisted 56.94 µs │ 93.66 µs │ 57.54 µs │ 58.48 µs │ 1000 │ 1000
│ │ ├─ get_many_unpersisted 15.5 µs │ 38.01 µs │ 16 µs │ 17.11 µs │ 1000 │ 1000
│ │ ├─ get_persisted 45.47 µs │ 94.32 µs │ 45.56 µs │ 46.83 µs │ 1000 │ 1000
│ │ ╰─ get_unpersisted 7.85 µs │ 27.96 µs │ 8.181 µs │ 8.402 µs │ 1000 │ 1000
│ ╰─ get_each_entry_20_times │ │ │ │ │
│ ├─ get_many_persisted 1.158 ms │ 2.254 ms │ 1.163 ms │ 1.186 ms │ 1000 │ 1000
│ ├─ get_many_unpersisted 315.4 µs │ 573 µs │ 317.2 µs │ 321.5 µs │ 1000 │ 1000
│ ├─ get_persisted 905 µs │ 1.663 ms │ 906.8 µs │ 920.2 µs │ 1000 │ 1000
│ ╰─ get_unpersisted 151.3 µs │ 263.2 µs │ 151.6 µs │ 153.6 µs │ 1000 │ 1000
╰─ write_100_entries │ │ │ │ │
├─ pop │ │ │ │ │
│ ├─ pop 1.741 µs │ 12.25 µs │ 1.849 µs │ 1.874 µs │ 1000 │ 1000
│ ╰─ pop_and_persist 4.443 µs │ 145.3 µs │ 4.784 µs │ 4.93 µs │ 1000 │ 1000
├─ push │ │ │ │ │
│ ├─ push 10.49 µs │ 3.446 ms │ 24.49 µs │ 33.1 µs │ 1000 │ 1000
│ ╰─ push_and_persist 82.15 µs │ 928.2 µs │ 106 µs │ 112.4 µs │ 1000 │ 1000
├─ set │ │ │ │ │
│ ├─ set 47.44 µs │ 133.4 µs │ 69.27 µs │ 71.91 µs │ 1000 │ 1000
│ ╰─ set_and_persist 91.19 µs │ 1.694 ms │ 130.7 µs │ 143.7 µs │ 1000 │ 1000
╰─ set_many │ │ │ │ │
├─ set_many 57.58 µs │ 155.9 µs │ 70.91 µs │ 78.35 µs │ 1000 │ 1000
╰─ set_many_and_persist 95.27 µs │ 1.289 ms │ 134.8 µs │ 155.3 µs │ 1000 │ 1000
Observations:
Here are full results, including leveldb results from the same run. bench-results.txt
Tests are not committed and pushed yet. I plan to do that over the weekend.
Very interesting insights, especially that the batch operations set_many
and get_many
slow things down, the don't speed them up. I would say that you could argue that it's the caller's responsibility that the cache does not grow too big. So far, we've followed the policy that twenty-first
primarily exists to serve the needs of Triton VM and Neptune Core. If anyone else can use twenty-first, great, but that's not its primary objective.
A few questions:
DbtSingletonPrivate
doesn't have a key_prefix
value like DbtVecPrivate
has. Shouldn't it have that to guarantee that key-collissions cannot happen?schema.rs
indicates that restore_or_new
should be called before tables are added to the schema, but that method needs to be called after, right?A few questions:
- I notice that
DbtSingletonPrivate
doesn't have akey_prefix
value likeDbtVecPrivate
has. Shouldn't it have that to guarantee that key-collissions cannot happen?
I don't think that's necessary. See #177.
- Documentation in
schema.rs
indicates thatrestore_or_new
should be called before tables are added to the schema, but that method needs to be called after, right?
yes, the tables must already exist, or else it is a no-op.
I think you are referring to the doctest examples, yes? I will correct them, thx.
Closing this. rs-leveldb with leveldb_sys (C++ lib) dep was merged last month.
This is a tracking issue for the task(s) of switching to C++ impl of levelb.
Some motivating factors for this are:
&mut self
for read calls such as theget()
method. This requires that we use mutex locks, which makes readers block for eachother, and writer also must block for readers.A challenge is that the C++ impl does not support fully in-memory databases. So on-disk database directory is created for each DB, and must somehow be cleaned up for tests, including tests that panic: on purpose or not.
Rough Roadmap: