cberner / redb

An embedded key-value database in pure Rust
https://www.redb.org
Apache License 2.0

WriteStrategy name #342

Closed by casey 2 years ago

casey commented 2 years ago

I'm trying to figure out how to reproduce #337, and it occurred to me that renaming the WriteStrategy variants after the algorithms they use might be a good idea. There are potentially a lot of different write strategies, and if they make different trade-offs, CommitLatency and Throughput might wind up not being specific enough. Also, CommitLatency and Throughput are a little opaque.

Another thought is that CommitStrategy might be better than WriteStrategy, because it's used on commit, not on write. That is to say, all writes are done the same way; it's the algorithm used to commit those writes that's different.

I was thinking:

enum CommitStrategy {
    OnePhaseWithChecksum,
    TwoPhase,
}

What do you think? Happy to open a PR if you like the idea.

This is also partially based on the Rust standard library collections std::collections::{BTreeMap, HashMap}. I think these are better names than C++'s map and unordered_map, since they tell you exactly how they're implemented, which lets you reason better about which one to use.
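To illustrate the naming point, here is a minimal sketch of how the implementation-based names hint at behavior: a BTreeMap iterates its keys in sorted order, while a HashMap makes no ordering promise. The helper names `btree_keys` and `hash_keys` are made up for this example.

```rust
use std::collections::{BTreeMap, HashMap};

/// Collects keys in BTreeMap iteration order, which is always sorted by key.
fn btree_keys(pairs: &[(u32, &str)]) -> Vec<u32> {
    let map: BTreeMap<u32, &str> = pairs.iter().copied().collect();
    map.keys().copied().collect()
}

/// Collects keys in HashMap iteration order, which is unspecified.
fn hash_keys(pairs: &[(u32, &str)]) -> Vec<u32> {
    let map: HashMap<u32, &str> = pairs.iter().copied().collect();
    map.keys().copied().collect()
}

fn main() {
    let pairs = [(3, "c"), (1, "a"), (2, "b")];
    // Sorted, because the map is a B-tree.
    println!("BTreeMap keys: {:?}", btree_keys(&pairs));
    // Any order, because the map is a hash table.
    println!("HashMap keys:  {:?}", hash_keys(&pairs));
}
```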

cberner commented 2 years ago

Ya, I considered using names like that, but my reasoning was that very few people would know what the tradeoffs of OnePhaseWithChecksum vs TwoPhase are without reading the documentation, whereas it's more obvious to users whether they want to optimize for throughput or for commit latency.

> Another thought is that CommitStrategy might be better than WriteStrategy, because it's used on commit, not on write. That is to say, all writes are done the same way; it's the algorithm used to commit those writes that's different.

Actually it's both! With 1PC+C (one-phase commit with checksums), every write also involves updating the checksums recursively up to the root, whereas with 2PC (two-phase commit) it does not.
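A toy sketch of what "updating the checksums recursively to the root" could look like. This is not redb's implementation: `ToyTree`, `Node`, the `with_checksums` flag, and the use of `DefaultHasher` as a checksum are all assumptions made for illustration. It only shows that a checksummed write touches every node on the path to the root, while a write without checksums touches none.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical toy node; this is not redb's page format.
struct Node {
    parent: Option<usize>,
    children: Vec<usize>,
    payload: u64,
    checksum: u64,
}

struct ToyTree {
    nodes: Vec<Node>,
}

impl ToyTree {
    /// A node's checksum covers its payload and its children's checksums.
    fn compute_checksum(&self, idx: usize) -> u64 {
        let mut hasher = DefaultHasher::new();
        self.nodes[idx].payload.hash(&mut hasher);
        for &child in &self.nodes[idx].children {
            self.nodes[child].checksum.hash(&mut hasher);
        }
        hasher.finish()
    }

    /// Writes a payload. When `with_checksums` is true, refreshes checksums
    /// from the written node up to the root. Returns how many were recomputed.
    fn write(&mut self, idx: usize, payload: u64, with_checksums: bool) -> usize {
        self.nodes[idx].payload = payload;
        if !with_checksums {
            return 0;
        }
        let mut updated = 0;
        let mut current = Some(idx);
        while let Some(i) = current {
            let sum = self.compute_checksum(i);
            self.nodes[i].checksum = sum;
            updated += 1;
            current = self.nodes[i].parent;
        }
        updated
    }
}

/// Builds a three-node chain: root (0) -> inner (1) -> leaf (2).
fn chain_of_three() -> ToyTree {
    let nodes = vec![
        Node { parent: None, children: vec![1], payload: 0, checksum: 0 },
        Node { parent: Some(0), children: vec![2], payload: 0, checksum: 0 },
        Node { parent: Some(1), children: vec![], payload: 0, checksum: 0 },
    ];
    ToyTree { nodes }
}

fn main() {
    let mut tree = chain_of_three();
    let with = tree.write(2, 42, true);
    let without = tree.write(2, 43, false);
    println!("checksums recomputed with checksums on: {with}, off: {without}");
}
```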

casey commented 2 years ago

> Ya, I considered using names like that, but my reasoning was that very few people would know what the tradeoffs of OnePhaseWithChecksum vs TwoPhase are without reading the documentation.

I actually think that might be ideal. It might be better for them to need to read the docs, instead of making assumptions about what the tradeoffs are. And since the docs can be doc-comments on both enum variants, they're very easy to access.
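As a rough sketch of what doc-comments on the two variants might look like: the wording below is illustrative, drawn only from what is said in this thread, and `updates_checksums_on_write` is a hypothetical helper added for the example, not an existing redb API.

```rust
/// Hypothetical sketch of the proposed enum; the doc wording is illustrative.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CommitStrategy {
    /// One-phase commit with checksums (1PC+C).
    ///
    /// Per the discussion above, every write also updates checksums
    /// recursively up to the root.
    OnePhaseWithChecksum,
    /// Two-phase commit (2PC).
    ///
    /// Per the discussion above, writes do not involve that checksum update.
    TwoPhase,
}

impl CommitStrategy {
    /// Hypothetical helper (not part of redb): whether a write also
    /// refreshes checksums under this strategy.
    pub fn updates_checksums_on_write(self) -> bool {
        matches!(self, CommitStrategy::OnePhaseWithChecksum)
    }
}

fn main() {
    for strategy in [CommitStrategy::OnePhaseWithChecksum, CommitStrategy::TwoPhase] {
        println!(
            "{:?}: updates checksums on write = {}",
            strategy,
            strategy.updates_checksums_on_write()
        );
    }
}
```

With docs attached to the variants like this, they show up in rustdoc output and in editor hover text right where the choice is made.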

> Actually it's both! When using 1PC+C every write also involves updating the checksums recursively to the root, whereas with 2PC it does not.

Ahh gotcha.

cberner commented 2 years ago

> I actually think that might be ideal. It might be better for them to need to read the docs, instead of making assumptions about what the tradeoffs are. And since the docs can be doc-comments on both enum variants, they're very easy to access.

Ok ya, I'm convinced :) The performance difference should only really matter for people who really care about perf, at which point they should be reading the docs anyway. Want to send a PR?