casey closed this issue 2 years ago.
Not a lot of thought. My thinking was that a 2x improvement in commit latency for small transactions was better than a ~20% improvement in write throughput. It's so workload-dependent that neither is obviously better.
I'm working on a benchmark for `ord`, and I'll definitely compare the two when I do.
Let me know what your benchmark shows. I ran my own, configured to be similar to the `ord` write pattern, and I think `Checksum` is a better default. It's faster for single writes, and everything else looks like it's within noise. Plus, the checksums make it possible to check the database for corruption.
redb results, Checksum vs. 2PC (times in ms):

| Benchmark | Checksum | 2PC |
|---|---|---|
| Bulk loaded 16,000,000 items | 383,660 | 368,507 |
| Wrote 100 individual items | 617 | 974 |
| Wrote 100 x 1000 items | 11,782 | 12,451 |
| Random read 16,000,000 items | 1,123,403 | 1,166,113 |
| Random range read 16,000,000 starts | 1,095,588 | 1,149,516 |
| Removed 8,000,000 items | 1,681,512 | 1,722,359 |
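To put the two result sets side by side, the following Rust sketch computes how long 2PC took relative to Checksum for each benchmark. It only does arithmetic on the timings reported above and does not run redb. Individual writes come out about 1.58x slower under 2PC, while every other benchmark differs by less than 6% in either direction.

```rust
// Timings copied from the benchmark results above, in milliseconds:
// (benchmark name, Checksum time, 2PC time).
const RESULTS: [(&str, f64, f64); 6] = [
    ("bulk load 16M items", 383_660.0, 368_507.0),
    ("100 individual writes", 617.0, 974.0),
    ("100 x 1000 writes", 11_782.0, 12_451.0),
    ("random read 16M items", 1_123_403.0, 1_166_113.0),
    ("random range read 16M starts", 1_095_588.0, 1_149_516.0),
    ("remove 8M items", 1_681_512.0, 1_722_359.0),
];

/// How many times longer 2PC took than Checksum.
/// A value above 1.0 means Checksum was faster; below 1.0 means 2PC was faster.
fn slowdown(checksum_ms: f64, two_phase_ms: f64) -> f64 {
    two_phase_ms / checksum_ms
}

fn main() {
    for (name, checksum, two_phase) in RESULTS {
        let ratio = slowdown(checksum, two_phase);
        // Print the ratio and the percentage difference relative to Checksum.
        println!(
            "{name:<30} 2PC/Checksum = {ratio:.3} ({:+.1}%)",
            (ratio - 1.0) * 100.0
        );
    }
}
```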
I haven't done a proper benchmark, but the `ord` tests run a bit faster with `WriteStrategy::Checksum`, so I'm switching to that. I'm not sure how it will affect indexing performance, though. In particular, I'm worried that the checksum strategy is faster in tests because the writes there are very small, and that it might be slower when we sync large blocks, where we read and write huge amounts of ordinal ranges. Do you have a sense of the write size at which `TwoPhase` starts to outperform `Checksum`?
Probably a few MB. I think my benchmark used 1.2 kB values, and even with 1,000 inserts per transaction (roughly 1.2 MB per transaction), `Checksum` is faster.
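The rule of thumb in this exchange can be written down as a tiny decision function. This is a hypothetical sketch: `Strategy`, `txn_bytes`, `pick_strategy`, and the 4 MB crossover are illustrative assumptions (4 MB standing in for "a few MB"), not redb API and not a measured threshold. It also shows the arithmetic behind the answer: 1,000 inserts of 1.2 kB values is about 1.2 MB per transaction, below the guessed crossover.

```rust
/// Hypothetical local stand-in for the two strategies discussed in this thread.
/// This is not redb's own type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strategy {
    Checksum,
    TwoPhase,
}

/// Assumed crossover point: "a few MB" per transaction, guessed as 4 MB here.
/// Illustrative only; a real value would have to come from benchmarking.
const ASSUMED_CROSSOVER_BYTES: u64 = 4_000_000;

/// Approximate bytes written by one transaction.
/// Counts value bytes only; ignores keys and page overhead.
fn txn_bytes(value_bytes: u64, inserts_per_txn: u64) -> u64 {
    value_bytes * inserts_per_txn
}

/// Hypothetical rule of thumb: prefer TwoPhase only for large transactions.
fn pick_strategy(bytes_per_txn: u64) -> Strategy {
    if bytes_per_txn >= ASSUMED_CROSSOVER_BYTES {
        Strategy::TwoPhase
    } else {
        Strategy::Checksum
    }
}

fn main() {
    // The benchmark described above: ~1.2 kB values, 1,000 inserts per transaction.
    let bytes = txn_bytes(1_200, 1_000);
    println!("{bytes} bytes per transaction -> {:?}", pick_strategy(bytes));
}
```

Any real crossover depends on the workload and hardware, so the constant is the part to replace with measured data.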
Interesting, that's good to keep in mind. I think once we hit steady state, `WriteStrategy::TwoPhase` will wind up being faster, so maybe I'll enable `WriteStrategy::Checksum` only for tests.
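One way to enable Checksum only for tests in Rust is to branch on `cfg!(test)`. The sketch below uses a hypothetical local `Strategy` enum that mirrors the variant names from this thread; it is not redb's type, and passing the result into redb's configuration is left out because that API isn't shown here.

```rust
/// Hypothetical stand-in mirroring the variant names used in this thread;
/// not redb's actual `WriteStrategy` type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strategy {
    Checksum,
    TwoPhase,
}

/// Checksum for test builds (many tiny writes), TwoPhase otherwise.
fn strategy_for(is_test_build: bool) -> Strategy {
    if is_test_build {
        Strategy::Checksum
    } else {
        Strategy::TwoPhase
    }
}

fn main() {
    // `cfg!(test)` expands to `true` when this crate is compiled with `--test`
    // (for example, its unit tests under `cargo test`), and `false` otherwise.
    let strategy = strategy_for(cfg!(test));
    println!("using {strategy:?}");
}
```

Note that `cfg!(test)` is only true for the crate being compiled as a test harness. Integration tests under `tests/` link against a normal build of the library, so they would need something else, such as a Cargo feature, to select Checksum.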
Have you given any thought to what the default write strategy should be? I lean towards `WriteStrategy::Throughput`, since it doesn't have the caveat about malicious workloads, and many people won't read the docs.
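If the default ends up being the strategy without the malicious-workload caveat, Rust's `#[default]` attribute on an enum variant (stable since Rust 1.62) makes that choice explicit in the type, so callers who never read the docs get it automatically. The enum here is a hypothetical stand-in, and it assumes, as the latency-vs-throughput framing in the thread implies, that `Throughput` refers to the two-phase strategy.

```rust
/// Hypothetical stand-in for the strategy enum discussed in this thread.
/// Assumption: `TwoPhase` is the strategy the thread calls `Throughput`.
#[derive(Debug, Default, Clone, Copy, PartialEq, Eq)]
enum Strategy {
    /// Lower commit latency for small transactions.
    Checksum,
    /// The conservative choice, used when the caller doesn't pick one.
    #[default]
    TwoPhase,
}

fn main() {
    // Callers who never configure anything get the `#[default]` variant.
    let strategy = Strategy::default();
    println!("default strategy: {strategy:?}");
}
```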