Open jbowens opened 1 year ago
cc @erikgrinaker, heads up that the MVCCScan benchmark (especially with smaller version counts) kinda sucks.
The MVCC benchmarks in the storage package build up an initial state of a database to run against. These databases are very unrepresentative of real-world LSMs.
For example, the 100,000 key, 1 version-per-key, 64-byte value variant produced an LSM consisting of just 2 sstables in L6, resulting in a read amplification of 1. The 100,000 key, 100 versions-per-key, 64-byte value variant produced a slightly more representative LSM with 3 non-empty levels: L6, L5 and L0. L0 had two sublevels, resulting in a read amplification of 4 (two L0 sublevels plus L5 and L6).
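To make the arithmetic above concrete, here is a minimal sketch of how those read amplification figures fall out, assuming read amplification is counted as the number of L0 sublevels plus the number of non-empty levels below L0. The `lsmShape` type and `readAmp` function are hypothetical names for illustration, not Pebble APIs, and the per-level file counts for the 100-version variant are placeholders since only emptiness matters here.

```go
package main

import "fmt"

// lsmShape is a hypothetical summary of an LSM's structure (not a Pebble type).
type lsmShape struct {
	l0Sublevels   int    // number of sublevels in L0
	filesPerLevel [6]int // file counts for L1..L6; index 0 is L1, index 5 is L6
}

// readAmp counts one for each L0 sublevel and one for each non-empty
// level below L0, since a read may need to consult each of them.
func readAmp(s lsmShape) int {
	amp := s.l0Sublevels
	for _, n := range s.filesPerLevel {
		if n > 0 {
			amp++
		}
	}
	return amp
}

func main() {
	// 1 version per key: just 2 sstables in L6.
	oneVersion := lsmShape{filesPerLevel: [6]int{5: 2}}
	// 100 versions per key: two L0 sublevels, plus non-empty L5 and L6
	// (file counts of 1 are placeholders, not measured values).
	hundredVersions := lsmShape{l0Sublevels: 2, filesPerLevel: [6]int{4: 1, 5: 1}}

	fmt.Println(readAmp(oneVersion), readAmp(hundredVersions)) // prints "1 4"
}
```

A production LSM would typically show more populated levels and therefore a higher number than either benchmark variant.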
Building truly representative LSMs would be prohibitively slow, but using smaller target file sizes or carefully constructing the database to force additional LSM levels would likely yield performance characteristics closer to those of a realistic Cockroach LSM than what the benchmarks produce today.
Once we have a corpus of compaction benchmarking workloads (cockroachdb/pebble#1865), including initial LSMs, we could switch to relying more on microbenchmarks that we run manually with a mounted, pre-collected LSM.
Jira issue: CRDB-22532