cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

storage: smooth pebble compaction #133948

Open andrewbaptist opened 1 day ago

andrewbaptist commented 1 day ago

In various scenarios, operation latency increases significantly while Pebble compactions (including memtable -> L0) are running. This is particularly noticeable on small clusters, and most noticeable on a single-node cluster. In these situations, each node oscillates between "fast" (not running a compaction) and "slow" (running a compaction), with periods on the order of 10-60s.

Proposed solution: Smooth compactions by detecting scenarios where there are idle periods between them. With smoothing, the system always runs at a "medium" rate rather than oscillating between fast and slow. This has two primary benefits:

1. From the end user's perspective, latency is flatter.
2. Total throughput should increase for "closed loop" workloads with fixed concurrency.

References:

- Internal Slack discussion
- Presentation on latency
- https://github.com/cockroachdb/pebble/pull/2004

Jira issue: CRDB-43849

jbowens commented 1 day ago

We should be careful not to over-index on small-node configurations that are not the expected deployment of Cockroach.

jbowens commented 11 hours ago

See cockroachdb/pebble#687

andrewbaptist commented 9 hours ago

I think there is a big difference between "pacing" and "smoothing". The goal of the smoothing patch is to inject delays during a compaction only if we will very likely sleep AFTER the compaction. The goal is not to protect the disk or its throughput at all; it is simply to spread the "compaction sleep time" evenly across the preceding compaction. If we start running without compaction sleep time, then the smoothing is fully disabled.

jbowens commented 7 hours ago

I'm not seeing the distinction. The objective of pacing is to stabilize foreground latencies by pacing (or, synonymously, smoothing) compactions to spread their resource utilization (both CPU and disk) over time. We're not attempting to alter the overall compaction throughput.