Open andrewbaptist opened 1 day ago
We should be careful not to over-index on small node configurations that are not the expected deployment of CockroachDB.
See cockroachdb/pebble#687
I think there is a big difference between "pacing" and "smoothing". The goal of the smoothing patch is to inject delays during a compaction only if we would very likely sleep AFTER the compaction. The goal is not to protect the disk or its throughput at all; it is simply to spread the "compaction sleep time" evenly across the preceding compaction. If we start running without compaction sleep time, then the smoothing is fully disabled.
I'm not seeing the distinction. The objective of pacing is to stabilize foreground latencies by pacing (or, synonymously, smoothing) compactions to spread their resource utilization (both CPU and disk) over time. We're not attempting to alter the overall compaction throughput.
In various scenarios, while Pebble compactions (including memtable -> L0) are running, the latency of operations increases significantly. This is particularly noticeable on small clusters, and most noticeable on a single-node cluster. In these situations each node will oscillate between "fast" (not running a compaction) and "slow" (running a compaction) with periods on the order of 10-60s.
Proposed solution: Attempt to smooth compactions by detecting scenarios where there are idle periods between them. With smoothed compactions, the system will always run at a "medium" rate rather than oscillating between fast and slow. This has two primary benefits: 1) From the end user's perspective, the latency is flatter. 2) The total throughput should increase for closed-loop workloads, i.e. those with fixed concurrency.
References: internal Slack discussion; presentation on latency; https://github.com/cockroachdb/pebble/pull/2004
Jira issue: CRDB-43849