cockroachdb / cockroach


kv: manually compact liveness range periodically #128968

Closed nvanbenschoten closed 1 week ago

nvanbenschoten commented 1 month ago

An accumulation of point tombstones in the LSM across the node liveness range can have destabilizing effects on liveness availability, which in turn destabilizes cluster availability. This has come up in the past (#2107) and recently came up again (#3053). Thus far, we have been unable to eliminate the scan (#97966), so we rely on compaction to keep the keyspace well enough compacted that a scan over the O(100) keys does not take more than 1-2s. We are also still working on improvements to Pebble (#918) to prevent such an accumulation, so we remain vulnerable to this issue.

As a backportable stopgap, we should manually compact the liveness range periodically. The most appropriate place to do this is the MVCC GC queue. We could use the QueueLastProcessed timestamp to place an upper bound on the manual compaction rate (suggestion: every hour). We would then fan out a CompactEngineSpanRequest to each of the replicas in the liveness range. This should keep the liveness range from accumulating point tombstones for long periods of time.
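For illustration, a minimal, self-contained sketch of the rate-limited fan-out described above. It assumes a simplified interface: `replica`, `maybeCompactLiveness`, and the `sendCompactEngineSpan` callback are hypothetical stand-ins for the MVCC GC queue's processing hooks and the real CompactEngineSpanRequest, not the actual CockroachDB APIs.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// compactionInterval bounds how often the liveness range is manually compacted
// (the issue suggests roughly once per hour).
const compactionInterval = time.Hour

// replica identifies one replica of the liveness range (hypothetical type).
type replica struct {
	NodeID  int
	StoreID int
}

// maybeCompactLiveness checks the queue's last-processed timestamp and, if the
// interval has elapsed, fans a compaction request out to every replica.
func maybeCompactLiveness(
	ctx context.Context,
	lastProcessed, now time.Time,
	replicas []replica,
	// sendCompactEngineSpan is a stand-in for issuing a CompactEngineSpanRequest.
	sendCompactEngineSpan func(ctx context.Context, r replica) error,
) error {
	if now.Sub(lastProcessed) < compactionInterval {
		// Rate limit: the last-processed timestamp places an upper bound on
		// how often we manually compact.
		return nil
	}
	for _, r := range replicas {
		// Fan out to each replica of the liveness range so every store's LSM
		// gets compacted, not just the leaseholder's.
		if err := sendCompactEngineSpan(ctx, r); err != nil {
			return fmt.Errorf("compacting liveness range on n%d/s%d: %w", r.NodeID, r.StoreID, err)
		}
	}
	return nil
}

func main() {
	replicas := []replica{{1, 1}, {2, 2}, {3, 3}}
	err := maybeCompactLiveness(context.Background(), time.Now().Add(-2*time.Hour), time.Now(), replicas,
		func(ctx context.Context, r replica) error {
			fmt.Printf("would send CompactEngineSpanRequest to n%d/s%d\n", r.NodeID, r.StoreID)
			return nil
		})
	if err != nil {
		fmt.Println("error:", err)
	}
}
```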

Jira issue: CRDB-41308

Epic CRDB-37617

andrewbaptist commented 2 weeks ago

I created a prototype queue to address this across all ranges: #112731. We decided it wasn't something we wanted to keep long term, since this should ultimately be handled by storage, so I abandoned it. This may fit better in the MVCC GC queue than in a new queue; that said, you might be able to reuse some of the shouldQueue and process code from the prototype.

Note that the "health" of the LSM will be different on each node, so you either need to use a time-based compaction (as suggested in the issue), run a "distributed" tombstone check, or run the queue against all replicas. The reason I originally added the new queue with needsLease=false was to enable the third option; note that the MVCC GC queue has needsLease=true.
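To make the needsLease distinction concrete, here is a minimal sketch assuming a simplified queueConfig type; the names are illustrative and do not match the actual replica queue implementation.

```go
package main

import "fmt"

// queueConfig captures the knob being discussed: whether a replica must hold
// the range lease to be processed by the queue.
type queueConfig struct {
	name       string
	needsLease bool
}

// shouldProcess sketches how the lease requirement changes which replicas the
// queue will touch.
func shouldProcess(cfg queueConfig, holdsLease bool) bool {
	if cfg.needsLease && !holdsLease {
		// needsLease=true (as in the MVCC GC queue): only the leaseholder runs.
		return false
	}
	// needsLease=false (as in the prototype queue): every replica runs, so the
	// LSM on every node holding the range can be compacted locally.
	return true
}

func main() {
	gcQueue := queueConfig{name: "mvccGC", needsLease: true}
	protoQueue := queueConfig{name: "compaction prototype", needsLease: false}
	for _, q := range []queueConfig{gcQueue, protoQueue} {
		for _, holdsLease := range []bool{true, false} {
			fmt.Printf("%s: holdsLease=%t -> process=%t\n", q.name, holdsLease, shouldProcess(q, holdsLease))
		}
	}
}
```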

blathers-crl[bot] commented 1 day ago

Based on the specified backports for linked PR #129827, I applied the following new label(s) to this issue: branch-release-23.2, branch-release-24.1, branch-release-24.2. Please adjust the labels as needed to match the branches actually affected by this issue, including adding any known older branches.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

blathers-crl[bot] commented 17 hours ago

Based on the specified backports for linked PR #129827, I applied the following new label(s) to this issue: branch-release-23.2.12-rc, branch-release-24.1.5-rc, branch-release-24.2.3-rc. Please adjust the labels as needed to match the branches actually affected by this issue, including adding any known older branches.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.