nvanbenschoten closed this issue 1 week ago
I had created a prototype queue to address this across all ranges here: #112731. But we decided it wasn't something we wanted to keep long term since this should be handled by storage, so I abandoned it. This may be better in the MVCC GC queue rather than a new queue. That said, you might be able to use some of the parts of the `shouldQueue` and `process` code to implement this.

Note the "health" of the LSM will be different on each node, so you either need to use a time-based compaction (as suggested in the issue), run a "distributed" tombstone check, or run the queue against all replicas. The reason I had originally added the new queue with `needsLease=false` was to do the third option. Note the MVCC GC queue has `needsLease=true`.
Based on the specified backports for linked PR #129827, I applied the following new label(s) to this issue: branch-release-23.2, branch-release-24.1, branch-release-24.2. Please adjust the labels as needed to match the branches actually affected by this issue, including adding any known older branches.
:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
Based on the specified backports for linked PR #129827, I applied the following new label(s) to this issue: branch-release-23.2.12-rc, branch-release-24.1.5-rc, branch-release-24.2.3-rc. Please adjust the labels as needed to match the branches actually affected by this issue, including adding any known older branches.
An accumulation of point tombstones in the LSM across the node liveness range can have destabilizing effects on liveness availability, which has a destabilizing effect on cluster availability. This has come up in the past (#2107) and recently came up again (#3053). Thus far, we have been unable to eliminate the scan (#97966), so we rely on compaction keeping the keyspace sufficiently well compacted such that a scan over the O(100) keys does not take more than 1-2s. We are also still working on (#918) improvements to Pebble to prevent such an accumulation, so we remain vulnerable to this issue.
As a backportable stopgap, we should manually compact the liveness range periodically. The most appropriate place to do this is the MVCC GC queue. We could use the `QueueLastProcessed` timestamp to place an upper bound on the manual compaction rate (suggestion: every hour). We would then fan out a `CompactEngineSpanRequest` to each of the replicas in the liveness range. This should keep the liveness range from accumulating point tombstones for long periods of time.

Jira issue: CRDB-41308
Epic CRDB-37617