Open aliher1911 opened 1 year ago
The preferred solution here is to account for the key contribution for each individual version, rather than once per user key: https://github.com/cockroachdb/cockroach/issues/92254.
For reference, latchless GC was introduced in https://github.com/cockroachdb/cockroach/pull/83213.
Describe the problem
Latchless garbage collection introduced in 22.2 removes interference between GC and readers and writers by ensuring that readers never read below GC threshold and writers always write above it.
Unfortunately we have another point of contention - MVCCStats. And currently changes to the stats are not commutative. We only count the size of a key once for the whole history (because we want to reflect prefix compression) and count the size of timestamp for every key. To maintain this invariant, when key is added for the first time, its size is added to MVCCStats and when last version is removed by GC (MVCC tombstone was placed and age of tombstone exceeds gc.ttl) then key size is subtracted from MVCCStats.
Since we removed latches from GC requests, they could now be evaluated in parallel to writes for the same key. GC request would iterate key versions and determines that it removed last version of the key. Put request on the other hand still sees previous tombstone and decides that it is not a first key in history. GC request would then decrease MVCCStats values, but Put request would not increment them.
To Reproduce
It is not easy to reproduce in the live system, but test might be possible using TestingKnobs that would pause request executions it correct points. Issue seems like a classic race condition.
Expected behavior A clear and concise description of what you expected to happen.
Environment:
Additional context The effect of incorrect stats is not significant. Since it is quite an unlikely event, difference would not skew things like GC or range split decisions significantly. Consistency checker will flag it, but won't cause nodes to panic.
Jira issue: CRDB-21586