cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

spanconfigkvsubscriber: replica of a GC'ed span does not respect SystemSpanConfigs #113867

Status: Open. adityamaru opened this issue 1 year ago.

adityamaru commented 1 year ago

In our test cluster we observed two ranges, corresponding to dropped tables, that did not respect the protection policies enforced by SystemSpanConfigs. Replicas of these empty ranges were still evaluating batch requests, which failed with ‹ERROR: batch timestamp 1698960957.937065386,0 must be after replica GC threshold 1698967771.741796894,0 (SQLSTATE XXUUU)›. The running theory is that with the following sequence of operations:

We end up in a situation where the spanconfigstore finds no overlapping span configs for the empty range and therefore never runs the logic that checks whether any system span configs apply to that range (https://github.com/cockroachdb/cockroach/blob/master/pkg/spanconfig/spanconfigstore/store.go#L199). As a result, it misses any protection policies that should hold back the GC threshold, and GC is allowed to advance past the protected timestamp. We think it makes sense to apply to such ranges a default zone config with the system span configs combined into it. We are still attempting to reproduce this locally.

Jira issue: CRDB-33239

blathers-crl[bot] commented 1 year ago

Hi @adityamaru, please add branch-* labels to identify which branch(es) this release-blocker affects.


adityamaru commented 1 year ago

I've slapped on the GA-blocker label, but technically this is not a regression: the behavior has existed since 22.2, when this infrastructure was first introduced, so we can re-evaluate.

adityamaru commented 11 months ago

I'll keep this open until the 23.1 backport merges.