cdc: improve fine-grain checkpointing

cockroachdb / cockroach

cdc: improve fine-grain checkpointing #129663

Open rharding6373 opened 1 month ago

rharding6373 commented 1 month ago

CDC's current algorithm for creating a span-based checkpoint (which is reconstituted as the starting point for fine-grain checkpointing during replanning) works well when there is a single lagging span, but it is naive when there is a spread between spans' latest resolved timestamps or when multiple ranges are lagging. Even a marginal improvement to the fine-grain checkpointing algorithm could significantly reduce duplicate emissions and improve changefeed performance.
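
To make the failure mode concrete, here is a toy sketch. It assumes, purely for illustration, that a checkpoint records the set of spans ahead of the overall frontier together with one shared timestamp (the minimum resolved timestamp among those spans). That model is an assumption rather than a description of the actual changefeed code, and every name below (`SpanProgress`, `single_timestamp_checkpoint`, `replayed_time`) is hypothetical. Timestamps are plain integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpanProgress:
    span: str      # hypothetical label for a key span, e.g. "a"
    resolved: int  # latest resolved timestamp for the span (toy integer clock)


def single_timestamp_checkpoint(progress):
    """Build a checkpoint under the ASSUMED single-timestamp model.

    Returns (frontier, checkpoint_ts, checkpoint_spans), where frontier is
    the minimum resolved timestamp over all spans and checkpoint_ts is the
    minimum resolved timestamp among the spans ahead of the frontier.
    """
    frontier = min(p.resolved for p in progress)
    ahead = [p for p in progress if p.resolved > frontier]
    if not ahead:
        return frontier, None, []
    checkpoint_ts = min(p.resolved for p in ahead)
    return frontier, checkpoint_ts, [p.span for p in ahead]


def replayed_time(progress):
    """Sum, over all spans, of the progress lost when restarting from the checkpoint."""
    frontier, checkpoint_ts, spans = single_timestamp_checkpoint(progress)
    in_checkpoint = set(spans)
    total = 0
    for p in progress:
        # Checkpointed spans restart at the shared timestamp; the rest at the frontier.
        restart = checkpoint_ts if p.span in in_checkpoint else frontier
        total += p.resolved - restart
    return total


# One lagging span: everything else shares a timestamp, so nothing is replayed.
single_lagger = [SpanProgress("a", 10), SpanProgress("b", 100),
                 SpanProgress("c", 100), SpanProgress("d", 100)]

# Spread of resolved timestamps: the shared timestamp is pinned to the slowest
# span that is ahead of the frontier.
spread = [SpanProgress("a", 10), SpanProgress("b", 20),
          SpanProgress("c", 60), SpanProgress("d", 100)]

# Two lagging spans: the second lagger drags the shared timestamp down.
two_laggers = [SpanProgress("a", 10), SpanProgress("b", 15),
               SpanProgress("c", 100), SpanProgress("d", 100)]

print(replayed_time(single_lagger), replayed_time(spread), replayed_time(two_laggers))
```

In this model the single-lagger case loses no progress, while the spread and two-lagger cases force the spans that are furthest ahead to re-emit everything after the shared timestamp. Tracking more than one timestamp per checkpoint is one way to shrink that replay window, though the sketch does not imply which approach the proposal takes.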

Jira issue: CRDB-41651

blathers-crl[bot] commented 1 week ago

cc @cockroachdb/cdc

andyyang890 commented 23 hours ago

Eng brief with proposal: https://docs.google.com/document/d/18vz39Z95jiJDFmMaXfK8YeLEuz2_NNsHg4AekSvkXmI/edit#heading=h.1565xzh8a0n