cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

pcr: prevent resuming replication after flashback #120832

Open dt opened 5 months ago

dt commented 5 months ago

An ALTER REVERT will destroy the MVCC history back to the revert time, which can cause physical replication to be incorrect.

If REVERT is run on the source for replication, we need to ensure that resumed replication resumes from the time reverted to or the previous divergence time, whichever is earlier. This means the source needs to send the revert time in the prior replication status call, so that it can be used instead of the cutover time in the consumer's tenant record if it is earlier.

If REVERT is run on the consumer after cutover, to a time earlier than the cutover time, it should move the tracked cutover time backwards to reflect the lost history.

Jira issue: CRDB-36893

blathers-crl[bot] commented 5 months ago

cc @cockroachdb/disaster-recovery

msbutler commented 3 months ago

@dt a couple of clarification questions here:

  1. ALTER REVERT refers to the ALTER VIRTUAL CLUSTER RESET DATA cmd, correct? i.e. flashback?
  2. Just so I have all my ducks in a row, here's a timeline of events we need to address. Given a replication stream from tenant A to tenant B:

Addressing a flashback on tenant A versus tenant B requires slightly different fixes, as described in the initial comment.