Please describe the issue you observed, and any steps we can take to reproduce it:
Consider the following scenario with primary cluster A, with two standbys B and C:
1. Start PCR from cluster A to two clusters, B and C.
2. Complete cutover on clusters B and C.
3. Start PCR from B back to A using fast cutback, and complete cutover.
At this point, what happens if we stop the service on A again and then attempt to start PCR from C back to A using fast cutback? Cluster A gets rewound to the point where cluster C ran cutover and then consumes changes from there.
However, after chatting with @dt, he thinks it's likely that this rewind is not done quite correctly. David detailed the following sequence of events:
If B cut over at time T5, before C cut over at time T8, then when we made A a standby of B, we rewound it to the time B diverged, T5. This destroys all history in A newer than T5.
Then we get new history from B, at times greater than T5. So let's say B sends us a row at T7.
Then we promote A at T10, so it still has B's T7 row. If we then stand it back down and tell it to be a secondary to C, it'll rewind to do that, but I wonder what time we rewind to.
It needs to be T5, since that's the last point in A's history that is in common with C, but I am questioning whether we thought of this or whether the code is going to be silly and pick T8, when C diverged.
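To make the concern concrete, here is a toy Go model of the scenario. It is only a sketch for reasoning about the timestamps, not CockroachDB code: `Segment`, `Lineage`, and `LastCommonTime` are hypothetical names, and the integer timestamps are the T-numbers from David's example. Each cluster's history is modeled as a list of intervals tagged with the timeline that produced them, and the safe rewind target is taken to be the end of the longest shared prefix.

```go
package main

import "fmt"

// Open marks a segment that is still being written to (hypothetical sentinel).
const Open int64 = 1 << 62

// Segment says: history in (Start, End] came from Origin's timeline.
type Segment struct {
	Origin     string
	Start, End int64
}

// Lineage is one cluster's history as an ordered list of segments.
type Lineage []Segment

// LastCommonTime returns the latest timestamp up to which x and y
// hold identical history, i.e. the end of their shared prefix.
func LastCommonTime(x, y Lineage) int64 {
	var common int64
	for i := 0; i < len(x) && i < len(y); i++ {
		a, b := x[i], y[i]
		if a.Origin != b.Origin || a.Start != b.Start {
			break // the timelines forked before this segment
		}
		// Both sides followed the same timeline; they agree up to the earlier End.
		common = a.End
		if b.End < common {
			common = b.End
		}
		if a.End != b.End {
			break // one side left this timeline earlier than the other
		}
	}
	return common
}

func main() {
	// A: own history to T5 (after the rewind), B's history to T10, then promoted.
	a := Lineage{{"A", 0, 5}, {"B", 5, 10}, {"A", 10, Open}}
	// C: replicated from A until its cutover at T8, then its own history.
	c := Lineage{{"A", 0, 8}, {"C", 8, Open}}

	naive := c[0].End // C's cutover time, the suspected "silly" choice
	fmt.Printf("naive rewind target: T%d\n", naive)
	fmt.Printf("last common time:    T%d\n", LastCommonTime(a, c))
}
```

In this model the naive choice is T8 while the shared prefix ends at T5, which matches David's reasoning: rewinding A only to T8 would leave B's T7 row in place even though C never saw it.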
Describe the problem
Some testing/digging is the next step here.
To Reproduce
Steps to set up the scenario are detailed above.
Environment:
Client app [e.g. cockroach sql, JDBC, ...]: n/a
Additional context
n/a
Jira issue: CRDB-42759