irfansharif opened 1 year ago
In general, installing a blanket protected timestamp seems like the wrong answer. A backfill can run for a long time, and it checkpoints regularly. So long as the GC TTL is longer than the checkpoint interval, the backfill should make progress. Perhaps the right compromise is to install a protected timestamp that we hoist on each checkpoint to now minus one or two checkpoint intervals. That's relatively complex, though.
In practice, I don't expect most GC TTLs to be shorter than a checkpoint interval (60s), but I could be wrong.
The band-aid I'd propose here is to clear the backoff on the index backfill after some number of minutes of running.
Describe the problem
We see the following when trying to create an index on the stock TPC-E dataset, which uses a GC TTL of 300s. The index backfill keeps retrying but never succeeds: it retries with the same batch timestamp even as the replica GC threshold is raised higher and higher.
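A toy Go model of the failure described above: because the GC threshold only moves forward, a retry that reuses the original batch timestamp can never succeed once the threshold has passed it. All names are hypothetical and timestamps are plain seconds. The `refresh` flag, which reads at the current time on each retry, is only an illustrative contrast, not a fix proposed in this issue.

```go
package main

import "fmt"

// retryBackfill is a toy model (all names hypothetical; timestamps are plain
// seconds). Each attempt takes attemptDur seconds; by the time it finishes, GC
// has raised the threshold to now-gcTTL. An attempt succeeds only if its read
// timestamp is still above that threshold. If refresh is false, every retry
// reuses the original readTS, as in the reported behavior.
func retryBackfill(readTS, now, gcTTL, attemptDur int64, maxAttempts int, refresh bool) (attempts int, ok bool) {
	for i := 1; i <= maxAttempts; i++ {
		if refresh {
			readTS = now // illustrative contrast: read at the current time
		}
		now += attemptDur
		threshold := now - gcTTL
		if readTS > threshold {
			return i, true
		}
	}
	return maxAttempts, false
}

func main() {
	// The batch timestamp is 0, retries start at t=600, the GC TTL is 300s
	// as in the TPC-E dataset, and each attempt runs for 60s.
	fmt.Println(retryBackfill(0, 600, 300, 60, 10, false))
	fmt.Println(retryBackfill(0, 600, 300, 60, 10, true))
}
```

In this model the fixed-timestamp run exhausts all ten attempts, while the refreshed run succeeds on its first attempt; attempts that each run longer than the GC TTL fail either way.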
To Reproduce
Run the roachtest from https://github.com/cockroachdb/cockroach/pull/89324.
Expected behavior
Additional data / screenshots
Some internal discussion here.
Jira issue: CRDB-25185