Open itsbilal opened 4 months ago
Hi @itsbilal, please add branch-* labels to identify which branch(es) this C-bug affects.
:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.
We are running into two problems in this scenario:
1) We always clear the protected timestamp, even if a retryable error is hit. See: https://github.com/cockroachdb/cockroach/blob/c5522cee53952df1558d77b9a4bd830c3cfbe821/pkg/sql/index_backfiller.go#L86-L91
2) The readAsOf timestamp does not properly take the current time into account. If a retry happens, it assumes that GC TTL * 0.8 has to elapse again: https://github.com/cockroachdb/cockroach/blob/c5522cee53952df1558d77b9a4bd830c3cfbe821/pkg/jobs/jobsprotectedts/jobs_protected_ts_manager.go#L129
@Dedej-Bergin I'll assign this to you as a bugfix/improvement that would be nice to land, but it's not highly urgent.
On the drt-chaos test cluster running V24.2.0-ALPHA.00000000-DEV-5AFD790501E946EF306ABE2B592C5798C29C342F, a schema change for `ALTER TABLE cct_tpcc.public.order_line DROP COLUMN add_column_op_2902590426 CASCADE` has been running nonstop and is being repeatedly retried.
Link to the job
Looking at the logs, we see the job failing with this error. For reference, the GC TTL on this database/table is 4 hours.
Jira issue: CRDB-39823