sql: schema change repeatedly retries with gcttl error

itsbilal commented 4 months ago

On the drt-chaos test cluster running V24.2.0-ALPHA.00000000-DEV-5AFD790501E946EF306ABE2B592C5798C29C342F, a schema change for ALTER TABLE cct_tpcc.public.order_line DROP COLUMN add_column_op_2902590426 CASCADE has been running nonstop and is being repeatedly retried.

Link to the job

Looking at the logs, we see the job failing with this error. For reference, the gc ttl on this db/table is 4 hours.

job 979031533120225281: running execution encountered retriable error: failed to construct index entries during backfill: batch timestamp 1718847123.942402651,0 must be after replica GC threshold 1719379269.625591541,0
(1) forced error mark
  | ‹"retriable job error"›
  | github.com/cockroachdb/errors/withstack/*withstack.withStack::
Wraps: (2) attached stack trace
  -- stack trace:
  | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*indexBackfiller).runBackfill.func1
  |     github.com/cockroachdb/cockroach/pkg/sql/rowexec/indexbackfiller.go:319
  | github.com/cockroachdb/cockroach/pkg/sql/rowexec.(*indexBackfiller).runBackfill.Group.GoCtx.func3
  |     github.com/cockroachdb/cockroach/pkg/util/ctxgroup/ctxgroup.go:168
  | golang.org/x/sync/errgroup.(*Group).Go.func1
  |     golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:78
  | runtime.goexit
  |     src/runtime/asm_amd64.s:1695
Wraps: (3) failed to construct index entries during backfill
Wraps: (4) batch timestamp 1718847123.942402651,0 must be after replica GC threshold 1719379269.625591541,0
Error types: (1) *markers.withMark (2) *withstack.withStack (3) *errutil.withPrefix (4) *kvpb.BatchTimestampBeforeGCError

Jira issue: CRDB-39823

blathers-crl[bot] commented 4 months ago

Hi @itsbilal, please add branch-* labels to identify which branch(es) this C-bug affects.

:owl: Hoot! I am a Blathers, a bot for CockroachDB. My owner is dev-inf.

fqazi commented 4 months ago

We are running into two problems in this scenario:

1) We always clear the protected timestamp, even when a retryable error is hit; see: https://github.com/cockroachdb/cockroach/blob/c5522cee53952df1558d77b9a4bd830c3cfbe821/pkg/sql/index_backfiller.go#L86-L91
2) The readAsOf timestamp does not properly take the current time into account: if a retry happens, it will assume that GC TTL * 0.8 has to pass again: https://github.com/cockroachdb/cockroach/blob/c5522cee53952df1558d77b9a4bd830c3cfbe821/pkg/jobs/jobsprotectedts/jobs_protected_ts_manager.go#L129

rafiss commented 4 months ago

@Dedej-Bergin I'll assign this to you as a bugfix/improvement that would be nice to land, but it's not highly urgent.

cockroachdb / cockroach

sql: schema change repeatedly retries with gcttl error #126260