cockroachdb / cockroach


roachtest: perturbation/metamorphic/backfill failed #133086

Closed cockroach-teamcity closed 3 weeks ago

cockroach-teamcity commented 3 weeks ago

roachtest.perturbation/metamorphic/backfill failed with artifacts on release-24.3 @ b2d2353b876af1748607e155ecdfed9d4bba29d3:

(cluster.go:2449).Run: full command output in run_175751.361115389_n31-32_cockroach-workload-r.log: COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/perturbation/metamorphic/backfill/run_1

Same failure on other branches

- #131713 roachtest: perturbation/metamorphic/backfill failed [A-storage A-testing C-bug C-test-failure O-roachtest O-robot P-1 T-kv branch-master]

/cc @cockroachdb/kv-triage

Jira issue: CRDB-43451

arulajmani commented 3 weeks ago
```
./cockroach workload run kv --db backfill --duration=10m0s --max-block-bytes=10000 --min-block-bytes=10000 --concurrency=100 {pgurl:1-29}
```
<truncated> ... : [NotLeaseHolderError] lease held by different store; r82: replica (n11,s21):5 not lease holder; current lease is repl=(n9,s17):9VOTER_INCOMING seq=7 start=1729533566.726202551,0 epo=1 min-exp=1729533572.589524843,0 pro=1729533566.734834089,0) (SQLSTATE 40003)
Write sequence could be resumed by passing --write-seq=R322899 to the next run.
Error: ERROR: result is ambiguous: error=ba: Put [/Table/106/1/3895056573302891672/0], EndTxn(parallel commit) [/Table/106/1/3895056573302891672/0], [txn: 53dd6514], [can-forward-ts] RPC error: grpc: error reading from server: read tcp 10.142.0.21:50876->10.142.1.22:26257: use of closed network connection [code 14/Unavailable] [exhausted] (last error: failed to send RPC: leaseholder not found in transport; last error: [NotLeaseHolderError] lease held by different store; r82: replica (n11,s21):5 not lease holder; current lease is repl=(n9,s17):9VOTER_INCOMING seq=7 start=1729533566.726202551,0 epo=1 min-exp=1729533572.589524843,0 pro=1729533566.734834089,0) (SQLSTATE 40003)
Wraps: (4) COMMAND_PROBLEM

The workload doesn't tolerate ambiguous result errors, so hitting one (SQLSTATE 40003 above) failed the test. cc @andrewbaptist -- this is the second perturbation test failure I've seen with this cause; the first was https://github.com/cockroachdb/cockroach/issues/133010#issuecomment-2427228505. Is this something we want to improve here?