cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachtest: perturbation/metamorphic/backfill failed #133374

Open cockroach-teamcity opened 2 days ago

cockroach-teamcity commented 2 days ago

Note: This build has runtime assertions enabled. If the same failure was hit in a run without assertions enabled, there should be a similar failure without this message. If there isn't one, then this failure is likely due to an assertion violation or (assertion) timeout.

roachtest.perturbation/metamorphic/backfill failed with artifacts on release-24.3 @ 36f5b311f33775bbeb26d56003f6831ec9ddd837:

(cluster.go:2449).Run: full command output in run_172538.337172712_n31-32_cockroach-workload-r.log: COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/perturbation/metamorphic/backfill/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

- #133155 roachtest: perturbation/metamorphic/backfill failed [B-runtime-assertions-enabled C-bug C-test-failure O-roachtest O-robot T-kv branch-master]

/cc @cockroachdb/kv-triage


Jira issue: CRDB-43561

miraradeva commented 2 days ago

The test failed with an ambiguous result error (SQLSTATE 40003); the last underlying error was a serialization retry (RETRY_SERIALIZABLE):

Error: ERROR: result is ambiguous: error=ba: Put [/Table/106/1/7889463034662031696/0], EndTxn(parallel commit) [/Table/106/1/7889463034662031696/0], [txn: 9266c39a], [can-forward-ts] RPC error: grpc: error reading from server: read tcp 10.142.1.76:53622->10.142.1.91:26257: use of closed network connection [code 14/Unavailable] [propagate] (last error: TransactionRetryError: retry txn (RETRY_SERIALIZABLE): "sql txn" meta={id=9266c39a key=/Table/106/1/7889463034662031696/0 iso=Serializable pri=0.02816496 epo=0 ts=1729790905.319193092,2 min=1729790902.078664092,0 seq=2} lock=true stat=PENDING rts=1729790902.078664092,0 wto=false gul=1729790902.578664092,0) (SQLSTATE 40003)
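To make the distinction concrete: the SQLSTATE on this error is 40003 (statement completion unknown), not 40001 (serialization failure), so a client cannot simply retry it as it would an ordinary retry error. The following is a minimal sketch of how a client might tell the two apart. The `classify` helper and its return labels are hypothetical and are not taken from the workload code.

```go
package main

import "fmt"

// Standard SQLSTATE codes in class 40 (transaction rollback).
const (
	sqlStateSerializationFailure = "40001" // safe to retry the transaction
	sqlStateCompletionUnknown    = "40003" // commit may or may not have applied
)

// classify is a hypothetical helper that maps a SQLSTATE code to a
// coarse handling category.
func classify(sqlState string) string {
	switch sqlState {
	case sqlStateSerializationFailure:
		return "retryable"
	case sqlStateCompletionUnknown:
		return "ambiguous"
	default:
		return "other"
	}
}

func main() {
	// The error in this report carries SQLSTATE 40003.
	fmt.Println(classify("40003"))
}
```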

This failed right after running:

2024/10/24 17:25:05 admission_control_latency.go:341: waiting for replicas to be in place
2024/10/24 17:25:38 cluster.go:2469: running cmd `./cockroach workload run kv...` on nodes [:31-32]

That workload command is:

"./cockroach workload run kv --db backfill --duration=%s --max-block-bytes=%d --min-block-bytes=%d --concurrency=100 {pgurl%s}"

@andrewbaptist shouldn't this include --tolerate-errors?
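For illustration, a minimal sketch of how the command string would look with the flag added. It reuses the format string quoted above. The `buildKVCommand` helper and the argument values passed in `main` are hypothetical and are not taken from the test code. My understanding is that `--tolerate-errors` makes the workload log errors and keep running rather than exit, but that should be confirmed against the workload's own help output.

```go
package main

import "fmt"

// kvCmdPrefix is the quoted command up to, but not including, the
// {pgurl...} placeholder that roachtest expands.
const kvCmdPrefix = "./cockroach workload run kv --db backfill --duration=%s --max-block-bytes=%d --min-block-bytes=%d --concurrency=100"

// buildKVCommand is a hypothetical helper: it fills in the format string
// and optionally appends --tolerate-errors before the pgurl placeholder.
func buildKVCommand(duration string, maxBlockBytes, minBlockBytes int, nodes string, tolerateErrors bool) string {
	cmd := fmt.Sprintf(kvCmdPrefix, duration, maxBlockBytes, minBlockBytes)
	if tolerateErrors {
		cmd += " --tolerate-errors"
	}
	return cmd + fmt.Sprintf(" {pgurl%s}", nodes)
}

func main() {
	// Illustrative values only; the real test supplies its own.
	fmt.Println(buildKVCommand("10m", 4096, 1, ":1-30", true))
}
```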

miraradeva commented 1 day ago

Assigning C-bug label to move from triage queue.