cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachtest: decommissionBench/nodes=6/warehouses=1000/drain-first/while-upreplicating/target=3/multi-region failed #135886

Open cockroach-teamcity opened 13 hours ago

cockroach-teamcity commented 13 hours ago

roachtest.decommissionBench/nodes=6/warehouses=1000/drain-first/while-upreplicating/target=3/multi-region failed with artifacts on master @ eb2d2e19eb29d2747d9e267bd0612a69d066adad:

(decommissionbench.go:744).runDecommissionBench: monitor failure: full command output in run_095623.969192641_n3_cockroach-node-drain.log: COMMAND_PROBLEM: exit status 1
test artifacts and logs in: /artifacts/decommissionBench/nodes=6/warehouses=1000/drain-first/while-upreplicating/target=3/multi-region/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

- #135060 roachtest: decommissionBench/nodes=6/warehouses=1000/drain-first/while-upreplicating/target=3/multi-region failed [A-testing C-bug C-test-failure O-roachtest O-robot P-3 T-kv]

/cc @cockroachdb/kv-triage

Jira issue: CRDB-44771

kvoli commented 6 hours ago

Failing on:

run_095623.969192641_n3_cockroach-node-drain: 2024/11/21 09:56:23 cluster.go:2480: > ./cockroach node drain --certs-dir=certs --port={pgport:3} --self
node is draining... 
ERROR: rpc error: code = Unknown desc = some sessions did not respond to cancellation within 1s
Failed running "node drain"
run_095623.969192641_n3_cockroach-node-drain: 2024/11/21 09:56:36 cluster.go:2493: > result: COMMAND_PROBLEM: exit status 1

kvoli commented 3 hours ago

This is the same failure mode as https://github.com/cockroachdb/cockroach/issues/131604, although that test expects the drain to fail, unlike here where we expect it to succeed.

I'm going to mark this as a bug, with the same priority as that issue. Note to whoever (maybe me) picks this up: also look into the linked test failure, as it may be easier/quicker to repro than this one.
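
For anyone triaging both failures, a minimal sketch of a check for this failure mode in a saved drain log might look like the following. The function name `session_cancel_timeout` is hypothetical and not part of roachtest; the only grounded piece is the error text, which is copied from the drain output quoted above.

```python
import re

# Error text copied from the `cockroach node drain` output in this issue.
# The trailing group captures the reported timeout (e.g. "1s").
SESSION_CANCEL_RE = re.compile(
    r"some sessions did not respond to cancellation within (\S+)"
)


def session_cancel_timeout(log_text):
    """Return the reported timeout if the log shows this failure, else None.

    Hypothetical helper for triage only; it does a plain text match and
    knows nothing about the log's structure.
    """
    match = SESSION_CANCEL_RE.search(log_text)
    return match.group(1) if match else None
```

Running it over the contents of run_095623.969192641_n3_cockroach-node-drain.log would return the timeout string when the drain hit this error, and None for a drain that failed some other way.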