cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other

roachtest: failover/partial/lease-gateway/lease=leader failed #133435

Open cockroach-teamcity opened 1 month ago

cockroach-teamcity commented 1 month ago

roachtest.failover/partial/lease-gateway/lease=leader failed with artifacts on master @ 82b1fda15c4616713b278c447d24b0ab5416e511:

(test_runner.go:1316).runTest: test timed out (45m0s)
test artifacts and logs in: /artifacts/failover/partial/lease-gateway/lease=leader/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

See: Grafana

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

Jira issue: CRDB-43607

miraradeva commented 1 month ago

This is a duplicate of https://github.com/cockroachdb/cockroach/issues/130647 and https://github.com/cockroachdb/cockroach/issues/133064.

The test fails at setup:

2024/10/25 10:26:34 failover.go:1643: waiting for 766 ranges to upreplicate (database_name = 'kv')
2024/10/25 10:26:40 failover.go:1643: waiting for 441 ranges to upreplicate (database_name = 'kv')
...
2024/10/25 11:10:40 failover.go:1643: waiting for 77 ranges to upreplicate (database_name = 'kv')
2024/10/25 11:10:44 failover.go:1643: waiting for 77 ranges to upreplicate (database_name = 'kv')

We also see gRPC connection errors like this between 10:26 and 11:10:

E241025 10:26:27.696635 6420 2@rpc/peer.go:663 ⋮ [T1,Vsystem,n1,rnode=5,raddr=‹10.142.1.231:26257›,class=default,rpc] 939  failed connection attempt (last connected 4.001s ago): grpc: ‹connection error: desc = "transport: authentication handshake failed: context deadline exceeded"› [code 14/Unavailable]
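
The useful fields in that error are the remote node, the remote address, and the gRPC status. You can extract them with a couple of regular expressions, then count failures per peer across a full log to see whether one node or the whole network is unreachable. Below is a Go sketch that assumes the line format shown above. `connFailure`, `parseConnFailure`, and `failuresByNode` are illustrative names, not a CockroachDB API. gRPC status code 14 is `Unavailable`, and the `‹ ›` characters are CockroachDB's log redaction markers.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// connFailure holds the fields pulled out of one rpc/peer.go failure line.
type connFailure struct {
	remoteNode       int
	remoteAddr       string
	code             int
	codeName         string
	handshakeTimeout bool
}

var (
	// rnode/raddr come from the log tags; raddr is wrapped in redaction markers.
	peerRe = regexp.MustCompile(`rnode=(\d+),raddr=‹([^›]+)›`)
	// The trailing "[code N/Name]" carries the gRPC status code.
	codeRe = regexp.MustCompile(`\[code (\d+)/(\w+)\]`)
)

// parseConnFailure returns false if the line lacks the peer tags or a code.
func parseConnFailure(line string) (connFailure, bool) {
	p := peerRe.FindStringSubmatch(line)
	c := codeRe.FindStringSubmatch(line)
	if p == nil || c == nil {
		return connFailure{}, false
	}
	node, err := strconv.Atoi(p[1])
	if err != nil {
		return connFailure{}, false
	}
	code, err := strconv.Atoi(c[1])
	if err != nil {
		return connFailure{}, false
	}
	return connFailure{
		remoteNode:       node,
		remoteAddr:       p[2],
		code:             code,
		codeName:         c[2],
		handshakeTimeout: strings.Contains(line, "authentication handshake failed: context deadline exceeded"),
	}, true
}

// failuresByNode counts failed connection attempts per remote node.
func failuresByNode(lines []string) map[int]int {
	counts := make(map[int]int)
	for _, l := range lines {
		if f, ok := parseConnFailure(l); ok {
			counts[f.remoteNode]++
		}
	}
	return counts
}

const sampleLine = `E241025 10:26:27.696635 6420 2@rpc/peer.go:663 ⋮ [T1,Vsystem,n1,rnode=5,raddr=‹10.142.1.231:26257›,class=default,rpc] 939  failed connection attempt (last connected 4.001s ago): grpc: ‹connection error: desc = "transport: authentication handshake failed: context deadline exceeded"› [code 14/Unavailable]`

func main() {
	f, ok := parseConnFailure(sampleLine)
	fmt.Println(ok, f.remoteNode, f.remoteAddr, f.code, f.codeName, f.handshakeTimeout)
	fmt.Println(failuresByNode([]string{sampleLine}))
}
```

For the quoted line this prints `true 5 10.142.1.231:26257 14 Unavailable true` followed by `map[5:1]`.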

It seems like an infra flake, but because we've seen it a few times now, I'll leave it open and assign it P3. https://github.com/cockroachdb/cockroach/issues/133064 is already closed; I'll close https://github.com/cockroachdb/cockroach/issues/130647 as a duplicate of this issue.

cockroach-teamcity commented 1 week ago

roachtest.failover/partial/lease-gateway/lease=leader failed with artifacts on master @ e83bc46aa42f2476b4b11b9703b8038c660dc980:

(monitor.go:149).Wait: monitor failure: full command output in run_084323.451860111_n8_cockroach-workload-r.log: COMMAND_PROBLEM: exit status 1
(cluster.go:2456).Run: context canceled
test artifacts and logs in: /artifacts/failover/partial/lease-gateway/lease=leader/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for Azure clusters


cockroach-teamcity commented 1 week ago

roachtest.failover/partial/lease-gateway/lease=leader failed with artifacts on master @ e83bc46aa42f2476b4b11b9703b8038c660dc980:

(monitor.go:149).Wait: monitor failure: full command output in run_094900.830399086_n8_cockroach-workload-r.log: COMMAND_PROBLEM: exit status 1
(cluster.go:2456).Run: context canceled
test artifacts and logs in: /artifacts/failover/partial/lease-gateway/lease=leader/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

See: Grafana


cockroach-teamcity commented 1 week ago

roachtest.failover/partial/lease-gateway/lease=leader failed with artifacts on master @ a5ab06046e817627cd0b64390f4ede0c4a5d82ef:

(monitor.go:149).Wait: monitor failure: full command output in run_075801.026605132_n8_cockroach-workload-r.log: COMMAND_PROBLEM: exit status 1
(cluster.go:2456).Run: context canceled
test artifacts and logs in: /artifacts/failover/partial/lease-gateway/lease=leader/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

Grafana is not yet available for Azure clusters


cockroach-teamcity commented 1 week ago

roachtest.failover/partial/lease-gateway/lease=leader failed with artifacts on master @ a5ab06046e817627cd0b64390f4ede0c4a5d82ef:

(monitor.go:149).Wait: monitor failure: full command output in run_092546.041700876_n8_cockroach-workload-r.log: COMMAND_PROBLEM: exit status 1
(cluster.go:2456).Run: context canceled
test artifacts and logs in: /artifacts/failover/partial/lease-gateway/lease=leader/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

See: Grafana
