cockroachdb / cockroach

roachtest: failover/partial/lease-gateway/lease=leader failed #133435

Open cockroach-teamcity opened 17 hours ago

cockroach-teamcity commented 17 hours ago

roachtest.failover/partial/lease-gateway/lease=leader failed with artifacts on master @ 82b1fda15c4616713b278c447d24b0ab5416e511:

(test_runner.go:1316).runTest: test timed out (45m0s)
test artifacts and logs in: /artifacts/failover/partial/lease-gateway/lease=leader/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

See: Grafana

/cc @cockroachdb/kv-triage

Jira issue: CRDB-43607

miraradeva commented 14 hours ago

This is a duplicate of https://github.com/cockroachdb/cockroach/issues/130647 and https://github.com/cockroachdb/cockroach/issues/133064.

The test fails during setup, while waiting for ranges in the kv database to upreplicate:

2024/10/25 10:26:34 failover.go:1643: waiting for 766 ranges to upreplicate (database_name = 'kv')
2024/10/25 10:26:40 failover.go:1643: waiting for 441 ranges to upreplicate (database_name = 'kv')
...
2024/10/25 11:10:40 failover.go:1643: waiting for 77 ranges to upreplicate (database_name = 'kv')
2024/10/25 11:10:44 failover.go:1643: waiting for 77 ranges to upreplicate (database_name = 'kv')
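
To put a number on the stall, here is a minimal sketch that parses the four log lines quoted above and reports how long the wait lasted and how many ranges were still pending at the end. The regex and the helper name `parse_upreplication` are illustrative assumptions based only on the line format shown here; they are not part of roachtest.

```python
import re
from datetime import datetime

# Assumed line shape, inferred from the excerpt above:
#   "<YYYY/MM/DD HH:MM:SS> <file>:<line>: waiting for <N> ranges to upreplicate ..."
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) \S+: "
    r"waiting for (?P<n>\d+) ranges to upreplicate"
)


def parse_upreplication(lines):
    """Return (timestamp, remaining_ranges) pairs for every matching line."""
    samples = []
    for line in lines:
        m = LINE_RE.match(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y/%m/%d %H:%M:%S")
            samples.append((ts, int(m.group("n"))))
    return samples


# Only the lines quoted in the comment; the elided middle is not reconstructed.
log_excerpt = [
    "2024/10/25 10:26:34 failover.go:1643: waiting for 766 ranges to upreplicate (database_name = 'kv')",
    "2024/10/25 10:26:40 failover.go:1643: waiting for 441 ranges to upreplicate (database_name = 'kv')",
    "2024/10/25 11:10:40 failover.go:1643: waiting for 77 ranges to upreplicate (database_name = 'kv')",
    "2024/10/25 11:10:44 failover.go:1643: waiting for 77 ranges to upreplicate (database_name = 'kv')",
]

samples = parse_upreplication(log_excerpt)
elapsed = samples[-1][0] - samples[0][0]
print(f"elapsed: {elapsed}, remaining: {samples[-1][1]}")
```

On this excerpt the wait spans 44m10s and ends with 77 ranges still under-replicated, which fits the 45m0s test timeout if setup began shortly before the first quoted line.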

And we see gRPC connection errors like this one between 10:26 and 11:10:

E241025 10:26:27.696635 6420 2@rpc/peer.go:663 ⋮ [T1,Vsystem,n1,rnode=5,raddr=‹10.142.1.231:26257›,class=default,rpc] 939  failed connection attempt (last connected 4.001s ago): grpc: ‹connection error: desc = "transport: authentication handshake failed: context deadline exceeded"› [code 14/Unavailable]
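
When checking whether these errors line up with the stalled upreplication window, it can help to pull the remote node, remote address, and "last connected" age out of each line. The sketch below does that for the single line quoted above. The field layout is inferred from this one line only, not from CockroachDB's log format documentation, and `parse_peer_error` is a hypothetical helper name.

```python
import re

# Assumed shape: severity+date, time, then a bracketed comma-separated tag list,
# then a message containing "last connected <seconds>s ago".
PEER_ERR_RE = re.compile(
    r"^E(?P<date>\d{6}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+) .*?"
    r"\[(?P<tags>[^\]]*)\].*?"
    r"last connected (?P<last>[\d.]+)s ago"
)


def parse_peer_error(line):
    """Extract time, remote node, remote address and last-connected age, or None."""
    m = PEER_ERR_RE.search(line)
    if m is None:
        return None
    tags = {}
    for tag in m.group("tags").split(","):
        key, sep, value = tag.partition("=")
        if sep:  # skip bare tags such as "T1" or "rpc"
            tags[key] = value.strip("‹›")  # drop the redaction markers
    return {
        "time": m.group("time"),
        "rnode": tags.get("rnode"),
        "raddr": tags.get("raddr"),
        "last_connected_s": float(m.group("last")),
    }


line = (
    "E241025 10:26:27.696635 6420 2@rpc/peer.go:663 ⋮ "
    "[T1,Vsystem,n1,rnode=5,raddr=‹10.142.1.231:26257›,class=default,rpc] 939  "
    "failed connection attempt (last connected 4.001s ago): grpc: "
    "‹connection error: desc = \"transport: authentication handshake failed: "
    "context deadline exceeded\"› [code 14/Unavailable]"
)

info = parse_peer_error(line)
print(info)
```

Running this over every node's log and grouping by `rnode` would show whether the handshake timeouts cluster on one peer or are spread across the cluster.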

It looks like an infra flake, but because we've seen it a few times now, I'll leave this issue open and assign it P3. https://github.com/cockroachdb/cockroach/issues/133064 is already closed. I'll close https://github.com/cockroachdb/cockroach/issues/130647 as a duplicate of this issue.