cockroach-teamcity opened 2 days ago
Note: This build has runtime assertions enabled. If the same failure was hit in a run without assertions enabled, there should be a similar failure without this message. If there isn't one, then this failure is likely due to an assertion violation or (assertion) timeout.
roachtest.db-console/cypress failed with artifacts on master @ 39e43b85ec3b02bc760df10fce1c19d09419d6f2:
(db_console.go:137).seedCluster: dial tcp 3.144.110.101:26257: connect: connection refused
test artifacts and logs in: /artifacts/db-console/cypress/run_1
Parameters:
arch=amd64
cloud=aws
coverageBuild=false
cpu=4
encrypted=false
fs=ext4
localSSD=true
runtimeAssertionsBuild=true
ssd=0
See: roachtest README
See: How To Investigate (internal)
Grafana is not yet available for aws clusters
roachtest.db-console/cypress failed with artifacts on master @ 39e43b85ec3b02bc760df10fce1c19d09419d6f2:
(db_console.go:137).seedCluster: read tcp 172.17.0.3:34796 -> 35.196.99.252:26257: read: connection reset by peer
test artifacts and logs in: /artifacts/db-console/cypress/run_1
Parameters:
arch=amd64
cloud=gce
coverageBuild=false
cpu=4
encrypted=false
runtimeAssertionsBuild=true
ssd=0
See: roachtest README
See: How To Investigate (internal)
See: Grafana
I see this error on node 1, which seems completely unrelated:
E241114 08:20:20.651537 16780 (gostd) net/http/server.go:3416 ⋮ [-] 318 ‹http: TLS handshake error from 10.0.0.3:55430: remote error: tls: bad certificate›
E241114 08:20:28.870261 17115 (gostd) net/http/server.go:3416 ⋮ [-] 319 ‹http: TLS handshake error from 10.0.0.3:49538: remote error: tls: bad certificate›
E241114 08:20:35.652404 17365 (gostd) net/http/server.go:3416 ⋮ [-] 320 ‹http: TLS handshake error from 10.0.0.3:56574: remote error: tls: bad certificate›
E241114 08:20:43.869805 17585 (gostd) net/http/server.go:3416 ⋮ [-] 321 ‹http: TLS handshake error from 10.0.0.3:60280: remote error: tls: bad certificate›
E241114 08:20:50.653134 17801 (gostd) net/http/server.go:3416 ⋮ [-] 322 ‹http: TLS handshake error from 10.0.0.3:60292: remote error: tls: bad certificate›
E241114 08:20:58.869385 18071 (gostd) net/http/server.go:3416 ⋮ [-] 323 ‹http: TLS handshake error from 10.0.0.3:48848: remote error: tls: bad certificate›
E241114 08:21:05.652483 18287 (gostd) net/http/server.go:3416 ⋮ [-] 324 ‹http: TLS handshake error from 10.0.0.3:47858: remote error: tls: bad certificate›
W241114 08:21:09.152322 338 kv/kvserver/liveness/liveness.go:753 ⋮ [T1,Vsystem,n1,liveness-hb] 325 slow heartbeat took 3.002216064s; err=result is ambiguous: context done during DistSender.Send: ba: ‹ConditionalPut [/System/NodeLiveness/1], EndTxn(commit modified-span (node-liveness)) [/System/NodeLiveness/1], [txn: 25ae64bb], [can-forward-ts]› RPC error: grpc: ‹context deadline exceeded› [code 4/DeadlineExceeded]
W241114 08:21:09.152591 338 kv/kvserver/liveness/liveness.go:667 ⋮ [T1,Vsystem,n1,liveness-hb] 326 failed node liveness heartbeat: operation "node liveness heartbeat" timed out after 3.002s (given timeout 3s): result is ambiguous: context done during DistSender.Send: ba: ‹ConditionalPut [/System/NodeLiveness/1], EndTxn(commit modified-span (node-liveness)) [/System/NodeLiveness/1], [txn: 25ae64bb], [can-forward-ts]› RPC error: grpc: ‹context deadline exceeded› [code 4/DeadlineExceeded]
W241114 08:21:09.152591 338 kv/kvserver/liveness/liveness.go:667 ⋮ [T1,Vsystem,n1,liveness-hb] 326 +
W241114 08:21:09.152591 338 kv/kvserver/liveness/liveness.go:667 ⋮ [T1,Vsystem,n1,liveness-hb] 326 +An inability to maintain liveness will prevent a node from participating in a
W241114 08:21:09.152591 338 kv/kvserver/liveness/liveness.go:667 ⋮ [T1,Vsystem,n1,liveness-hb] 326 +cluster. If this problem persists, it may be a sign of resource starvation or
W241114 08:21:09.152591 338 kv/kvserver/liveness/liveness.go:667 ⋮ [T1,Vsystem,n1,liveness-hb] 326 +of network connectivity problems. For help troubleshooting, visit:
W241114 08:21:09.152591 338 kv/kvserver/liveness/liveness.go:667 ⋮ [T1,Vsystem,n1,liveness-hb] 326 +
W241114 08:21:09.152591 338 kv/kvserver/liveness/liveness.go:667 ⋮ [T1,Vsystem,n1,liveness-hb] 326 + https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
W241114 08:21:11.370231 7992 kv/kvserver/closedts/sidetransport/receiver.go:135 ⋮ [n1,remote=4] 327 closed timestamps side-transport connection dropped from node: 4 (grpc: ‹context canceled› [code 1/Canceled])
W241114 08:21:11.489699 5033 kv/kvserver/raft_transport.go:1067 ⋮ [T1,Vsystem,n1] 328 while processing outgoing Raft queue to node 4: recv msg error: grpc: ‹grpc: the client connection is closing› [code 1/Canceled]:
E241114 08:21:11.489775 3906 2@rpc/peer.go:642 ⋮ [T1,Vsystem,n1,rnode=4,raddr=‹10.142.3.18:26257›,class=system,rpc] 329 disconnected (was healthy for 1m56.104s): operation "conn heartbeat" timed out after 6.001s (given timeout 6s): grpc: ‹context deadline exceeded› [code 4/DeadlineExceeded]
The connection error happens when we try to seed the data:
read tcp 172.17.0.3:34796 -> 35.196.99.252:26257: read: connection reset by peer
(1) attached stack trace
-- stack trace:
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*dbConsoleCypressTest).seedCluster
| pkg/cmd/roachtest/tests/db_console.go:137
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*dbConsoleCypressTest).SetupTest
| pkg/cmd/roachtest/tests/db_console.go:99
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runDbConsoleCypress
| pkg/cmd/roachtest/tests/db_console.go:235
| main.(*testRunner).runTest.func2
| pkg/cmd/roachtest/test_runner.go:1305
| runtime.goexit
| src/runtime/asm_amd64.s:1695
Wraps: (2) secondary error attachment
| read tcp 172.17.0.3:34796 -> 35.196.99.252:26257: read: connection reset by peer
| (1) read tcp 172.17.0.3:34796 -> 35.196.99.252:26257
| Wraps: (2) read
| Wraps: (3) connection reset by peer
| Error types: (1) *net.OpError (2) *os.SyscallError (3) syscall.Errno
Wraps: (3) read tcp 172.17.0.3:34796 -> 35.196.99.252:26257: read: connection reset by peer
Error types: (1) *withstack.withStack (2) *secondary.withSecondaryError (3) *errutil.leafError
It looks like the issue might be that we're trying to connect to the workload node to run the seed queries; @kyle-a-wong, can you confirm? (link below)
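If that hypothesis is right, the fix would be to exclude the workload node when choosing a connection target for seeding. A sketch of the intended selection, under the assumption (not verified against db_console.go) that the test reserves the last node of the cluster for the workload/Cypress runner:

```go
package main

import "fmt"

// crdbNodes returns the node IDs that run CockroachDB, assuming the last
// node of the cluster is reserved for the workload runner. Node IDs are
// 1-indexed, matching roachtest conventions.
func crdbNodes(total int) []int {
	if total < 2 {
		return nil // need at least one CRDB node plus the workload node
	}
	ids := make([]int, 0, total-1)
	for i := 1; i < total; i++ {
		ids = append(ids, i)
	}
	return ids
}

func main() {
	// On a 4-node cluster, seed queries should target nodes 1-3 only,
	// never node 4 (the workload node, which serves no SQL).
	fmt.Println(crdbNodes(4))
}
```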
roachtest.db-console/cypress failed with artifacts on master @ 39e43b85ec3b02bc760df10fce1c19d09419d6f2:
Parameters:
arch=amd64
cloud=azure
coverageBuild=false
cpu=4
encrypted=false
runtimeAssertionsBuild=true
ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
Grafana is not yet available for azure clusters
/cc @cockroachdb/obs-prs
This test on roachdash | Improve this report!
Jira issue: CRDB-44371