Closed cockroach-teamcity closed 3 weeks ago
n2 is nowhere to be found in the logs. Did it get preempted or something?
cc @cockroachdb/test-eng
This is on azure which doesn't support spot VMs yet.
Something happened to n2:
I241105 06:52:11.107020 30981 rpc/heartbeat.go:174 ⋮ [-] 734 failing ping request from node n2
E241105 06:52:11.107518 25897 kv/kvserver/replica_consistency.go:764 ⋮ [T1,Vsystem,n1,s1,r99/4:‹/Tenant/3/Table/11{2/1…-4/1…}›] 735 checksum computation failed: context canceled
I241105 06:52:11.151482 30977 rpc/heartbeat.go:174 ⋮ [-] 736 failing ping request from node n2
W241105 06:52:11.171362 2764 kv/kvserver/closedts/sidetransport/sender.go:838 ⋮ [T1,Vsystem,n1,ctstream=2] 737 failed to send closed timestamp message 601 to n2: send msg error: ‹EOF›
I241105 06:52:11.803854 31033 rpc/heartbeat.go:174 ⋮ [-] 738 failing ping request from node n2
I241105 06:52:13.480847 29676 sql/stats/automatic_stats.go:865 ⋮ [T1,Vsystem,n1] 739 automatically executing ‹"CREATE STATISTICS __auto__ FROM [54] WITH OPTIONS THROTTLING 0.9 AS OF SYSTEM TIME '-30s'"›
E241105 06:52:15.096939 1828 2@rpc/peer.go:668 ⋮ [T1,Vsystem,n1,rnode=2,raddr=‹10.1.0.156:26257›,class=system,rpc] 740 failed connection attempt‹ (last connected 4.001s ago)›: grpc: ‹connection error: desc = "transport: error while dialing: dial tcp 10.1.0.156:26257: i/o timeout"› [code 14/Unavailable]
At first glance, there doesn't appear to be anything wrong with the test infra. Handing over to DR for further triage.
there doesn't appear to be anything wrong with the test infra
@srosenberg Where are node 2's logs?
Since n2 became (and stayed) unreachable during the test, its logs could not be downloaded.
Is that the cockroach process or the vm that we're saying was unreachable?
Definitely the vm was unreachable. I did scan other available logs, but nothing really stood out. All things considered, this is likely a transient issue in azure. Feel free to close it, assuming nothing else sticks out wrt what's being tested.
roachtest.c2c/mixed-version failed with artifacts on master @ 015b2f48cf80a6d8b60d7038c8c3457d934c716a:
Parameters:
ROACHTEST_arch=arm64
ROACHTEST_cloud=azure
ROACHTEST_coverageBuild=false
ROACHTEST_cpu=4
ROACHTEST_encrypted=false
ROACHTEST_fs=ext4
ROACHTEST_localSSD=true
ROACHTEST_runtimeAssertionsBuild=false
ROACHTEST_ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
Grafana is not yet available for azure clusters
/cc @cockroachdb/disaster-recovery
This test on roachdash | Improve this report!
Jira issue: CRDB-43993