cockroachdb / cockroach

CockroachDB - the open source, cloud-native distributed SQL database.
https://www.cockroachlabs.com

roachtest: backup-restore/mixed-version failed #125677

Open cockroach-teamcity opened 1 month ago

cockroach-teamcity commented 1 month ago

roachtest.backup-restore/mixed-version failed with artifacts on release-23.1 @ 2a97ebe6c923cb24cceadd047fb11b289659e25a:

(mixedversion.go:596).Run: unexpected node event: n2: cockroach process for system interface died (exit code 15)
test artifacts and logs in: /artifacts/backup-restore/mixed-version/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

- #125546 roachtest: backup-restore/mixed-version failed [C-test-failure O-roachtest O-robot P-3 T-disaster-recovery branch-release-23.2.6-rc]

/cc @cockroachdb/disaster-recovery

This test on roachdash | Improve this report!

Jira issue: CRDB-39551

msbutler commented 1 month ago

This is a new one: we seem to have had node liveness issues on node 4. They occurred after a restore completed, during fingerprinting. The cluster was on 23.1.

W240614 10:17:00.062122 480 kv/kvserver/closedts/sidetransport/receiver.go:139 ⋮ [n4] 1015  closed timestamps side-transport connection dropped from node: 2
W240614 10:17:00.990128 243 kv/kvserver/liveness/liveness.go:906 ⋮ [n4,liveness-hb] 1016  slow heartbeat took 4.501632212s; err=result is ambiguous: context done during DistSender.Send: ba: ‹ConditionalPut [/System/NodeLiveness/4,/Min), EndTxn(commit modified-span (node-liveness)) [/System/NodeLiveness/4], [txn: 9bf6cf26], [can-forward-ts]› RPC error: ‹rpc error: code = DeadlineExceeded desc = context deadline exceeded›
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017  failed node liveness heartbeat: ‹operation "node liveness heartbeat" timed out after 4.502s (given timeout 4.5s)›: result is ambiguous: context done during DistSender.Send: ba: ‹ConditionalPut [/System/NodeLiveness/4,/Min), EndTxn(commit modified-span (node-liveness)) [/System/NodeLiveness/4], [txn: 9bf6cf26], [can-forward-ts]› RPC error: ‹rpc error: code = DeadlineExceeded desc = context deadline exceeded›
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +(1) ‹operation "node liveness heartbeat" timed out after 4.502s (given timeout 4.5s)›
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +Wraps: (2) result is ambiguous: context done during DistSender.Send: ba: ‹ConditionalPut [/System/NodeLiveness/4,/Min), EndTxn(commit modified-span (node-liveness)) [/System/NodeLiveness/4], [txn: 9bf6cf26], [can-forward-ts]› RPC error: ‹rpc error: code = DeadlineExceeded desc = context deadline exceeded›
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +Wraps: (3)
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | (opaque error wrapper)
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | type name: github.com/cockroachdb/errors/withstack/*withstack.withStack
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | reportable 0:
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).sendToReplicas
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:2310
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).sendPartialBatch
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:1623
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).divideAndSendBatchToRanges
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:1221
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).Send
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:845
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnLockGatekeeper).SendLocked
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_lock_gatekeeper.go:82
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnMetricRecorder).SendLocked
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_metric_recorder.go:46
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnCommitter).SendLocked
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_committer.go:202
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnSpanRefresher).sendLockedWithRefreshAttempts
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:225
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnSpanRefresher).SendLocked
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:153
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnPipeliner).SendLocked
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go:290
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnSeqNumAllocator).SendLocked
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_seq_num_allocator.go:104
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnHeartbeater).SendLocked
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_heartbeater.go:245
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*TxnCoordSender).Send
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_coord_sender.go:526
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv.(*DB).sendUsingSender
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/db.go:1045
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv.(*Txn).Send
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/txn.go:1093
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv.sendAndFill
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/db.go:884
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv.(*Txn).Run
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/txn.go:673
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).updateLivenessAttempt.func1
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:1400
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv.runTxn.func1
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/db.go:1009
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv.(*Txn).exec
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/txn.go:961
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv.runTxn
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/db.go:1008
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv.(*DB).TxnWithAdmissionControl
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/db.go:971
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/db.go:950
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).updateLivenessAttempt
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:1373
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).updateLiveness
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:1328
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).heartbeatInternal
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:981
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).Start.func1.1
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:796
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/util/contextutil.RunWithTimeout
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/util/contextutil/context.go:91
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).Start.func1
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:779
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:489
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | runtime.goexit
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   GOROOT/src/runtime/asm_amd64.s:1594
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +Wraps: (4) context done during DistSender.Send
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +Wraps: (5)
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | (opaque error wrapper)
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | type name: github.com/cockroachdb/errors/withstack/*withstack.withStack
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | reportable 0:
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*grpcTransport).sendBatch
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/transport.go:231
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*grpcTransport).SendNext
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/transport.go:190
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).sendToReplicas
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:2079
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).sendPartialBatch
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:1623
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).divideAndSendBatchToRanges
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:1221
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*DistSender).Send
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/dist_sender.go:845
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnLockGatekeeper).SendLocked
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_lock_gatekeeper.go:82
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnMetricRecorder).SendLocked
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_metric_recorder.go:46
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnCommitter).SendLocked
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_committer.go:202
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnSpanRefresher).sendLockedWithRefreshAttempts
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:225
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnSpanRefresher).SendLocked
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_span_refresher.go:153
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnPipeliner).SendLocked
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_pipeliner.go:290
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnSeqNumAllocator).SendLocked
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_seq_num_allocator.go:104
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*txnHeartbeater).SendLocked
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_interceptor_heartbeater.go:245
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord.(*TxnCoordSender).Send
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvclient/kvcoord/txn_coord_sender.go:526
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv.(*DB).sendUsingSender
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/db.go:1045
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv.(*Txn).Send
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/txn.go:1093
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv.sendAndFill
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/db.go:884
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv.(*Txn).Run
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/txn.go:673
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).updateLivenessAttempt.func1
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:1400
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv.runTxn.func1
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/db.go:1009
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv.(*Txn).exec
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/txn.go:961
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv.runTxn
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/db.go:1008
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv.(*DB).TxnWithAdmissionControl
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/db.go:971
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv.(*DB).Txn
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/db.go:950
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).updateLivenessAttempt
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:1373
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).updateLiveness
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:1328
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).heartbeatInternal
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:981
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).Start.func1.1
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:796
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/util/contextutil.RunWithTimeout
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/util/contextutil/context.go:91
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness.(*NodeLiveness).Start.func1
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/kv/kvserver/liveness/liveness.go:779
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  | github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +  |   github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:489
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +Wraps: (6) ba: ‹ConditionalPut [/System/NodeLiveness/4,/Min), EndTxn(commit modified-span (node-liveness)) [/System/NodeLiveness/4], [txn: 9bf6cf26], [can-forward-ts]› RPC error
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +Wraps: (7) ‹rpc error: code = DeadlineExceeded desc = context deadline exceeded›
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +Error types: (1) *contextutil.TimeoutError (2) *roachpb.AmbiguousResultError (3) *errbase.opaqueWrapper (4) *errutil.withPrefix (5) *errbase.opaqueWrapper (6) *errutil.withPrefix (7) *status.Error
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +An inability to maintain liveness will prevent a node from participating in a
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +cluster. If this problem persists, it may be a sign of resource starvation or
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +of network connectivity problems. For help troubleshooting, visit:
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +
W240614 10:17:00.990324 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1017 +    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
E240614 10:17:05.391990 79921 kv/kvserver/replica_range_lease.go:522 ⋮ [n4,s4,r706/1:‹/Table/246/1/6{0/8/-…-3/6/-…}›] 1018  failed to increment leaseholder's epoch: result is ambiguous: context done during DistSender.Send: ba: ‹ConditionalPut [/System/NodeLiveness/3,/Min), EndTxn(commit modified-span (node-liveness)) [/System/NodeLiveness/3], [txn: 65dc4256], [can-forward-ts]› RPC error: ‹rpc error: code = Canceled desc = context canceled›
W240614 10:17:05.491130 243 kv/kvserver/liveness/liveness.go:906 ⋮ [n4,liveness-hb] 1019  slow heartbeat took 4.50038083s; err=context deadline exceeded
W240614 10:17:05.491223 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1020  failed node liveness heartbeat: ‹operation "node liveness heartbeat" timed out after 4.5s (given timeout 4.5s)›: context deadline exceeded
W240614 10:17:05.491223 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1020 +(1) ‹operation "node liveness heartbeat" timed out after 4.5s (given timeout 4.5s)›
W240614 10:17:05.491223 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1020 +Wraps: (2) context deadline exceeded
W240614 10:17:05.491223 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1020 +Error types: (1) *contextutil.TimeoutError (2) context.deadlineExceededError
W240614 10:17:05.491223 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1020 +
W240614 10:17:05.491223 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1020 +An inability to maintain liveness will prevent a node from participating in a
W240614 10:17:05.491223 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1020 +cluster. If this problem persists, it may be a sign of resource starvation or
W240614 10:17:05.491223 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1020 +of network connectivity problems. For help troubleshooting, visit:
W240614 10:17:05.491223 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1020 +
W240614 10:17:05.491223 243 kv/kvserver/liveness/liveness.go:808 ⋮ [n4,liveness-hb] 1020 +    https://www.cockroachlabs.com/docs/stable/cluster-setup-troubleshooting.html#node-liveness-issues
renatolabs commented 1 month ago

Drive-by: note that the test failed because n2 OOM'ed.

msbutler commented 1 month ago

Oof, sorry for the noise, KV. This OOM occurred in the HTTP client. I think we've seen this before.

msbutler commented 1 month ago

Looks similar to https://github.com/cockroachdb/cockroach/issues/103481#issuecomment-1554526442

msbutler commented 1 month ago

Maybe I'm misreading the code, but did we forget to implement Close for a slice iterator? See the iterator returned here: https://github.com/cockroachdb/cockroach/blob/release-23.1/pkg/ccl/backupccl/backupinfo/manifest_handling.go#L1620

This would lead to a memory leak in genSpans.
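To make the concern concrete: if an iterator holds a reference to a large backing slice and stays reachable (for example, captured by a long-lived struct or closure), a Close that does nothing keeps the whole slice alive until the iterator itself becomes unreachable. The sketch below shows a slice-backed iterator whose Close drops that reference, plus a consumer that always closes via defer. Iterator, sliceIterator and collect are hypothetical names modeled on a Valid/Value/Next/Close shape; they are not the actual backupinfo or genSpans code, and whether the real code leaks this way is exactly the open question above.

```go
package main

// Iterator is a hypothetical iterator interface with an explicit Close.
type Iterator[T any] interface {
	// Valid reports whether Value may be called, or an iteration error.
	Valid() (bool, error)
	Value() T
	Next()
	// Close releases any resources (including memory) held by the iterator.
	Close()
}

// sliceIterator iterates over an in-memory slice.
type sliceIterator[T any] struct {
	backing []T
	idx     int
}

// Compile-time check that sliceIterator satisfies Iterator.
var _ Iterator[int] = (*sliceIterator[int])(nil)

func newSliceIterator[T any](s []T) *sliceIterator[T] {
	return &sliceIterator[T]{backing: s}
}

func (s *sliceIterator[T]) Valid() (bool, error) { return s.idx < len(s.backing), nil }

func (s *sliceIterator[T]) Value() T { return s.backing[s.idx] }

func (s *sliceIterator[T]) Next() { s.idx++ }

// Close drops the reference to the backing slice so the garbage collector can
// reclaim it even if the iterator value itself is still reachable.
func (s *sliceIterator[T]) Close() {
	s.backing = nil
	s.idx = 0
}

// collect drains it into a new slice and always closes it, even on error.
func collect[T any](it Iterator[T]) ([]T, error) {
	defer it.Close()
	var out []T
	for {
		ok, err := it.Valid()
		if err != nil {
			return nil, err
		}
		if !ok {
			return out, nil
		}
		out = append(out, it.Value())
		it.Next()
	}
}
```

For example, collect[int](newSliceIterator([]int{1, 2, 3})) returns the values 1, 2, 3 and leaves the iterator closed. Deferring Close in the consumer means the reference is released even when iteration stops early on an error.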

cockroach-teamcity commented 3 weeks ago

roachtest.backup-restore/mixed-version failed with artifacts on release-23.1 @ 7badd78e3f8d896480567f37daa547f9e56639f7:

(mixedversion.go:614).Run: mixed-version test failure while running step 57 (run "verify some backups"): mixed-version: error waiting for job to finish: job 979121044995211266 failed with error: restoring 9 TableDescriptors from 1 databases: restoring table desc and namespace entries: table already exists
test artifacts and logs in: /artifacts/backup-restore/mixed-version/run_1

Parameters:

See: roachtest README

See: How To Investigate (internal)

See: Grafana

Same failure on other branches

- #125921 roachtest: backup-restore/mixed-version failed [C-test-failure O-roachtest O-robot T-disaster-recovery branch-release-24.1 release-blocker]
- #125916 roachtest: backup-restore/mixed-version failed [C-test-failure O-roachtest O-robot T-disaster-recovery branch-release-23.1.23-rc release-blocker]
