cockroach-teamcity closed this issue 1 year ago.
roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 5ad21e3896ee809e9c3ebc28bb22166f1275acca:
| 882.0s 0 0.0 9.0 0.0 0.0 0.0 0.0 stockLevel
| 883.0s 0 2.0 9.1 36507.2 38654.7 38654.7 38654.7 delivery
| 883.0s 0 34.0 92.3 66572.0 103079.2 103079.2 103079.2 newOrder
| 883.0s 0 3.0 9.1 38654.7 103079.2 103079.2 103079.2 orderStatus
| 883.0s 0 32.0 90.6 42949.7 103079.2 103079.2 103079.2 payment
| 883.0s 0 0.0 9.0 0.0 0.0 0.0 0.0 stockLevel
| 884.0s 0 5.0 9.1 103079.2 103079.2 103079.2 103079.2 delivery
| 884.0s 0 38.0 92.2 81604.4 103079.2 103079.2 103079.2 newOrder
| 884.0s 0 5.0 9.1 36507.2 103079.2 103079.2 103079.2 orderStatus
| 884.0s 0 49.0 90.6 42949.7 103079.2 103079.2 103079.2 payment
| 884.0s 0 5.0 9.0 103079.2 103079.2 103079.2 103079.2 stockLevel
Wraps: (8) COMMAND_PROBLEM
Wraps: (9) Node 5. Command with error:
| ``````
| ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json --ramp=5m0s --duration=2h0m0s --prometheus-port=2112 --pprofport=33333 {pgurl:1-4}
| ``````
Wraps: (10) exit status 1
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError
mixed_version_jobs.go:73,versionupgrade.go:208,tpcc.go:414,test_runner.go:780: monitor failure: monitor task failed: t.Fatal() was called
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:208
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:414
| main.(*testRunner).runTest.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:780
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:171
Wraps: (4) monitor task failed
Wraps: (5) attached stack trace
-- stack trace:
| main.init
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:80
| runtime.doInit
| /usr/local/go/src/runtime/proc.go:6498
| runtime.main
| /usr/local/go/src/runtime/proc.go:238
| runtime.goexit
| /usr/local/go/src/runtime/asm_amd64.s:1581
Wraps: (6) t.Fatal() was called
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 4b41789120e019ab015e6dbb924df763897ebadb:
| 960.0s 0 3.0 10.9 90194.3 103079.2 103079.2 103079.2 delivery
| 960.0s 0 34.0 110.2 73014.4 103079.2 103079.2 103079.2 newOrder
| 960.0s 0 2.0 10.9 45097.2 45097.2 45097.2 45097.2 orderStatus
| 960.0s 0 35.0 108.0 73014.4 103079.2 103079.2 103079.2 payment
| 960.0s 0 3.0 10.9 4831.8 90194.3 90194.3 90194.3 stockLevel
| _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
| 961.0s 0 1.0 10.9 103079.2 103079.2 103079.2 103079.2 delivery
| 961.0s 0 37.0 110.1 90194.3 103079.2 103079.2 103079.2 newOrder
| 961.0s 0 5.0 10.9 25769.8 103079.2 103079.2 103079.2 orderStatus
| 961.0s 0 40.0 107.9 81604.4 103079.2 103079.2 103079.2 payment
| 961.0s 0 1.0 10.9 40802.2 40802.2 40802.2 40802.2 stockLevel
Wraps: (8) COMMAND_PROBLEM
Wraps: (9) Node 5. Command with error:
| ``````
| ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json --ramp=5m0s --duration=2h0m0s --prometheus-port=2112 --pprofport=33333 {pgurl:1-4}
| ``````
Wraps: (10) exit status 1
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError
mixed_version_jobs.go:73,versionupgrade.go:208,tpcc.go:414,test_runner.go:780: monitor failure: monitor task failed: t.Fatal() was called
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:208
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:414
| main.(*testRunner).runTest.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:780
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:171
Wraps: (4) monitor task failed
Wraps: (5) attached stack trace
-- stack trace:
| main.init
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:80
| runtime.doInit
| /usr/local/go/src/runtime/proc.go:6498
| runtime.main
| /usr/local/go/src/runtime/proc.go:238
| runtime.goexit
| /usr/local/go/src/runtime/asm_amd64.s:1581
Wraps: (6) t.Fatal() was called
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError
roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 912964e02ddd951c77d4f71981ae18b3894e9084:
| 1253.0s 0 4.0 8.8 98784.2 103079.2 103079.2 103079.2 stockLevel
| 1254.0s 0 7.0 8.7 47244.6 103079.2 103079.2 103079.2 delivery
| 1254.0s 0 36.0 89.5 77309.4 103079.2 103079.2 103079.2 newOrder
| 1254.0s 0 3.0 8.9 2684.4 64424.5 64424.5 64424.5 orderStatus
| 1254.0s 0 31.0 88.6 28991.0 103079.2 103079.2 103079.2 payment
| 1254.0s 0 4.0 8.8 1140.9 103079.2 103079.2 103079.2 stockLevel
| 1255.0s 0 5.0 8.7 103079.2 103079.2 103079.2 103079.2 delivery
| 1255.0s 0 50.9 89.5 47244.6 103079.2 103079.2 103079.2 newOrder
| 1255.0s 0 4.0 8.9 77309.4 103079.2 103079.2 103079.2 orderStatus
| 1255.0s 0 40.0 88.6 49392.1 103079.2 103079.2 103079.2 payment
| 1255.0s 0 3.0 8.8 45097.2 103079.2 103079.2 103079.2 stockLevel
Wraps: (8) COMMAND_PROBLEM
Wraps: (9) Node 5. Command with error:
| ``````
| ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json --ramp=5m0s --duration=2h0m0s --prometheus-port=2112 --pprofport=33333 {pgurl:1-4}
| ``````
Wraps: (10) exit status 1
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError
mixed_version_jobs.go:73,versionupgrade.go:208,tpcc.go:414,test_runner.go:780: monitor failure: monitor task failed: t.Fatal() was called
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:208
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:414
| main.(*testRunner).runTest.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:780
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:171
Wraps: (4) monitor task failed
Wraps: (5) attached stack trace
-- stack trace:
| main.init
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:80
| runtime.doInit
| /usr/local/go/src/runtime/proc.go:6498
| runtime.main
| /usr/local/go/src/runtime/proc.go:238
| runtime.goexit
| /usr/local/go/src/runtime/asm_amd64.s:1581
Wraps: (6) t.Fatal() was called
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError
Error: error in newOrder: ERROR: restart transaction: TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE - failed preemptive refresh due to a conflict: intent on key /Table/166/1/669/0): "sql txn" meta={id=e741a752 key=/Table/168/1/669/8/0 pri=0.07912940 epo=17 ts=1642609332.165110310,2 min=1642609036.453450943,0 seq=20} lock=true stat=PENDING rts=1642609326.485867639,0 wto=false gul=1642609036.953450943,0 (SQLSTATE 40001)
It looks like a transaction retry error is somehow bubbling up to here: https://github.com/cockroachdb/cockroach/blob/79a4d4ad2295d6cf69083d93022d9cf49557c6fa/pkg/workload/tpcc/worker.go#L231-L234
The "last good" run before the failing streak is https://teamcity.cockroachdb.com/viewLog.html?buildId=4115910 (d6b99e92bf55b6f4a0d79800d67924e04d0b2a6d), and the first failure in the streak is 78419450178335b31f542bd1b14fefdf4ecee0e8.
$ git log --no-merges 78419450178335b31f542bd1b14fefdf4ecee0e8 --not d6b99e92bf55b6f4a0d79800d67924e04d0b2a6d --oneline
ca66a18fa4 execinfrapb: remove ScanVisibility
b37e13d74f sql: clean up unnamed struct in scanColumnsConfig
00912544a5 sql: remove privilege checks at scanNode init time
9dc76f064a sql: remove index flags logic from scanNode
0845c8a2cb sql: simplify scanColumnsConfig
5ac83d9070 sql: add regression tests inserting decimals in scientific notation
48f2808616 sql: don't check column visibility when initializing scanNode
1770c214f9 sql: remove unused scanColumnsConfig field
3afbdb0f50 sql: implement ON CONFLICT ON CONSTRAINT
2490224168 colexechash: combine two conditionals into one in distinct mode
6998af348e colexechash: remove some dead code
0bb31ff1dc colexectestutils: increase test coverage by randomizing batch length
bb2fc51a42 colexechash: cleanup the previous commit
13b4e48afe colexechash: fix an internal error with distinct mode
74b6e343ac tree,parser: add support for ON CONFLICT ON CONSTRAINT
b3877b8775 cdc: Allow webhook sink to provide client certificates to the remote webhook server
afb8dbe096 streampb: delete `stream.pb.go`
5c3e798c08 bazel: upgrade `rules_go` to pull in new changes
785af465ac sql,server: add VIEWACTIVITYREDACTED role
9653dd13ce build: add <release branch> to nightly and latest tag values
6664d0c34d kv: circuit-break requests to unavailable replicas
ad59351e4b echotest: add testing helper
055a55f52c authors: add natelong to authors
19d12a63e7 roachtest: update 22.1 version map to v21.2.4
7577c4e6df cloud: bump orchestrator to v21.2.4
Starting three runs of b3877b8775 here: https://teamcity.cockroachdb.com/viewLog.html?buildId=4163457&buildTypeId=Cockroach_Nightlies_RoachtestStress&tab=buildResultsDiv&branch_Cockroach_Nightlies=%3Cdefault%3E
If these pass, then a SQL/colexec change is likely to blame for this change in behavior.
cc @yuzefovich in case you have an immediate idea what could have changed in the propagation of txn retry errors.
roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ da01e4c0545f191a0573e1d097ff0366769e0d6b:
| 1376.0s 0 3.0 7.4 103079.2 103079.2 103079.2 103079.2 stockLevel
| 1377.0s 0 5.0 7.4 103079.2 103079.2 103079.2 103079.2 delivery
| 1377.0s 0 26.0 75.7 103079.2 103079.2 103079.2 103079.2 newOrder
| 1377.0s 0 1.0 7.5 42949.7 42949.7 42949.7 42949.7 orderStatus
| 1377.0s 0 18.0 74.4 103079.2 103079.2 103079.2 103079.2 payment
| 1377.0s 0 6.0 7.4 103079.2 103079.2 103079.2 103079.2 stockLevel
| 1378.0s 0 9.0 7.4 103079.2 103079.2 103079.2 103079.2 delivery
| 1378.0s 0 19.0 75.7 81604.4 103079.2 103079.2 103079.2 newOrder
| 1378.0s 0 2.0 7.5 159.4 45097.2 45097.2 45097.2 orderStatus
| 1378.0s 0 25.9 74.4 103079.2 103079.2 103079.2 103079.2 payment
| 1378.0s 0 6.0 7.4 103079.2 103079.2 103079.2 103079.2 stockLevel
Wraps: (8) COMMAND_PROBLEM
Wraps: (9) Node 5. Command with error:
| ``````
| ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json --ramp=5m0s --duration=2h0m0s --prometheus-port=2112 --pprofport=33333 {pgurl:1-4}
| ``````
Wraps: (10) exit status 1
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError
mixed_version_jobs.go:73,versionupgrade.go:208,tpcc.go:414,test_runner.go:780: monitor failure: monitor task failed: t.Fatal() was called
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:208
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:414
| main.(*testRunner).runTest.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:780
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:171
Wraps: (4) monitor task failed
Wraps: (5) attached stack trace
-- stack trace:
| main.init
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:80
| runtime.doInit
| /usr/local/go/src/runtime/proc.go:6498
| runtime.main
| /usr/local/go/src/runtime/proc.go:238
| runtime.goexit
| /usr/local/go/src/runtime/asm_amd64.s:1581
Wraps: (6) t.Fatal() was called
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError
I think it's most likely because of the streamer work (#68430), where we now use leaf txns to issue concurrent requests for index joins in some cases. Notably, I haven't yet implemented the transparent refresh mechanism there, so it's expected that the number of retryable errors increases because of that PR. I guess if we do `SET CLUSTER SETTING sql.distsql.use_streamer.enabled = false;`, then these failures will go away.
Would you mind making that change? I think the streamer needs to be off by default if it can't properly propagate refresh errors. We're going to catch this in most workloads.
Just to make sure I understand things correctly: generally speaking, propagating a txn retryable error to the client is acceptable because the app must have some kind of retry loop; however, in most of our roachtests we don't tolerate the retryable errors and treat them as a failure of the test. Does this sound right?
The workload here handles retry errors (unless I'm misreading something about where the error occurs). I think what is happening here is that a retry error bubbles up as a regular error, i.e. it can't have had the proper type. Or at least that's what I think we're seeing? The error is returned from this method:
You can see by inspection that this implies that an error is returned from this block:
and that will certainly do proper retries?
So my reading was that something in the code is doing some (probably less obviously wrong) version of:
```go
err := something() // retry err
err = errors.Errorf("oops messing it up %s", err)
return err
```
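A minimal sketch of that hypothesis, using only the standard library. `retryErr` is a hypothetical stand-in for a typed retryable error, and `fmt.Errorf` is swapped in for `errors.Errorf`; the sketch does not try to model how `github.com/cockroachdb/errors` treats error arguments. Formatting with `%s` keeps the text (including any "40001") but drops the type, so an `errors.As`-based retry check no longer recognizes it, while `%w` keeps the original in the chain.

```go
package main

import (
	"errors"
	"fmt"
)

// retryErr is a hypothetical stand-in for a typed retryable error.
type retryErr struct{ code string }

func (e *retryErr) Error() string {
	return "restart transaction (SQLSTATE " + e.code + ")"
}

// flatten formats the error with %s: only its text survives, not its type.
func flatten(err error) error { return fmt.Errorf("oops messing it up %s", err) }

// wrap formats the error with %w: the original stays in the error chain.
func wrap(err error) error { return fmt.Errorf("oops messing it up: %w", err) }

// isRetry reports whether a *retryErr is anywhere in err's chain.
func isRetry(err error) bool {
	var re *retryErr
	return errors.As(err, &re)
}

func main() {
	base := &retryErr{code: "40001"}
	fmt.Println(isRetry(flatten(base))) // false: the text still says 40001, but the type is gone
	fmt.Println(isRetry(wrap(base)))    // true
}
```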
Hm, I'm confused. The `Streamer` doesn't do anything with the errors other than calling `GoError`: https://github.com/cockroachdb/cockroach/blob/ebda0ecb4aa1fe47f1403635846e342a2cfbfa1b/pkg/kv/kvclient/kvstreamer/streamer.go#L933
No wrapping or error modification is done in the newly introduced `TxnKVStreamer` either.
Trying to deconstruct the error message `error in newOrder: ERROR: restart transaction: TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn`:
- `error in newOrder` comes from https://github.com/cockroachdb/cockroach/blob/79a4d4ad2295d6cf69083d93022d9cf49557c6fa/pkg/workload/tpcc/worker.go#L233
- then `ERROR` is likely because of `pgerror.DefaultSeverity` being set in https://github.com/cockroachdb/cockroach/blob/79a4d4ad2295d6cf69083d93022d9cf49557c6fa/pkg/sql/pgwire/pgerror/flatten.go#L44
- then `restart transaction` comes from https://github.com/cockroachdb/cockroach/blob/79a4d4ad2295d6cf69083d93022d9cf49557c6fa/pkg/sql/pgwire/pgerror/flatten.go#L87
- then `TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn` probably comes from https://github.com/cockroachdb/cockroach/blob/79a4d4ad2295d6cf69083d93022d9cf49557c6fa/pkg/kv/kvclient/kvcoord/txn_coord_sender.go#L791
Then, because `TransactionRetryWithProtoRefreshError` implements `pgerror.ClientVisibleRetryError`, the error should have the `40001` code, which is then used to determine that the error is indeed retryable: https://github.com/cockroachdb/cockroach-go/blob/7a4e30224f1a484982a53f29cd65eebba4d40b92/crdb/tx.go#L192
It does say "(SQLSTATE 40001)" in the error from `newOrder` above. I think this really does mean SQL "did everything right"? I'm flummoxed by what is going wrong here, then.
Yeah, that's what puzzles me too.
I'll kick off this roachtest with the streamer disabled on #75257.
If we're looking for crackpot theories, could it be that we're getting the retry error on a BEGIN?
Lol I hope not.
Hm, all 5 builds failed. I think I kicked them off correctly (from the https://github.com/cockroachdb/cockroach/tree/disable-streamer branch), so maybe the streamer work is not to blame after all.
That looks correct. Ugh, another bisection.
roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 58ceac139a7e83052171121b28026a7366f16f7e:
| 1024.0s 0 7.0 9.5 85899.3 103079.2 103079.2 103079.2 delivery
| 1024.0s 0 31.0 96.0 103079.2 103079.2 103079.2 103079.2 newOrder
| 1024.0s 0 6.0 9.4 85899.3 103079.2 103079.2 103079.2 orderStatus
| 1024.0s 0 36.0 93.9 94489.3 103079.2 103079.2 103079.2 payment
| 1024.0s 0 6.0 9.4 66572.0 103079.2 103079.2 103079.2 stockLevel
| _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
| 1025.0s 0 4.0 9.5 11274.3 103079.2 103079.2 103079.2 delivery
| 1025.0s 0 33.0 96.0 103079.2 103079.2 103079.2 103079.2 newOrder
| 1025.0s 0 3.0 9.4 103079.2 103079.2 103079.2 103079.2 orderStatus
| 1025.0s 0 36.0 93.8 103079.2 103079.2 103079.2 103079.2 payment
| 1025.0s 0 4.0 9.4 38654.7 103079.2 103079.2 103079.2 stockLevel
Wraps: (8) COMMAND_PROBLEM
Wraps: (9) Node 5. Command with error:
| ``````
| ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json --ramp=5m0s --duration=2h0m0s --prometheus-port=2112 --pprofport=33333 {pgurl:1-4}
| ``````
Wraps: (10) exit status 1
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError
mixed_version_jobs.go:73,versionupgrade.go:208,tpcc.go:414,test_runner.go:780: monitor failure: monitor task failed: t.Fatal() was called
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:208
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:414
| main.(*testRunner).runTest.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:780
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:171
Wraps: (4) monitor task failed
Wraps: (5) attached stack trace
-- stack trace:
| main.init
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:80
| runtime.doInit
| /usr/local/go/src/runtime/proc.go:6498
| runtime.main
| /usr/local/go/src/runtime/proc.go:238
| runtime.goexit
| /usr/local/go/src/runtime/asm_amd64.s:1581
Wraps: (6) t.Fatal() was called
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError
FWIW it failed on b3877b8, to my surprise.
b3877b8775 cdc: Allow webhook sink to provide client certificates to the remote webhook server <-- bad
afb8dbe096 streampb: delete `stream.pb.go`
5c3e798c08 bazel: upgrade `rules_go` to pull in new changes
785af465ac sql,server: add VIEWACTIVITYREDACTED role
9653dd13ce build: add <release branch> to nightly and latest tag values
6664d0c34d kv: circuit-break requests to unavailable replicas
ad59351e4b echotest: add testing helper
055a55f52c authors: add natelong to authors
19d12a63e7 roachtest: update 22.1 version map to v21.2.4
7577c4e6df cloud: bump orchestrator to v21.2.4
<-- "good" (probably)
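The narrowing above is a binary search over the commit range, the same idea as `git bisect`. A minimal sketch using `sort.Search`, assuming commits ordered oldest to newest and a monotone `isBad` probe (hypothetical; here each probe is a multi-hour roachtest run, and because the failure is flaky a single passing run is weak evidence of "good", which is presumably why runs were repeated). The commit names below are placeholders, not the real range.

```go
package main

import (
	"fmt"
	"sort"
)

// firstBad returns the index of the earliest commit for which isBad is true,
// or len(commits) if none is. commits must be ordered oldest to newest and
// isBad must be monotone (all false, then all true) for the search to be valid.
func firstBad(commits []string, isBad func(string) bool) int {
	return sort.Search(len(commits), func(i int) bool { return isBad(commits[i]) })
}

func main() {
	// Hypothetical commits and culprit, not the real range from this issue.
	commits := []string{"c0", "c1", "c2", "c3", "c4", "c5"}
	isBad := func(c string) bool { return c >= "c3" }
	fmt.Println(commits[firstBad(commits, isBad)]) // c3
}
```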
(wrong thread)
@cockroachdb/sql-experience could one of you folks take a look here? We're getting this error returned from `crdbpgx.ExecuteTx`:
Error: error in newOrder: ERROR: restart transaction: TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE - failed preemptive refresh due to a conflict: intent on key /Table/166/1/669/0): "sql txn" meta={id=e741a752 key=/Table/168/1/669/8/0 pri=0.07912940 epo=17 ts=1642609332.165110310,2 min=1642609036.453450943,0 seq=20} lock=true stat=PENDING rts=1642609326.485867639,0 wto=false gul=1642609036.953450943,0 (SQLSTATE 40001)
This seems to have the correct error code, so how can it be bubbling up from the tpcc workload?
roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ dc07599dc9db1acd5afa3a6537297815f25c1fca:
| 1277.0s 0 3.0 7.0 85899.3 103079.2 103079.2 103079.2 stockLevel
| 1278.0s 0 7.0 7.0 103079.2 103079.2 103079.2 103079.2 delivery
| 1278.0s 0 62.1 70.9 40802.2 103079.2 103079.2 103079.2 newOrder
| 1278.0s 0 7.0 7.1 66572.0 90194.3 90194.3 90194.3 orderStatus
| 1278.0s 0 65.1 69.7 57982.1 103079.2 103079.2 103079.2 payment
| 1278.0s 0 2.0 7.0 130.0 90194.3 90194.3 90194.3 stockLevel
| 1279.0s 0 2.0 7.0 85899.3 103079.2 103079.2 103079.2 delivery
| 1279.0s 0 60.0 70.9 68719.5 103079.2 103079.2 103079.2 newOrder
| 1279.0s 0 4.0 7.0 73014.4 103079.2 103079.2 103079.2 orderStatus
| 1279.0s 0 75.0 69.7 42949.7 103079.2 103079.2 103079.2 payment
| 1279.0s 0 8.0 7.0 49392.1 103079.2 103079.2 103079.2 stockLevel
Wraps: (8) COMMAND_PROBLEM
Wraps: (9) Node 5. Command with error:
| ``````
| ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json --ramp=5m0s --duration=2h0m0s --prometheus-port=2112 --pprofport=33333 {pgurl:1-4}
| ``````
Wraps: (10) exit status 1
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError
mixed_version_jobs.go:73,versionupgrade.go:208,tpcc.go:414,test_runner.go:780: monitor failure: monitor task failed: t.Fatal() was called
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:208
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:414
| main.(*testRunner).runTest.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:780
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:171
Wraps: (4) monitor task failed
Wraps: (5) attached stack trace
-- stack trace:
| main.init
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:80
| runtime.doInit
| /usr/local/go/src/runtime/proc.go:6498
| runtime.main
| /usr/local/go/src/runtime/proc.go:238
| runtime.goexit
| /usr/local/go/src/runtime/asm_amd64.s:1581
Wraps: (6) t.Fatal() was called
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError
roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ e1068d77afbd39b162978281c9da7cbea49c1c3a:
| 1190.0s 0 3.0 8.2 27917.3 90194.3 90194.3 90194.3 stockLevel
| 1191.0s 0 4.0 8.1 23622.3 103079.2 103079.2 103079.2 delivery
| 1191.0s 0 46.0 83.0 64424.5 103079.2 103079.2 103079.2 newOrder
| 1191.0s 0 3.0 8.2 2952.8 103079.2 103079.2 103079.2 orderStatus
| 1191.0s 0 52.9 81.7 38654.7 103079.2 103079.2 103079.2 payment
| 1191.0s 0 2.0 8.2 29.4 81604.4 81604.4 81604.4 stockLevel
| 1192.0s 0 6.0 8.1 77309.4 103079.2 103079.2 103079.2 delivery
| 1192.0s 0 65.0 83.0 53687.1 103079.2 103079.2 103079.2 newOrder
| 1192.0s 0 5.0 8.2 3087.0 62277.0 62277.0 62277.0 orderStatus
| 1192.0s 0 44.0 81.7 32212.3 103079.2 103079.2 103079.2 payment
| 1192.0s 0 9.0 8.2 26843.5 103079.2 103079.2 103079.2 stockLevel
Wraps: (8) COMMAND_PROBLEM
Wraps: (9) Node 5. Command with error:
| ``````
| ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json --ramp=5m0s --duration=2h0m0s --prometheus-port=2112 --pprofport=33333 {pgurl:1-4}
| ``````
Wraps: (10) exit status 1
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError
mixed_version_jobs.go:73,versionupgrade.go:208,tpcc.go:414,test_runner.go:780: monitor failure: monitor task failed: t.Fatal() was called
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:208
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:414
| main.(*testRunner).runTest.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:780
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:171
Wraps: (4) monitor task failed
Wraps: (5) attached stack trace
-- stack trace:
| main.init
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:80
| runtime.doInit
| /usr/local/go/src/runtime/proc.go:6498
| runtime.main
| /usr/local/go/src/runtime/proc.go:238
| runtime.goexit
| /usr/local/go/src/runtime/asm_amd64.s:1581
Wraps: (6) t.Fatal() was called
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 8cd28089c6c7333615ba3201e841839001d2f0e1:
| 1142.0s 0 2.0 7.6 62.9 103079.2 103079.2 103079.2 stockLevel
| 1143.0s 0 1.0 7.7 60129.5 60129.5 60129.5 60129.5 delivery
| 1143.0s 0 34.0 77.8 81604.4 103079.2 103079.2 103079.2 newOrder
| 1143.0s 0 5.0 7.6 42949.7 73014.4 73014.4 73014.4 orderStatus
| 1143.0s 0 25.0 76.4 53687.1 103079.2 103079.2 103079.2 payment
| 1143.0s 0 2.0 7.6 51539.6 103079.2 103079.2 103079.2 stockLevel
| 1144.0s 0 3.0 7.7 27917.3 103079.2 103079.2 103079.2 delivery
| 1144.0s 0 26.0 77.7 57982.1 103079.2 103079.2 103079.2 newOrder
| 1144.0s 0 1.0 7.6 302.0 302.0 302.0 302.0 orderStatus
| 1144.0s 0 29.9 76.4 66572.0 103079.2 103079.2 103079.2 payment
| 1144.0s 0 2.0 7.6 26843.5 40802.2 40802.2 40802.2 stockLevel
Wraps: (8) COMMAND_PROBLEM
Wraps: (9) Node 5. Command with error:
| ``````
| ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json --ramp=5m0s --duration=2h0m0s --prometheus-port=2112 --pprofport=33333 {pgurl:1-4}
| ``````
Wraps: (10) exit status 1
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError
mixed_version_jobs.go:73,versionupgrade.go:208,tpcc.go:414,test_runner.go:780: monitor failure: monitor task failed: t.Fatal() was called
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:208
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:414
| main.(*testRunner).runTest.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:780
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:171
Wraps: (4) monitor task failed
Wraps: (5) attached stack trace
-- stack trace:
| main.init
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:80
| runtime.doInit
| /usr/local/go/src/runtime/proc.go:6498
| runtime.main
| /usr/local/go/src/runtime/proc.go:238
| runtime.goexit
| /usr/local/go/src/runtime/asm_amd64.s:1581
Wraps: (6) t.Fatal() was called
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
It could mean that it hit the max retry count and gave up. There was a bug that made that error print the wrong message, so let me try upgrading to https://github.com/cockroachdb/cockroach-go/tree/v2.2.6 for the fix.
Ah, interesting. I had seen the max retries, but I remembered hitting that in the past and getting a clear error. Seems like a good thing to try: since the correct error code is logged, I have a hard time reasoning about what else it might be.
roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ c4c5ca2fdd5a641433a85a28d4dfd3bd4443015d:
| _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
| 1037.0s 0 0.0 6.3 0.0 0.0 0.0 0.0 delivery
| 1037.0s 0 19.0 66.0 103079.2 103079.2 103079.2 103079.2 newOrder
| 1037.0s 0 2.0 6.5 103079.2 103079.2 103079.2 103079.2 orderStatus
| 1037.0s 0 26.0 64.1 103079.2 103079.2 103079.2 103079.2 payment
| 1037.0s 0 4.0 6.5 40802.2 103079.2 103079.2 103079.2 stockLevel
| 1038.0s 0 1.0 6.3 103079.2 103079.2 103079.2 103079.2 delivery
| 1038.0s 0 25.0 66.0 103079.2 103079.2 103079.2 103079.2 newOrder
| 1038.0s 0 2.0 6.5 103079.2 103079.2 103079.2 103079.2 orderStatus
| 1038.0s 0 24.0 64.1 103079.2 103079.2 103079.2 103079.2 payment
| 1038.0s 0 2.0 6.5 103079.2 103079.2 103079.2 103079.2 stockLevel
Wraps: (8) COMMAND_PROBLEM
Wraps: (9) Node 5. Command with error:
| ``````
| ./cockroach workload run tpcc --warehouses=909 --histograms=perf/stats.json --ramp=5m0s --duration=2h0m0s --prometheus-port=2112 --pprofport=33333 {pgurl:1-4}
| ``````
Wraps: (10) exit status 1
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.withPrefix (7) *cluster.WithCommandDetails (8) errors.Cmd (9) *hintdetail.withDetail (10) *exec.ExitError
mixed_version_jobs.go:73,versionupgrade.go:208,tpcc.go:414,test_runner.go:780: monitor failure: monitor task failed: t.Fatal() was called
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:208
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:414
| main.(*testRunner).runTest.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:780
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:171
Wraps: (4) monitor task failed
Wraps: (5) attached stack trace
-- stack trace:
| main.init
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:80
| runtime.doInit
| /usr/local/go/src/runtime/proc.go:6498
| runtime.main
| /usr/local/go/src/runtime/proc.go:238
| runtime.goexit
| /usr/local/go/src/runtime/asm_amd64.s:1581
Wraps: (6) t.Fatal() was called
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
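The per-second rows in these reports follow the header `_elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)`, followed by the operation name. A minimal parser makes the latencies easier to compare. It assumes exactly that nine-field layout, and `row` and `parseRow` are hypothetical names. For example, newOrder's p50 of 103079.2 ms in the report above is about 103 s.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// row mirrors one per-second line of the tpcc workload output.
type row struct {
	elapsed             string
	errors              int
	instOps, cumOps     float64
	p50, p95, p99, pMax float64 // latencies in milliseconds
	op                  string
}

// parseRow splits one "| ..." line into a row; it expects exactly nine fields.
func parseRow(line string) (row, error) {
	f := strings.Fields(strings.TrimPrefix(strings.TrimSpace(line), "|"))
	if len(f) != 9 {
		return row{}, fmt.Errorf("expected 9 fields, got %d", len(f))
	}
	var r row
	r.elapsed, r.op = f[0], f[8]
	var err error
	if r.errors, err = strconv.Atoi(f[1]); err != nil {
		return row{}, err
	}
	// The six numeric columns between the error count and the op name.
	nums := []*float64{&r.instOps, &r.cumOps, &r.p50, &r.p95, &r.p99, &r.pMax}
	for i, p := range nums {
		if *p, err = strconv.ParseFloat(f[i+2], 64); err != nil {
			return row{}, err
		}
	}
	return r, nil
}

func main() {
	r, err := parseRow("| 1037.0s 0 19.0 66.0 103079.2 103079.2 103079.2 103079.2 newOrder")
	if err != nil {
		panic(err)
	}
	fmt.Printf("%s p50=%.1fs errors=%d\n", r.op, r.p50/1000, r.errors)
}
```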
roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ e5d1c374c31dc0e80a596c570da8dc45d73f80b8:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpcc/mixed-headroom/n5cpu16/run_1
monitor.go:127,versionupgrade.go:695,versionupgrade.go:208,tpcc.go:414,test_runner.go:780: monitor failure: monitor command failure: unexpected node event: 2: dead (exit status 137)
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
| main.(*monitorImpl).Wait
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:123
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.importLargeBankStep.func1
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:695
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:208
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:414
| [...repeated from below...]
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func3
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:202
| runtime.goexit
| /usr/local/go/src/runtime/asm_amd64.s:1581
Wraps: (4) monitor command failure
Wraps: (5) unexpected node event: 2: dead (exit status 137)
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *errors.errorString
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
I think in the latest run, node 2 died. I don't know why.
16:46:06 test_impl.go:323: test failure: monitor.go:127,versionupgrade.go:695,versionupgrade.go:208,tpcc.go:414,test_runner.go:780: monitor failure: monitor command failure: unexpected node event: 2: dead (exit status 137)
All I see on node 2 is:
cockroach exited with code 137: Wed Jan 26 16:46:06 UTC 2022
is that an OOM?
Yep:
[ 2299.275647] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/cockroach.service,task=cockroach,pid=14134,uid=1000
[ 2299.275767] Out of memory: Killed process 14134 (cockroach) total-vm:17495164kB, anon-rss:10913220kB, file-rss:41884kB, shmem-rss:0kB, UID:1000 pgtables:32044kB oom_score_adj:0
[ 2299.844048] oom_reaper: reaped process 14134 (cockroach), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
https://share.polarsignals.com/2946090/
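The `Killed process` line in the kernel log reports memory in kB (kibibytes). A minimal parser for its `name:NNNkB` fields puts cockroach's anonymous RSS at the moment of the kill at about 10.41 GiB, against a total-vm of about 16.68 GiB. The parser assumes the exact layout shown above, and `parseOOMLine` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// kbField matches "name:12345kB" pairs as they appear in the kernel's
// "Out of memory: Killed process" line.
var kbField = regexp.MustCompile(`([a-z-]+):(\d+)kB`)

// parseOOMLine returns every name:NNNkB field on the line, in KiB.
func parseOOMLine(line string) map[string]int64 {
	out := make(map[string]int64)
	for _, m := range kbField.FindAllStringSubmatch(line, -1) {
		if v, err := strconv.ParseInt(m[2], 10, 64); err == nil {
			out[m[1]] = v
		}
	}
	return out
}

func main() {
	line := "[ 2299.275767] Out of memory: Killed process 14134 (cockroach) total-vm:17495164kB, anon-rss:10913220kB, file-rss:41884kB, shmem-rss:0kB, UID:1000 pgtables:32044kB oom_score_adj:0"
	f := parseOOMLine(line)
	// 1 GiB = 1<<20 KiB.
	fmt.Printf("anon-rss %.2f GiB, total-vm %.2f GiB\n",
		float64(f["anon-rss"])/(1<<20), float64(f["total-vm"])/(1<<20))
}
```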
I'm seeing elsewhere (https://github.com/cockroachdb/cockroach/issues/68303#issuecomment-1022959384) that we seem to have gotten really bad at distributing the load during IMPORT. Here, the OOM is during `importLargeBankStep`, so an import too. But there is no connection, because:
What complicates the situation here is that n2 is running the "old" version, v21.2.4, and in fact so is the whole cluster, which has never run anything newer.
So this failure is strictly a property of the 21.2 branch. Going to assign to bulk-IO as such.
cc @cockroachdb/bulk-io
Is there any chance this is related to https://github.com/cockroachdb/cockroach/issues/76230? I don't see an OOM there, but I don't see much of anything there.
roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on release-21.2 @ 31f167ca5bbe404abcb215f80524770ddc8c0163:
| I220514 14:21:42.147031 1 workload/tpcc/tpcc.go:509 [-] 1 check 3.3.2.1 took 257.678751ms
| I220514 14:21:54.673612 1 workload/tpcc/tpcc.go:509 [-] 2 check 3.3.2.2 took 12.526485234s
| I220514 14:21:57.515815 1 workload/tpcc/tpcc.go:509 [-] 3 check 3.3.2.3 took 2.842140408s
| I220514 14:25:35.024080 1 workload/tpcc/tpcc.go:509 [-] 4 check 3.3.2.4 took 3m37.508110259s
| I220514 14:25:42.163398 1 workload/tpcc/tpcc.go:509 [-] 5 check 3.3.2.5 took 7.138712008s
| Error: check failed: 3.3.2.5: pq: inbox communication error: rpc error: code = Canceled desc = context canceled
| Error: COMMAND_PROBLEM: exit status 1
| (1) COMMAND_PROBLEM
| Wraps: (2) Node 5. Command with error:
| | ``````
| | ./cockroach workload check tpcc --warehouses=909 {pgurl:1}
| | ``````
| Wraps: (3) exit status 1
| Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
|
| stdout:
Wraps: (4) exit status 20
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *exec.ExitError
mixed_version_jobs.go:73,versionupgrade.go:207,tpcc.go:444,test_runner.go:777: monitor failure: monitor task failed: t.Fatal() was called
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:116
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*backgroundStepper).wait
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/mixed_version_jobs.go:69
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*versionUpgradeTest).run
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/versionupgrade.go:207
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCC.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:444
| main.(*testRunner).runTest.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:777
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:172
Wraps: (4) monitor task failed
Wraps: (5) attached stack trace
-- stack trace:
| main.init
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:81
| runtime.doInit
| /usr/local/go/src/runtime/proc.go:6498
| runtime.main
| /usr/local/go/src/runtime/proc.go:238
| runtime.goexit
| /usr/local/go/src/runtime/asm_amd64.s:1581
Wraps: (6) t.Fatal() was called
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md)
/cc @cockroachdb/kv-triage
This is a very old issue on a branch that is EOL.
roachtest.tpcc/mixed-headroom/n5cpu16 failed with artifacts on master @ 78419450178335b31f542bd1b14fefdf4ecee0e8:
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
/cc @cockroachdb/kv-triage
Jira issue: CRDB-12308