It is surprising that we are still seeing OOMs on this test despite merging #71132 - potentially related to #71802
https://share.polarsignals.com/73a06c8/
@erikgrinaker this seems to be something we should be looking into more actively. It is "sort of" expected that we're seeing lots of memory held up by sideloaded proposals; after all, this phase of the test mostly crams lots of SSTs into our log and then asks us to send them to two followers, which are possibly also a region hop away. But something seems to have changed: we didn't use to see this, #71132 hasn't prevented it from happening, and I looked before and couldn't see any other obvious leaks. So currently I expect to find that we happen to have a lot of groups catching up followers at once, overwhelming the system.

If that is the case, it would be difficult to even think of a quick fix. We would need to delay either adding new entries to the log or sending entries to followers. The latter happens inside of raft, so the easier choice is the former. Then the question becomes: do we apply it to SSTs only, or to all proposals? SSTs only is easier, since there is already a concept of delaying them, plus they are not that sensitive to it. But first we need to confirm that what I'm describing is really what we're seeing.
Yeah, this seems bad. We seem to be enforcing per-range size limits that should mostly prevent this, so I agree that this seems likely to be because we're catching up many groups at once.
Would it be worth bisecting this to find out what triggered it?
Hard to say, it sure would be nice to know the commit if there is one. On the other hand, it would likely be extremely painful. I think I used to do hundreds of runs when working on https://github.com/cockroachdb/cockroach/issues/69414, though, and never saw the OOM there. This was based on ab1fc343c9a1140191f96353995258e609a84d02, so I think that would be our "good" commit (though it has the inconsistency). Now when did I first see this OOM? I think it was in https://github.com/cockroachdb/cockroach/issues/71050. Note that this isn't the exact same OOM (there, the memory is held in the inefficiency fixed in #71132), but I think the underlying problem is still the same.
Hmm, maybe it's fine? Really depends on how clean the repro loop is. I think we should run `import/tpcc/warehouses=4000/geo`, as tpccbench does lots of stuff not related to the import, assuming it does get past the import. `import/tpcc` takes roughly an hour, so we should be able to see something. I might take this as the excuse to get https://github.com/cockroachdb/cockroach/pull/70435 back into shape and to see how far we can get.
tobias@td:~/go/src/github.com/cockroachdb/cockroach$ git bisect start
tobias@td:~/go/src/github.com/cockroachdb/cockroach$ git bisect good ab1fc34
tobias@td:~/go/src/github.com/cockroachdb/cockroach$ git bisect bad d1231cff60125b397ccce6c79c9aeea771cdcca4
Bisecting: 311 revisions left to test after this (roughly 8 steps)
warning: unable to rmdir 'pkg/ui/yarn-vendor': Directory not empty
Submodule path 'vendor': checked out 'fcef703fb087367037cfd20f9576875c2cec9092'
[ecffc89299760b8bf5f966030fd524475b4095ca] kv: deflake and unskip TestPushTxnUpgradeExistingTxn
edit: test balloon launched,
BRANCH=release-21.2 SHA=$(git rev-parse HEAD) TEST=import/tpcc/warehouses=4000/geo COUNT=1 ~/roachstress-ci.sh
https://teamcity.cockroachdb.com/viewLog.html?buildId=3683316&
Ok, the roachstress-CI thing seems to work. Going to log the bisect here and update as I make progress.
I'm using
BRANCH=release-21.2 SHA=$(git rev-parse HEAD) TEST=import/tpcc/warehouses=4000/geo COUNT=50 ~/roachstress-ci.sh
- d1231cff60125b397ccce6c79c9aeea771cdcca4 (confirming starting bad commit): https://teamcity.cockroachdb.com/viewQueued.html?itemId=3683412, we expect this to produce the failure
- ab1fc343c9a1140191f96353995258e609a84d02 (confirming starting good commit): https://teamcity.cockroachdb.com/viewQueued.html?itemId=3683413, this should not produce the failure
- ecffc89299760b8bf5f966030fd524475b4095ca (bisect step 1): https://teamcity.cockroachdb.com/viewLog.html?buildId=3683411&
Hmm, so stressing this test (`import/tpcc/warehouses=4000/geo`) worked great; the problem is that all 50 runs passed on all three commits.
Screw it, going to try stressing tpccbench as is. I don't have it in me to patch each commit to just do the import, etc.; let's see what we get.
oops that was the old test again. Ok here for reals:
first bad commit BRANCH=release-21.2 SHA=d1231cff60125b397ccce6c79c9aeea771cdcca4 TEST=tpccbench/nodes=9/cpu=4/multi-region COUNT=50 ~/roachstress-ci.sh
They all passed too. We were supposed to see an OOM here.
Interesting, I suppose there must have been aggravating circumstances in the initial failure -- perhaps a failure mode that caused concurrent `AddSSTable` requests to pile up.
I had a look at the debug.zip, and noticed that we have several nodes with ~200 outbound snapshots in progress concurrently:
$ grep 'kvserver.sendSnapshot' */stacks.txt | cut -f 1 -d / | uniq -c
2 1
165 4
188 6
195 7
203 8
All of these appear to come via `Replica.adminScatter`. I'm speculating here, but it seems plausible that if this many ranges were seeing concurrent `AddSSTable` traffic, then after the snapshots were applied we'd have to catch up ~200 ranges with `AddSSTable` entries. 3 GB / 200 ranges works out to about 15 MB/range, which is in the right ballpark.
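For anyone who wants to redo the back-of-the-envelope estimate with different numbers, here is a minimal sketch. The helper name `perRangeMB` is made up for illustration, and it assumes decimal units (3 GB = 3000 MB) and an even spread of memory across ranges, neither of which the debug.zip confirms.

```go
package main

import "fmt"

// perRangeMB is a hypothetical helper: it spreads totalMB evenly across the
// given number of ranges and returns the per-range share in MB.
func perRangeMB(totalMB float64, ranges int) float64 {
	return totalMB / float64(ranges)
}

func main() {
	// Assumption: "3 GB" is read as 3000 MB and "~200 ranges" as exactly 200.
	fmt.Printf("%.1f MB/range\n", perRangeMB(3000, 200)) // prints "15.0 MB/range"

	// The same estimate using the per-node sendSnapshot counts from the
	// grep output above (nodes 4, 6, 7 and 8).
	for _, n := range []int{165, 188, 195, 203} {
		fmt.Printf("%d ranges: %.1f MB/range\n", n, perRangeMB(3000, n))
	}
}
```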
Just for the record, if we wanted to limit the size of the messages, we'd have to work something down into raft onto this line
https://github.com/cockroachdb/vendored/blob/master/go.etcd.io/etcd/raft/v3/raft.go#L435
Instead of a fixed maxMsgSize we would need to pass an interface that dynamically limits the budget, i.e. something like
    type limiter interface {
        Request(size uint64) bool
    }
and if the limiter returns false, we don't send anything else. The main new thing that comes out of this is that `maybeSendAppend` may end up sending nothing even though there is something that should be sent (in the current impl, it will send at least one entry in that case); not sure if that causes problems for any of the (few) callers. We'd also have to think about starvation. One very busy raft group may starve out another that is "just trying to send a single SST". So the underlying impl would have to "remember" a failed call on the assumption that the call will happen again soon. But we also need to figure out how long to wait before trying again. It's not entirely straightforward to set this all up.
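To make the shape of the idea a bit more concrete, here is a rough sketch of what a shared budget behind that interface could look like. Everything except the `Request(size uint64) bool` signature is an assumption for illustration: `budgetLimiter`, `newBudgetLimiter` and `Release` are hypothetical names, nothing here exists in raft today, and the starvation and retry-timing problems mentioned above are deliberately not solved.

```go
package main

import (
	"fmt"
	"sync"
)

// limiter is the interface proposed above.
type limiter interface {
	Request(size uint64) bool
}

// budgetLimiter is a hypothetical implementation: one byte budget shared by
// all raft groups on a node. It does not remember failed calls, so a busy
// group can still starve a quiet one.
type budgetLimiter struct {
	mu        sync.Mutex
	capacity  uint64 // total budget in bytes
	available uint64 // bytes not currently reserved
}

func newBudgetLimiter(capacity uint64) *budgetLimiter {
	return &budgetLimiter{capacity: capacity, available: capacity}
}

// Request reserves size bytes and reports whether the reservation fit into
// the remaining budget. On false, the caller sends nothing more.
func (l *budgetLimiter) Request(size uint64) bool {
	l.mu.Lock()
	defer l.mu.Unlock()
	if size > l.available {
		return false
	}
	l.available -= size
	return true
}

// Release hands size bytes back once the message is off the node. This method
// is an assumption; the proposed interface has no way to return budget.
func (l *budgetLimiter) Release(size uint64) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.available += size
	if l.available > l.capacity {
		l.available = l.capacity
	}
}

// Compile-time check that budgetLimiter satisfies the proposed interface.
var _ limiter = (*budgetLimiter)(nil)

func main() {
	l := newBudgetLimiter(32 << 20) // 32 MiB budget, chosen arbitrarily
	for i, sz := range []uint64{16 << 20, 16 << 20, 1 << 20} {
		fmt.Printf("request %d (%d bytes): %v\n", i, sz, l.Request(sz))
	}
}
```

With a 32 MiB budget the first two 16 MiB requests succeed and the third is rejected until someone calls `Release`. Note that a single entry larger than the whole budget would never be admitted in this sketch, which is exactly the "sends nothing even though there is something to send" case that would need special handling.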
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 2de17e7fbe66e14039fc7969a76139625761438f:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
cluster.go:1856,tpcc.go:1125,tpcc.go:1135,search.go:43,search.go:173,tpcc.go:1131,tpcc.go:905,test_runner.go:777: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3705715-1636529870-36-n12cpu4-geo:1-3,5-7,9-11 returned: exit status 7
(1) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3705715-1636529870-36-n12cpu4-geo:1-3,5-7,9-11 returned
| stderr:
|
| stdout:
| <... some data truncated by circular buffer; go to artifacts for details ...>
| nc9(0x1a19940, 0xc00052d980, 0x1, 0x2, 0xc00063fa10, 0xc00063fa38)
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:457 +0xdf
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !main.wrap.func1(0x1a19940, 0xc00052d980, 0x1, 0x2)
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:123 +0x6b
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !github.com/spf13/cobra.(*Command).execute(0x1a19940, 0xc00052d960, 0x2, 0x2, 0x1a19940, 0xc00052d960)
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856 +0x2c2
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !github.com/spf13/cobra.(*Command).ExecuteC(0x1a196c0, 0x0, 0x0, 0xc0000ea700)
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960 +0x375
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !github.com/spf13/cobra.(*Command).Execute(...)
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !main.main()
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1170 +0x26a5
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !****************************************************************************
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !This node experienced a fatal error (printed above), and as a result the
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !process is terminating.
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !Fatal errors can occur due to faulty hardware (disks, memory, clocks) or a
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !problem in CockroachDB. With your help, the support team at Cockroach Labs
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !will try to determine the root cause, recommend next steps, and we can
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !improve CockroachDB based on your report.
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !Please submit a crash report by following the instructions here:
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! https://github.com/cockroachdb/cockroach/issues/new/choose
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !If you would rather not post publicly, please contact us directly at:
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! support@cockroachlabs.com
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777 [-] 2 !The Cockroach Labs team appreciates your feedback.
Wraps: (2) exit status 7
Error types: (1) *cluster.WithCommandDetails (2) *exec.ExitError
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) | See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]
/cc @cockroachdb/kv-triage
Last failure is [perm denied #72635]
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 4236daf8ac1494feab9193058517278c73bbdf27:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
cluster.go:1856,tpcc.go:1125,tpcc.go:1135,search.go:43,search.go:173,tpcc.go:1131,tpcc.go:905,test_runner.go:777: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3712075-1636615811-37-n12cpu4-geo:1-3,5-7,9-11 returned: exit status 7
(1) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3712075-1636615811-37-n12cpu4-geo:1-3,5-7,9-11 returned
| stderr:
|
| stdout:
| <... some data truncated by circular buffer; go to artifacts for details ...>
| nc9(0x1a1a940, 0xc00041d360, 0x1, 0x2, 0xc00063fa10, 0xc00063fa38)
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:457 +0xdf
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !main.wrap.func1(0x1a1a940, 0xc00041d360, 0x1, 0x2)
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:123 +0x6b
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !github.com/spf13/cobra.(*Command).execute(0x1a1a940, 0xc00041d340, 0x2, 0x2, 0x1a1a940, 0xc00041d340)
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856 +0x2c2
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !github.com/spf13/cobra.(*Command).ExecuteC(0x1a1a6c0, 0x0, 0x0, 0xc00056c700)
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960 +0x375
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !github.com/spf13/cobra.(*Command).Execute(...)
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !main.main()
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1170 +0x26a5
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !****************************************************************************
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !This node experienced a fatal error (printed above), and as a result the
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !process is terminating.
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !Fatal errors can occur due to faulty hardware (disks, memory, clocks) or a
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !problem in CockroachDB. With your help, the support team at Cockroach Labs
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !will try to determine the root cause, recommend next steps, and we can
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !improve CockroachDB based on your report.
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !Please submit a crash report by following the instructions here:
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! https://github.com/cockroachdb/cockroach/issues/new/choose
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !If you would rather not post publicly, please contact us directly at:
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! support@cockroachlabs.com
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777 [-] 2 !The Cockroach Labs team appreciates your feedback.
Wraps: (2) exit status 7
Error types: (1) *cluster.WithCommandDetails (2) *exec.ExitError
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) | See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]
/cc @cockroachdb/kv-triage
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ d16a755cfa43e10a85e4c9aa9400b5a147b65e69:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
cluster.go:1856,tpcc.go:1125,tpcc.go:1135,search.go:43,search.go:173,tpcc.go:1131,tpcc.go:905,test_runner.go:777: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3721766-1636735213-36-n12cpu4-geo:1-3,5-7,9-11 returned: exit status 7
(1) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3721766-1636735213-36-n12cpu4-geo:1-3,5-7,9-11 returned
| stderr:
|
| stdout:
| <... some data truncated by circular buffer; go to artifacts for details ...>
| nc9(0x1acda60, 0xc000500840, 0x1, 0x2, 0xc0006bfa10, 0xc0006bfa38)
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:457 +0xdf
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !main.wrap.func1(0x1acda60, 0xc000500840, 0x1, 0x2)
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:123 +0x6b
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !github.com/spf13/cobra.(*Command).execute(0x1acda60, 0xc000500820, 0x2, 0x2, 0x1acda60, 0xc000500820)
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856 +0x2c2
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !github.com/spf13/cobra.(*Command).ExecuteC(0x1acd7e0, 0x0, 0x0, 0xc00017c700)
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960 +0x375
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !github.com/spf13/cobra.(*Command).Execute(...)
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !main.main()
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1170 +0x26a5
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !****************************************************************************
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !This node experienced a fatal error (printed above), and as a result the
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !process is terminating.
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !Fatal errors can occur due to faulty hardware (disks, memory, clocks) or a
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !problem in CockroachDB. With your help, the support team at Cockroach Labs
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !will try to determine the root cause, recommend next steps, and we can
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !improve CockroachDB based on your report.
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !Please submit a crash report by following the instructions here:
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! https://github.com/cockroachdb/cockroach/issues/new/choose
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !If you would rather not post publicly, please contact us directly at:
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! support@cockroachlabs.com
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777 [-] 2 !The Cockroach Labs team appreciates your feedback.
Wraps: (2) exit status 7
Error types: (1) *cluster.WithCommandDetails (2) *exec.ExitError
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ f82ff534856738d5385073167d048feafb0b4f3e:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
cluster.go:1856,tpcc.go:1125,tpcc.go:1135,search.go:43,search.go:173,tpcc.go:1131,tpcc.go:905,test_runner.go:777: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3724941-1636787793-34-n12cpu4-geo:1-3,5-7,9-11 returned: exit status 7
(1) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3724941-1636787793-34-n12cpu4-geo:1-3,5-7,9-11 returned
| stderr:
|
| stdout:
| <... some data truncated by circular buffer; go to artifacts for details ...>
| nc9(0x1acda60, 0xc00049d920, 0x1, 0x2, 0xc00063fa10, 0xc00063fa38)
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:457 +0xdf
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !main.wrap.func1(0x1acda60, 0xc00049d920, 0x1, 0x2)
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:123 +0x6b
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !github.com/spf13/cobra.(*Command).execute(0x1acda60, 0xc00049d900, 0x2, 0x2, 0x1acda60, 0xc00049d900)
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856 +0x2c2
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !github.com/spf13/cobra.(*Command).ExecuteC(0x1acd7e0, 0x0, 0x0, 0xc0000d8700)
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960 +0x375
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !github.com/spf13/cobra.(*Command).Execute(...)
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !main.main()
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1170 +0x26a5
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !****************************************************************************
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !This node experienced a fatal error (printed above), and as a result the
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !process is terminating.
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !Fatal errors can occur due to faulty hardware (disks, memory, clocks) or a
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !problem in CockroachDB. With your help, the support team at Cockroach Labs
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !will try to determine the root cause, recommend next steps, and we can
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !improve CockroachDB based on your report.
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !Please submit a crash report by following the instructions here:
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! https://github.com/cockroachdb/cockroach/issues/new/choose
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !If you would rather not post publicly, please contact us directly at:
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! support@cockroachlabs.com
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777 [-] 2 !The Cockroach Labs team appreciates your feedback.
Wraps: (2) exit status 7
Error types: (1) *cluster.WithCommandDetails (2) *exec.ExitError
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 13669f9c9bd92a4c3b0378a558d7735f122c4e72:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
cluster.go:1856,tpcc.go:1125,tpcc.go:1135,search.go:43,search.go:173,tpcc.go:1131,tpcc.go:905,test_runner.go:777: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3727706-1636874060-34-n12cpu4-geo:1-3,5-7,9-11 returned: exit status 7
(1) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3727706-1636874060-34-n12cpu4-geo:1-3,5-7,9-11 returned
| stderr:
|
| stdout:
| <... some data truncated by circular buffer; go to artifacts for details ...>
| nc9(0x1acda60, 0xc0005d0520, 0x1, 0x2, 0xc00035fa10, 0xc00035fa38)
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:457 +0xdf
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !main.wrap.func1(0x1acda60, 0xc0005d0520, 0x1, 0x2)
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:123 +0x6b
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !github.com/spf13/cobra.(*Command).execute(0x1acda60, 0xc0005d0500, 0x2, 0x2, 0x1acda60, 0xc0005d0500)
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856 +0x2c2
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !github.com/spf13/cobra.(*Command).ExecuteC(0x1acd7e0, 0x0, 0x0, 0xc0000e8700)
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960 +0x375
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !github.com/spf13/cobra.(*Command).Execute(...)
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !main.main()
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1170 +0x26a5
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !****************************************************************************
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !This node experienced a fatal error (printed above), and as a result the
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !process is terminating.
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !Fatal errors can occur due to faulty hardware (disks, memory, clocks) or a
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !problem in CockroachDB. With your help, the support team at Cockroach Labs
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !will try to determine the root cause, recommend next steps, and we can
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !improve CockroachDB based on your report.
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !Please submit a crash report by following the instructions here:
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! https://github.com/cockroachdb/cockroach/issues/new/choose
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !If you would rather not post publicly, please contact us directly at:
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! support@cockroachlabs.com
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777 [-] 2 !The Cockroach Labs team appreciates your feedback.
Wraps: (2) exit status 7
Error types: (1) *cluster.WithCommandDetails (2) *exec.ExitError
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 3aeb3756887fcb35dcd19c0cee6894a143228727:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
cluster.go:1856,tpcc.go:1125,tpcc.go:1135,search.go:43,search.go:173,tpcc.go:1131,tpcc.go:905,test_runner.go:777: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3730576-1636963510-35-n12cpu4-geo:1-3,5-7,9-11 returned: exit status 7
(1) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3730576-1636963510-35-n12cpu4-geo:1-3,5-7,9-11 returned
| stderr:
|
| stdout:
| <... some data truncated by circular buffer; go to artifacts for details ...>
| nc9(0x1acda60, 0xc000507820, 0x1, 0x2, 0xc000651a10, 0xc000651a38)
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:457 +0xdf
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !main.wrap.func1(0x1acda60, 0xc000507820, 0x1, 0x2)
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:123 +0x6b
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !github.com/spf13/cobra.(*Command).execute(0x1acda60, 0xc000507800, 0x2, 0x2, 0x1acda60, 0xc000507800)
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856 +0x2c2
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !github.com/spf13/cobra.(*Command).ExecuteC(0x1acd7e0, 0x0, 0x0, 0xc00017c700)
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960 +0x375
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !github.com/spf13/cobra.(*Command).Execute(...)
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !main.main()
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1170 +0x26a5
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !****************************************************************************
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !This node experienced a fatal error (printed above), and as a result the
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !process is terminating.
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !Fatal errors can occur due to faulty hardware (disks, memory, clocks) or a
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !problem in CockroachDB. With your help, the support team at Cockroach Labs
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !will try to determine the root cause, recommend next steps, and we can
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !improve CockroachDB based on your report.
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !Please submit a crash report by following the instructions here:
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! https://github.com/cockroachdb/cockroach/issues/new/choose
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !If you would rather not post publicly, please contact us directly at:
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 ! support@cockroachlabs.com
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !
| F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777 [-] 2 !The Cockroach Labs team appreciates your feedback.
Wraps: (2) exit status 7
Error types: (1) *cluster.WithCommandDetails (2) *exec.ExitError
- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ e921036c0640d833548363cd8f3fea78ae534bd1:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
cluster.go:1857,tpcc.go:1125,tpcc.go:1135,search.go:43,search.go:173,tpcc.go:1131,tpcc.go:905,test_runner.go:777: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3738783-1637046944-35-n12cpu4-geo:1-3,5-7,9-11 returned: exit status 7
(1) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3738783-1637046944-35-n12cpu4-geo:1-3,5-7,9-11 returned
| stderr:
|
| stdout:
| <... some data truncated by circular buffer; go to artifacts for details ...>
| nc9(0x1acca60, 0xc000423220, 0x1, 0x2, 0xc00063fa10, 0xc00063fa38)
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:461 +0xdf
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !main.wrap.func1(0x1acca60, 0xc000423220, 0x1, 0x2)
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:123 +0x6b
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !github.com/spf13/cobra.(*Command).execute(0x1acca60, 0xc000423200, 0x2, 0x2, 0x1acca60, 0xc000423200)
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856 +0x2c2
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !github.com/spf13/cobra.(*Command).ExecuteC(0x1acc7e0, 0x0, 0x0, 0xc0004f2700)
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960 +0x375
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !github.com/spf13/cobra.(*Command).Execute(...)
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !main.main()
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1174 +0x26a5
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !****************************************************************************
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !This node experienced a fatal error (printed above), and as a result the
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !process is terminating.
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !Fatal errors can occur due to faulty hardware (disks, memory, clocks) or a
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !problem in CockroachDB. With your help, the support team at Cockroach Labs
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !will try to determine the root cause, recommend next steps, and we can
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !improve CockroachDB based on your report.
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !Please submit a crash report by following the instructions here:
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 ! https://github.com/cockroachdb/cockroach/issues/new/choose
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !If you would rather not post publicly, please contact us directly at:
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 ! support@cockroachlabs.com
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !
| F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779 [-] 2 !The Cockroach Labs team appreciates your feedback.
Wraps: (2) exit status 7
Error types: (1) *cluster.WithCommandDetails (2) *exec.ExitError
- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 40f11fead0a0453969634f8ddb0502c1f78b2806:
The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ b450fea83a7db1e06403b2563c13f38c9284b932:
The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 3b30a0e12f9a14b08ee8ad55b50299aca50c67a2:
The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 2c014c47c1a242f504f6d595bfd79c0edc20b90a:
The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]
Recent failures seem to be due to the GCE disk space issue mentioned in #73204, #73205, #73222, and #68965.
> So currently I am expecting that we will see that we happen to have a lot of groups catch up followers at once, overwhelming the system.
Drive-by comment, but one thing to note is that raft's two mechanisms to limit memory during its optimistic replication phase are `MaxSizePerMsg` and `MaxInflightMsgs`. We set these to 32KB and 128 msgs, respectively. So that should cap a single range at 4MB. However, `MaxSizePerMsg` is not strict and can be exceeded for single entries that exceed the limit. We see sideloaded proposals that are around 8MB each, so even on a single range, we could grab 512MB of entries at a time.
And actually, that's per follower. Perhaps that has something to do with this. I can't recall whether this test uses the new multi-region abstractions. If it does, it will have non-voting replicas now, which could be increasing the replication fanout.
After looking at the code, I don't think this test is using these abstractions yet. It's tough to trace down through the various layers though, so it would be worth confirming.
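To make the arithmetic in this comment concrete, here is a rough sketch of the worst-case in-flight bound per follower. It assumes each in-flight message carries either up to `MaxSizePerMsg` bytes of small entries or a single oversized entry; `inflightBound` and the follower count of 2 are hypothetical names and values for illustration, not anything taken from the raft or CockroachDB code. Note that the 512MB figure above corresponds to entries of roughly 4MB; with 8MB entries the same product comes out to 1GB.

```go
package main

import "fmt"

const (
	kib int64 = 1 << 10
	mib int64 = 1 << 20
)

// inflightBound is a hypothetical helper (not a raft API). It returns the
// worst-case number of bytes in flight to one follower, assuming each of the
// maxInflightMsgs messages holds either maxSizePerMsg bytes of small entries
// or exactly one entry larger than that soft limit.
func inflightBound(maxSizePerMsg, maxInflightMsgs, largestEntry int64) int64 {
	perMsg := maxSizePerMsg
	if largestEntry > perMsg {
		// The size limit is soft: a single oversized entry is still sent.
		perMsg = largestEntry
	}
	return perMsg * maxInflightMsgs
}

func main() {
	const (
		maxSizePerMsg   = 32 * kib   // value quoted in the comment above
		maxInflightMsgs = int64(128) // value quoted in the comment above
	)
	// Entry sizes: none oversized, ~4 MiB, and ~8 MiB sideloaded proposals.
	for _, entry := range []int64{0, 4 * mib, 8 * mib} {
		perFollower := inflightBound(maxSizePerMsg, maxInflightMsgs, entry)
		// The factor of 2 assumes two followers per range (illustrative only).
		fmt.Printf("largest entry %d MiB: %d MiB per follower, %d MiB for 2 followers\n",
			entry/mib, perFollower/mib, 2*perFollower/mib)
	}
}
```

Under these assumptions the bound is 4MiB per follower for small entries, 512MiB for 4MiB entries, and 1GiB for 8MiB entries, and it scales linearly with the number of followers being caught up and with the number of ranges doing so at once.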
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 835ceca5d25d4a62233ddde4f493dbcf68302f1e:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
monitor.go:128,tpcc.go:1069,tpcc.go:905,test_runner.go:779: monitor failure: unexpected node event: 5: dead (exit status 137)
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:116
| main.(*monitorImpl).Wait
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:124
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1069
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:905
| main.(*testRunner).runTest.func2
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:779
| runtime.goexit
| /usr/local/go/src/runtime/asm_amd64.s:1581
Wraps: (2) monitor failure
Wraps: (3) unexpected node event: 5: dead (exit status 137)
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString
cluster.go:1339,context.go:91,cluster.go:1329,test_runner.go:867: dead node detection: 5: dead (exit status 137)
- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ c4c5ca2fdd5a641433a85a28d4dfd3bd4443015d:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
monitor.go:127,tpcc.go:1077,tpcc.go:911,test_runner.go:780: monitor failure: monitor command failure: unexpected node event: 1: dead (exit status 137)
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
| main.(*monitorImpl).Wait
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:123
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1077
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:911
| [...repeated from below...]
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func3
| /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:202
| runtime.goexit
| /usr/local/go/src/runtime/asm_amd64.s:1581
Wraps: (4) monitor command failure
Wraps: (5) unexpected node event: 1: dead (exit status 137)
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *errors.errorString
- #73675 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.1] - #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ bbb473c8f304ac20fec51ff0a0d04e128383bcf6:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
monitor.go:127,tpcc.go:1088,tpcc.go:922,test_runner.go:779: monitor failure: monitor command failure: unexpected node event: 6: dead (exit status 137)
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| main/pkg/cmd/roachtest/monitor.go:115
| main.(*monitorImpl).Wait
| main/pkg/cmd/roachtest/monitor.go:123
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1088
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:922
| [...repeated from below...]
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func3
| main/pkg/cmd/roachtest/monitor.go:202
| runtime.goexit
| GOROOT/src/runtime/asm_amd64.s:1581
Wraps: (4) monitor command failure
Wraps: (5) unexpected node event: 6: dead (exit status 137)
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *errors.errorString
- #73675 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot S-1 branch-release-21.1] - #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot S-1 branch-release-21.2]
Hello old friend: https://share.polarsignals.com/1e07cc0/
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 91dfbc8e740ba5792834534dd41af6fb85cee721:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
monitor.go:127,tpcc.go:1088,tpcc.go:922,test_runner.go:779: monitor failure: monitor command failure: unexpected node event: 6: dead (exit status 137)
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| main/pkg/cmd/roachtest/monitor.go:115
| main.(*monitorImpl).Wait
| main/pkg/cmd/roachtest/monitor.go:123
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1088
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:922
| [...repeated from below...]
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func3
| main/pkg/cmd/roachtest/monitor.go:202
| runtime.goexit
| GOROOT/src/runtime/asm_amd64.s:1581
Wraps: (4) monitor command failure
Wraps: (5) unexpected node event: 6: dead (exit status 137)
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *errors.errorString
- #73675 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot S-1 branch-release-21.1] - #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot S-1 branch-release-21.2]
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 29716850b181718594663889ddb5f479fef7a305:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
cluster.go:1868,tpcc.go:1063,tpcc.go:931,test_runner.go:875: one or more parallel execution failure
(1) attached stack trace
-- stack trace:
| github.com/cockroachdb/cockroach/pkg/roachprod/install.(*SyncedCluster).ParallelE
| github.com/cockroachdb/cockroach/pkg/roachprod/install/cluster_synced.go:2042
| github.com/cockroachdb/cockroach/pkg/roachprod/install.(*SyncedCluster).Parallel
| github.com/cockroachdb/cockroach/pkg/roachprod/install/cluster_synced.go:1923
| github.com/cockroachdb/cockroach/pkg/roachprod/install.(*SyncedCluster).Start
| github.com/cockroachdb/cockroach/pkg/roachprod/install/cockroach.go:167
| github.com/cockroachdb/cockroach/pkg/roachprod.Start
| github.com/cockroachdb/cockroach/pkg/roachprod/roachprod.go:660
| main.(*clusterImpl).StartE
| main/pkg/cmd/roachtest/cluster.go:1826
| main.(*clusterImpl).Start
| main/pkg/cmd/roachtest/cluster.go:1867
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1063
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
| main.(*testRunner).runTest.func2
| main/pkg/cmd/roachtest/test_runner.go:875
| runtime.goexit
| GOROOT/src/runtime/asm_amd64.s:1581
Wraps: (2) one or more parallel execution failure
Error types: (1) *withstack.withStack (2) *errutil.leafError
- #73675 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot S-1 T-kv branch-release-21.1] - #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [sst raft oom] [C-test-failure O-roachtest O-robot S-1 T-kv branch-release-21.2]
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 771432d1099e516dbc11827c5458886c176e73e3:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:875: monitor failure: monitor command failure: unexpected node event: 11: dead (exit status 134)
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| main/pkg/cmd/roachtest/monitor.go:115
| main.(*monitorImpl).Wait
| main/pkg/cmd/roachtest/monitor.go:123
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
| [...repeated from below...]
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func3
| main/pkg/cmd/roachtest/monitor.go:202
| runtime.goexit
| GOROOT/src/runtime/asm_amd64.s:1581
Wraps: (4) monitor command failure
Wraps: (5) unexpected node event: 11: dead (exit status 134)
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *errors.errorString
- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [sst raft oom] [C-test-failure O-roachtest O-robot S-1 T-kv branch-release-21.2]
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 97e72bacbeb9574f09f7475a62ef45c3e228183e:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| main/pkg/cmd/roachtest/monitor.go:115
| main.(*monitorImpl).Wait
| main/pkg/cmd/roachtest/monitor.go:123
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
| [...repeated from below...]
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| main/pkg/cmd/roachtest/monitor.go:171
| runtime.goexit
| GOROOT/src/runtime/asm_amd64.s:1581
Wraps: (4) monitor task failed
Wraps: (5) Non-zero exit code: 1
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ a2e1910f51593bd2ef72e1d7c615e08f95791186:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| main/pkg/cmd/roachtest/monitor.go:115
| main.(*monitorImpl).Wait
| main/pkg/cmd/roachtest/monitor.go:123
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
| [...repeated from below...]
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| main/pkg/cmd/roachtest/monitor.go:171
| runtime.goexit
| GOROOT/src/runtime/asm_amd64.s:1581
Wraps: (4) monitor task failed
Wraps: (5) Non-zero exit code: 1
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ c9e0194b19a03d55c6be92572aad3fbafc256334:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| main/pkg/cmd/roachtest/monitor.go:115
| main.(*monitorImpl).Wait
| main/pkg/cmd/roachtest/monitor.go:123
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
| [...repeated from below...]
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| main/pkg/cmd/roachtest/monitor.go:171
| runtime.goexit
| GOROOT/src/runtime/asm_amd64.s:1581
Wraps: (4) monitor task failed
Wraps: (5) Non-zero exit code: 1
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 7a9eb906ce86e2f75db637e29d46cd6604fca7b4:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| main/pkg/cmd/roachtest/monitor.go:115
| main.(*monitorImpl).Wait
| main/pkg/cmd/roachtest/monitor.go:123
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
| [...repeated from below...]
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| main/pkg/cmd/roachtest/monitor.go:171
| runtime.goexit
| GOROOT/src/runtime/asm_amd64.s:1581
Wraps: (4) monitor task failed
Wraps: (5) Non-zero exit code: 1
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 01572daaf94f80f81f843723a8b58d80fe128990:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| main/pkg/cmd/roachtest/monitor.go:115
| main.(*monitorImpl).Wait
| main/pkg/cmd/roachtest/monitor.go:123
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
| [...repeated from below...]
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| main/pkg/cmd/roachtest/monitor.go:171
| runtime.goexit
| GOROOT/src/runtime/asm_amd64.s:1581
Wraps: (4) monitor task failed
Wraps: (5) Non-zero exit code: 1
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
- #80856 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot T-bulkio T-kv branch-release-21.1 release-blocker]
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 587302906426907122deae6eae5f68630d57e900:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| main/pkg/cmd/roachtest/monitor.go:115
| main.(*monitorImpl).Wait
| main/pkg/cmd/roachtest/monitor.go:123
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
| [...repeated from below...]
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| main/pkg/cmd/roachtest/monitor.go:171
| runtime.goexit
| GOROOT/src/runtime/asm_amd64.s:1581
Wraps: (4) monitor task failed
Wraps: (5) Non-zero exit code: 1
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
- #80856 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot T-bulkio T-kv branch-release-21.1]
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 98bdf3241028c9b1bdff429fb455e61870adc9d0:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| main/pkg/cmd/roachtest/monitor.go:115
| main.(*monitorImpl).Wait
| main/pkg/cmd/roachtest/monitor.go:123
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
| [...repeated from below...]
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| main/pkg/cmd/roachtest/monitor.go:171
| runtime.goexit
| GOROOT/src/runtime/asm_amd64.s:1581
Wraps: (4) monitor task failed
Wraps: (5) Non-zero exit code: 1
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
- #80856 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot T-bulkio T-kv branch-release-21.1]
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ a86e6c9c4b82cb404e4a20fa70092823fd4a9439:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| main/pkg/cmd/roachtest/monitor.go:115
| main.(*monitorImpl).Wait
| main/pkg/cmd/roachtest/monitor.go:123
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
| [...repeated from below...]
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| main/pkg/cmd/roachtest/monitor.go:171
| runtime.goexit
| GOROOT/src/runtime/asm_amd64.s:1581
Wraps: (4) monitor task failed
Wraps: (5) Non-zero exit code: 1
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
- #80856 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot T-bulkio T-kv branch-release-21.1]
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 60d0f95ba057dab8d4f50d2903c504fe275d061a:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| main/pkg/cmd/roachtest/monitor.go:115
| main.(*monitorImpl).Wait
| main/pkg/cmd/roachtest/monitor.go:123
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
| [...repeated from below...]
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| main/pkg/cmd/roachtest/monitor.go:171
| runtime.goexit
| GOROOT/src/runtime/asm_amd64.s:1581
Wraps: (4) monitor task failed
Wraps: (5) Non-zero exit code: 1
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
- #80856 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot T-bulkio T-kv branch-release-21.1]
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 1a91d7cb7b93dfef5dcaf872125875cefa3e0190:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| main/pkg/cmd/roachtest/monitor.go:115
| main.(*monitorImpl).Wait
| main/pkg/cmd/roachtest/monitor.go:123
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
| [...repeated from below...]
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| main/pkg/cmd/roachtest/monitor.go:171
| runtime.goexit
| GOROOT/src/runtime/asm_amd64.s:1581
Wraps: (4) monitor task failed
Wraps: (5) Non-zero exit code: 1
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
- #80856 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot T-bulkio T-kv branch-release-21.1]
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 7f3c06f5f2c26bc84705430a3622f92ec1444e9d:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| main/pkg/cmd/roachtest/monitor.go:115
| main.(*monitorImpl).Wait
| main/pkg/cmd/roachtest/monitor.go:123
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
| [...repeated from below...]
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| main/pkg/cmd/roachtest/monitor.go:171
| runtime.goexit
| GOROOT/src/runtime/asm_amd64.s:1581
Wraps: (4) monitor task failed
Wraps: (5) Non-zero exit code: 1
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
- #80856 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot T-bulkio T-kv branch-release-21.1]
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ bc1ee7c7c276984fce8ff5ba4fcfcdff335dde50:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| main/pkg/cmd/roachtest/monitor.go:115
| main.(*monitorImpl).Wait
| main/pkg/cmd/roachtest/monitor.go:123
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
| [...repeated from below...]
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| main/pkg/cmd/roachtest/monitor.go:171
| runtime.goexit
| GOROOT/src/runtime/asm_amd64.s:1581
Wraps: (4) monitor task failed
Wraps: (5) Non-zero exit code: 1
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
- #80856 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot T-bulkio T-kv branch-release-21.1]
Weird failure:
run_075600.757340555_n4_cockroach_workload_run_tpcc: 07:56:00 cluster.go:2012: running ./cockroach workload run tpcc --warehouses=3000 --workers=3000 --max-rate=490 --wait=false --ramp=15m0s --duration=45m0s --scatter --tolerate-errors {pgurl:1-3,5-7,9-11} on nodes: :4
teamcity-5148992-1652332775-35-n12cpu4-geo: ./cockroach workload run tp...
run_075600.757340555_n4_cockroach_workload_run_tpcc: 09:07:57 cluster.go:2027: > Error for Node 4: Non-zero exit code: 1
No nodes crashed. workload didn't output anything (see above). This seems to be happening daily, which is annoying.
Another weird thing, probably unrelated: debug.zip isn't managing to collect any CPU profiles:
[cluster] profiles generated
[cluster] profile for node 1...
[cluster] profile for node 1: last request failed: rpc error: code = Unknown desc = a CPU profile is already in process, try again later
[cluster] profile for node 1: creating error output: debug/nodes/1/cpu.pprof.err.txt...
[cluster] profile for node 1: done
[cluster] profile for node 2...
[cluster] profile for node 2: last request failed: rpc error: code = Unknown desc = a CPU profile is already in process, try again later
[cluster] profile for node 2: creating error output: debug/nodes/2/cpu.pprof.err.txt...
[cluster] profile for node 2: done
[cluster] profile for node 3...
[cluster] profile for node 3: last request failed: rpc error: code = Unknown desc = a CPU profile is already in process, try again later
[cluster] profile for node 3: creating error output: debug/nodes/3/cpu.pprof.err.txt...
[cluster] profile for node 3: done
[cluster] profile for node 4...
[cluster] profile for node 4: last request failed: rpc error: code = Unknown desc = a CPU profile is already in process, try again later
[cluster] profile for node 4: creating error output: debug/nodes/4/cpu.pprof.err.txt...
[cluster] profile for node 4: done
[cluster] profile for node 5...
[cluster] profile for node 5: last request failed: rpc error: code = Unknown desc = a CPU profile is already in process, try again later
[cluster] profile for node 5: creating error output: debug/nodes/5/cpu.pprof.err.txt...
[cluster] profile for node 5: done
[cluster] profile for node 6...
[cluster] profile for node 6: last request failed: rpc error: code = Unknown desc = a CPU profile is already in process, try again later
[cluster] profile for node 6: creating error output: debug/nodes/6/cpu.pprof.err.txt...
[cluster] profile for node 6: done
[cluster] profile for node 7...
[cluster] profile for node 7: last request failed: rpc error: code = Unknown desc = a CPU profile is already in process, try again later
[cluster] profile for node 7: creating error output: debug/nodes/7/cpu.pprof.err.txt...
[cluster] profile for node 7: done
[cluster] profile for node 8...
[cluster] profile for node 8: last request failed: rpc error: code = Unknown desc = a CPU profile is already in process, try again later
[cluster] profile for node 8: creating error output: debug/nodes/8/cpu.pprof.err.txt...
[cluster] profile for node 8: done
[cluster] profile for node 9...
[cluster] profile for node 9: last request failed: rpc error: code = Unknown desc = a CPU profile is already in process, try again later
[cluster] profile for node 9: creating error output: debug/nodes/9/cpu.pprof.err.txt...
[cluster] profile for node 9: done
I first suspected a bug where we'd accidentally request the profile 10x from the same node, but I checked the code and it seems fine. I then thought that perhaps we were invoking debug.zip twice, but that isn't the case either.
I checked an earlier failure and there we do see the profiles. (We don't need the profiles here but I have no explanation for the above).
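For context on the error text itself: the Go runtime allows only one CPU profile per process at a time, so a profile that is still running on a node, whoever started it, makes a second start attempt fail. Whether that is what happened here is an assumption and has not been checked against these artifacts. A minimal sketch of the runtime behavior using only `runtime/pprof` (`startTwice` is a hypothetical helper, not the CockroachDB code path):

```go
package main

import (
	"fmt"
	"io"
	"runtime/pprof"
)

// startTwice starts a CPU profile and, while it is still running, tries to
// start a second one. It returns the error from each attempt.
// Hypothetical helper for illustration only.
func startTwice() (first, second error) {
	first = pprof.StartCPUProfile(io.Discard)
	if first != nil {
		return first, nil
	}
	// Stop the first profile when we are done so the process is left clean.
	defer pprof.StopCPUProfile()

	// The runtime rejects a second profile while the first is active.
	second = pprof.StartCPUProfile(io.Discard)
	return nil, second
}

func main() {
	first, second := startTwice()
	fmt.Println("first start: ", first)
	fmt.Println("second start:", second)
}
```

If something along these lines is the cause, the thing to look for would be whatever else held a CPU profile open on every node at the time debug.zip ran.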
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 9a179ea1aa8d6723fb12a988f2212ac8493e5dfc:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| main/pkg/cmd/roachtest/monitor.go:115
| main.(*monitorImpl).Wait
| main/pkg/cmd/roachtest/monitor.go:123
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
| [...repeated from below...]
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| main/pkg/cmd/roachtest/monitor.go:171
| runtime.goexit
| GOROOT/src/runtime/asm_amd64.s:1581
Wraps: (4) monitor task failed
Wraps: (5) Non-zero exit code: 1
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
- #80856 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot T-bulkio T-kv branch-release-21.1]
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 16c484aa84d3718bfc82557f8e935ab78e6753b6:
The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
(1) attached stack trace
-- stack trace:
| main.(*monitorImpl).WaitE
| main/pkg/cmd/roachtest/monitor.go:115
| main.(*monitorImpl).Wait
| main/pkg/cmd/roachtest/monitor.go:123
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
| [...repeated from below...]
Wraps: (2) monitor failure
Wraps: (3) attached stack trace
-- stack trace:
| main.(*monitorImpl).wait.func2
| main/pkg/cmd/roachtest/monitor.go:171
| runtime.goexit
| GOROOT/src/runtime/asm_amd64.s:1581
Wraps: (4) monitor task failed
Wraps: (5) Non-zero exit code: 1
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
- #80856 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot T-bulkio T-kv branch-release-21.1]
CI repro attempt with --debug on https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestStress/5169311?showRootCauses=true
roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ d91fead28392841a943251842fbd43a0affb2eca:
Help
See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) | See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)
Same failure on other branches
- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]
/cc @cockroachdb/kv-triage
Jira issue: CRDB-10940