cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [self-delegated snaps] #72083

Closed cockroach-teamcity closed 2 years ago

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ d91fead28392841a943251842fbd43a0affb2eca:

          | main.(*monitorImpl).WaitE
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:116
          | main.(*monitorImpl).Wait
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:124
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1071
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:905
          | main.(*testRunner).runTest.func2
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:777
          | runtime.goexit
          |     /usr/local/go/src/runtime/asm_amd64.s:1371
        Wraps: (2) monitor failure
        Wraps: (3) unexpected node event: 11: dead (exit status 137)
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

    cluster.go:1300,context.go:91,cluster.go:1288,test_runner.go:866: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-3647465-1635401487-34-n12cpu4-geo --oneshot --ignore-empty-nodes: exit status 1 4: skipped
        1: 13290
        3: 12804
        2: 13472
        8: skipped
        6: 11783
        12: skipped
        7: 11853
        11: dead (exit status 137)
        9: 11501
        5: 12334
        10: 11563
        Error: UNCLASSIFIED_PROBLEM: 11: dead (exit status 137)
        (1) UNCLASSIFIED_PROBLEM
        Wraps: (2) attached stack trace
          -- stack trace:
          | main.glob..func14
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1175
          | main.wrap.func1
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:281
          | github.com/spf13/cobra.(*Command).execute
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856
          | github.com/spf13/cobra.(*Command).ExecuteC
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960
          | github.com/spf13/cobra.(*Command).Execute
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
          | main.main
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:2104
          | runtime.main
          |     /usr/local/go/src/runtime/proc.go:225
          | runtime.goexit
          |     /usr/local/go/src/runtime/asm_amd64.s:1371
        Wraps: (3) 11: dead (exit status 137)
        Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) | See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]

/cc @cockroachdb/kv-triage

This test on roachdash | Improve this report!

Jira issue: CRDB-10940

AlexTalks commented 2 years ago

It is surprising that we are still seeing OOMs on this test despite merging #71132 - potentially related to #71802

tbg commented 2 years ago

https://share.polarsignals.com/73a06c8/

@erikgrinaker this seems to be something we should be looking into more actively. It is "sort of" expected that we're seeing lots of memory held up by sideloaded proposals; after all, this phase of the test mostly crams lots of SSTs into our log and then asks us to send them to two followers, which are possibly also a region hop away. But something seems to have changed, as we didn't use to see this. Also, #71132 hasn't prevented it from happening, and I looked before and couldn't see any obvious other leaks. So my current expectation is that we happen to have a lot of groups catching up followers at once, overwhelming the system.

If that is the case, it would be difficult to even think of a quick fix. We would need to delay either adding new entries to the log or sending entries to followers. The latter happens inside of raft, so the easier choice is the former. Then the question becomes: do we apply it to SSTs only, or to all proposals? SSTs only is easier, since there is already a concept of delaying them, plus they are not that sensitive to it. But first we need to confirm that what I'm describing is really what we're seeing.
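To make the "delay adding new entries to the log" option concrete, here is a minimal Go sketch of a blocking byte budget that an SST proposal would have to pass before being appended. The type `sstBudget`, its methods, and the sizes used are all hypothetical; this is not CockroachDB's existing SST delay mechanism, only an illustration of the idea under the assumption that a proposal's size is known up front.

```go
package main

import (
	"fmt"
	"sync"
)

// sstBudget is a hypothetical node-wide cap on the bytes of SST proposals
// that are in the log but not yet replicated. It is an illustration only.
type sstBudget struct {
	mu       sync.Mutex
	cond     *sync.Cond
	capacity int64 // maximum bytes allowed in flight
	used     int64 // bytes currently admitted
}

func newSSTBudget(capacity int64) *sstBudget {
	b := &sstBudget{capacity: capacity}
	b.cond = sync.NewCond(&b.mu)
	return b
}

// Acquire blocks until size bytes fit into the budget. A request is always
// admitted when nothing is in flight, so an oversized SST cannot deadlock.
func (b *sstBudget) Acquire(size int64) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for b.used > 0 && b.used+size > b.capacity {
		b.cond.Wait()
	}
	b.used += size
}

// Release returns size bytes to the budget and wakes up blocked callers.
func (b *sstBudget) Release(size int64) {
	b.mu.Lock()
	b.used -= size
	b.mu.Unlock()
	b.cond.Broadcast()
}

func main() {
	b := newSSTBudget(32 << 20) // hypothetical 32 MiB budget
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			const sstSize = 16 << 20
			b.Acquire(sstSize) // wait for room before "proposing"
			fmt.Printf("proposal %d admitted\n", id)
			b.Release(sstSize) // pretend the entry has been replicated
		}(i)
	}
	wg.Wait()
}
```

Blocking at proposal time keeps raft itself untouched, which matches the reasoning above that the proposal side is the easier place to intervene.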

erikgrinaker commented 2 years ago

Yeah, this seems bad. We seem to be enforcing per-range size limits that should mostly prevent this, so I agree that this seems likely to be because we're catching up many groups at once.

Would it be worth bisecting this to find out what triggered it?

tbg commented 2 years ago

Hard to say. It sure would be nice to know the commit if there is one; on the other hand, bisecting would likely be extremely painful. I think I did hundreds of runs when working on https://github.com/cockroachdb/cockroach/issues/69414, though, and never saw the OOM there. This was based on ab1fc343c9a1140191f96353995258e609a84d02, so I think that would be our "good" commit (though it has the inconsistency). Now when did I first see this OOM? I think it was in https://github.com/cockroachdb/cockroach/issues/71050. Note that this isn't the exact same OOM (there, the memory was held by the inefficiency fixed in #71132), but I think the underlying problem is still the same.

Hmm, maybe it's fine? Really depends on how clean the repro loop is. I think we should run import/tpcc/warehouses=4000/geo, since tpccbench does lots of stuff unrelated to the import, assuming it even gets past the import. import/tpcc takes roughly an hour, so we should be able to see something. I might take this as the excuse to get https://github.com/cockroachdb/cockroach/pull/70435 back into shape and to see how far we can get.

tobias@td:~/go/src/github.com/cockroachdb/cockroach$ git bisect start
tobias@td:~/go/src/github.com/cockroachdb/cockroach$ git bisect good ab1fc34
tobias@td:~/go/src/github.com/cockroachdb/cockroach$ git bisect bad d1231cff60125b397ccce6c79c9aeea771cdcca4
Bisecting: 311 revisions left to test after this (roughly 8 steps)
warning: unable to rmdir 'pkg/ui/yarn-vendor': Directory not empty
Submodule path 'vendor': checked out 'fcef703fb087367037cfd20f9576875c2cec9092'
[ecffc89299760b8bf5f966030fd524475b4095ca] kv: deflake and unskip TestPushTxnUpgradeExistingTxn
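The "roughly 8 steps" figure in the bisect output is what one would expect from halving 311 candidates each round. A small Go sketch of that arithmetic, assuming a plain floor(log2(n)) estimate (git's own calculation may differ in detail; the function name `bisectSteps` is made up for this example):

```go
package main

import (
	"fmt"
	"math/bits"
)

// bisectSteps estimates how many more commits a binary search has to test
// when `remaining` revisions are still candidates: floor(log2(remaining)).
// This is the usual halving argument, not necessarily git's exact formula.
func bisectSteps(remaining uint) int {
	if remaining == 0 {
		return 0
	}
	return bits.Len(remaining) - 1
}

func main() {
	fmt.Println(bisectSteps(311)) // prints 8
}
```

With each step needing its own batch of stress runs, the step count is what drives the total cost of the bisect.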

edit: test balloon launched,

BRANCH=release-21.2 SHA=$(git rev-parse HEAD) TEST=import/tpcc/warehouses=4000/geo COUNT=1 ~/roachstress-ci.sh

https://teamcity.cockroachdb.com/viewLog.html?buildId=3683316&

tbg commented 2 years ago

Ok, the roachstress-CI thing seems to work. Going to log the bisect here and update as I make progress.

I'm using

BRANCH=release-21.2 SHA=$(git rev-parse HEAD) TEST=import/tpcc/warehouses=4000/geo COUNT=50 ~/roachstress-ci.sh

- d1231cff60125b397ccce6c79c9aeea771cdcca4 (confirming starting bad commit): https://teamcity.cockroachdb.com/viewQueued.html?itemId=3683412 (we expect this to produce the failure)
- ab1fc343c9a1140191f96353995258e609a84d02 (confirming starting good commit): https://teamcity.cockroachdb.com/viewQueued.html?itemId=3683413 (this should not produce the failure)
- ecffc89299760b8bf5f966030fd524475b4095ca (bisect step 1): https://teamcity.cockroachdb.com/viewLog.html?buildId=3683411&

tbg commented 2 years ago

Hmm, so stressing this test (import/tpcc/warehouses=4000/geo) worked great. The problem is that all 50 runs passed on all three commits.

tbg commented 2 years ago

Screw it, going to try stressing tpccbench as is. I don't have it in me to patch each commit to just do the import, etc.; let's see what we get.

tbg commented 2 years ago

first bad commit: BRANCH=release-21.2 SHA=d1231cff60125b397ccce6c79c9aeea771cdcca4 TEST=import/tpcc/warehouses=4000/geo COUNT=50 ~/roachstress-ci.sh

tbg commented 2 years ago

oops that was the old test again. Ok here for reals:

first bad commit BRANCH=release-21.2 SHA=d1231cff60125b397ccce6c79c9aeea771cdcca4 TEST=tpccbench/nodes=9/cpu=4/multi-region COUNT=50 ~/roachstress-ci.sh

tbg commented 2 years ago

They all passed too. We were supposed to see an OOM here.

erikgrinaker commented 2 years ago

Interesting, I suppose there must have been aggravating circumstances in the initial failure -- perhaps a failure mode that caused concurrent AddSSTable requests to pile up.

I had a look at the debug.zip, and noticed that we have several nodes with ~200 outbound snapshots in progress concurrently:

 $ grep 'kvserver.sendSnapshot' */stacks.txt | cut -f 1 -d / | uniq -c
      2 1
    165 4
    188 6
    195 7
    203 8

All of these appear to come via Replica.adminScatter. I'm speculating here, but it seems plausible that if this many ranges were seeing concurrent AddSSTable traffic, then after the snapshots were applied we'd have to catch up ~200 ranges with AddSSTable entries. 3 GB / 200 ranges works out to about 15 MB/range, which is in the right ballpark.
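The back-of-the-envelope estimate can be reproduced per node from the snapshot counts in the grep output. This Go sketch assumes the 3 GB figure quoted above, decimal units, and an even spread across ranges (that last part is a simplification); `perRangeMB` is a name made up for this example.

```go
package main

import "fmt"

// perRangeMB spreads heldGB gigabytes (decimal) evenly over `ranges` ranges
// and returns the result in megabytes per range.
func perRangeMB(heldGB float64, ranges int) float64 {
	return heldGB * 1000 / float64(ranges)
}

func main() {
	// Outbound snapshot counts per node, copied from the grep output above.
	snapshots := map[int]int{1: 2, 4: 165, 6: 188, 7: 195, 8: 203}
	for _, node := range []int{4, 6, 7, 8} {
		fmt.Printf("n%d: %.1f MB/range\n", node, perRangeMB(3, snapshots[node]))
	}
	fmt.Printf("round figure: %.0f MB/range\n", perRangeMB(3, 200))
}
```

For the four busy nodes this lands between roughly 15 and 18 MB per range, consistent with the "about 15 MB/range" estimate.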

tbg commented 2 years ago

Just for the record, if we wanted to limit the size of the messages, we'd have to plumb something down into raft, at this line:

https://github.com/cockroachdb/vendored/blob/master/go.etcd.io/etcd/raft/v3/raft.go#L435

Instead of a fixed maxMsgSize we would need to pass an interface that dynamically limits the budget, i.e. something like

type limiter interface {
  // Request reports whether size more bytes may be sent right now.
  Request(size uint64) bool
}

and if the limiter returns false, we don't send anything else. The main new thing that comes out of this is that maybeSendAppend may end up sending nothing even though there is something that should be sent (in the current impl, it will send at least one entry in that case); not sure if that causes problems for any of the (few) callers. We'd also have to think about starvation: one very busy raft group may starve out another that is "just trying to send a single SST". So the underlying impl would have to "remember" a failed call on the assumption that the call will happen again soon. But we also need to figure out how long to wait before trying again. It's not entirely straightforward to set this all up.
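As a rough illustration of the "remember a failed call" idea, here is a minimal single-threaded Go sketch of a shared byte budget with a FIFO of denied callers. Everything in it is an assumption made for this example: the `budgetLimiter` type, the extra `rangeID` parameter (the interface above only takes a size), and the `Release` method. It is not etcd/raft or CockroachDB code, and it does not answer the open question of when a denied caller should retry.

```go
package main

import "fmt"

// budgetLimiter is a hypothetical shared byte budget for outgoing appends.
// It is not safe for concurrent use; a real version would need locking.
type budgetLimiter struct {
	capacity uint64   // total bytes allowed in flight
	inFlight uint64   // bytes currently granted
	waiting  []uint64 // range IDs whose request was denied, oldest first
}

// Request reports whether rangeID may send size more bytes. A denied caller
// is remembered, and requests from other callers are refused until the oldest
// waiter has been served, so a busy group cannot starve a quiet one.
func (l *budgetLimiter) Request(rangeID, size uint64) bool {
	if len(l.waiting) > 0 && l.waiting[0] != rangeID {
		l.enqueue(rangeID)
		return false
	}
	// Always admit when nothing is in flight, so that an entry larger than
	// the whole budget can still make progress.
	if l.inFlight > 0 && l.inFlight+size > l.capacity {
		l.enqueue(rangeID)
		return false
	}
	if len(l.waiting) > 0 { // the head waiter is being served
		l.waiting = l.waiting[1:]
	}
	l.inFlight += size
	return true
}

// enqueue records rangeID as waiting unless it is already queued.
func (l *budgetLimiter) enqueue(rangeID uint64) {
	for _, id := range l.waiting {
		if id == rangeID {
			return
		}
	}
	l.waiting = append(l.waiting, rangeID)
}

// Release returns size bytes to the budget once a message has been sent.
func (l *budgetLimiter) Release(size uint64) {
	if size > l.inFlight {
		size = l.inFlight
	}
	l.inFlight -= size
}

func main() {
	l := &budgetLimiter{capacity: 10}
	fmt.Println(l.Request(1, 8)) // true: budget is empty
	fmt.Println(l.Request(2, 5)) // false: does not fit, range 2 is remembered
	fmt.Println(l.Request(1, 1)) // false: range 2 is ahead in the queue
	l.Release(8)
	fmt.Println(l.Request(2, 5)) // true: oldest waiter is served first
	fmt.Println(l.Request(1, 1)) // true
}
```

The sketch leans on the same assumption stated above, namely that a denied caller comes back soon. If the head waiter never retries, everyone behind it is stuck, so a real implementation would need some expiry for queued entries.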

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 2de17e7fbe66e14039fc7969a76139625761438f:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    cluster.go:1856,tpcc.go:1125,tpcc.go:1135,search.go:43,search.go:173,tpcc.go:1131,tpcc.go:905,test_runner.go:777: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3705715-1636529870-36-n12cpu4-geo:1-3,5-7,9-11 returned: exit status 7
        (1) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3705715-1636529870-36-n12cpu4-geo:1-3,5-7,9-11 returned
          | stderr:
          |
          | stdout:
          | <... some data truncated by circular buffer; go to artifacts for details ...>
          | nc9(0x1a19940, 0xc00052d980, 0x1, 0x2, 0xc00063fa10, 0xc00063fa38)
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:457 +0xdf
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !main.wrap.func1(0x1a19940, 0xc00052d980, 0x1, 0x2)
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:123 +0x6b
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !github.com/spf13/cobra.(*Command).execute(0x1a19940, 0xc00052d960, 0x2, 0x2, 0x1a19940, 0xc00052d960)
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856 +0x2c2
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !github.com/spf13/cobra.(*Command).ExecuteC(0x1a196c0, 0x0, 0x0, 0xc0000ea700)
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960 +0x375
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !github.com/spf13/cobra.(*Command).Execute(...)
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !main.main()
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1170 +0x26a5
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !****************************************************************************
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !This node experienced a fatal error (printed above), and as a result the
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !process is terminating.
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !Fatal errors can occur due to faulty hardware (disks, memory, clocks) or a
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !problem in CockroachDB. With your help, the support team at Cockroach Labs
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !will try to determine the root cause, recommend next steps, and we can
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !improve CockroachDB based on your report.
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !Please submit a crash report by following the instructions here:
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !    https://github.com/cockroachdb/cockroach/issues/new/choose
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !If you would rather not post publicly, please contact us directly at:
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !    support@cockroachlabs.com
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211110 11:34:23.189100 1 roachprod/install/cluster_synced.go:1777  [-] 2 !The Cockroach Labs team appreciates your feedback.
        Wraps: (2) exit status 7
        Error types: (1) *cluster.WithCommandDetails (2) *exec.ExitError

tbg commented 2 years ago

Last failure is [perm denied #72635]

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 4236daf8ac1494feab9193058517278c73bbdf27:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    cluster.go:1856,tpcc.go:1125,tpcc.go:1135,search.go:43,search.go:173,tpcc.go:1131,tpcc.go:905,test_runner.go:777: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3712075-1636615811-37-n12cpu4-geo:1-3,5-7,9-11 returned: exit status 7
        (1) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3712075-1636615811-37-n12cpu4-geo:1-3,5-7,9-11 returned
          | stderr:
          |
          | stdout:
          | <... some data truncated by circular buffer; go to artifacts for details ...>
          | nc9(0x1a1a940, 0xc00041d360, 0x1, 0x2, 0xc00063fa10, 0xc00063fa38)
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:457 +0xdf
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !main.wrap.func1(0x1a1a940, 0xc00041d360, 0x1, 0x2)
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:123 +0x6b
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !github.com/spf13/cobra.(*Command).execute(0x1a1a940, 0xc00041d340, 0x2, 0x2, 0x1a1a940, 0xc00041d340)
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856 +0x2c2
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !github.com/spf13/cobra.(*Command).ExecuteC(0x1a1a6c0, 0x0, 0x0, 0xc00056c700)
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960 +0x375
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !github.com/spf13/cobra.(*Command).Execute(...)
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !main.main()
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1170 +0x26a5
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !****************************************************************************
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !This node experienced a fatal error (printed above), and as a result the
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !process is terminating.
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !Fatal errors can occur due to faulty hardware (disks, memory, clocks) or a
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !problem in CockroachDB. With your help, the support team at Cockroach Labs
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !will try to determine the root cause, recommend next steps, and we can
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !improve CockroachDB based on your report.
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !Please submit a crash report by following the instructions here:
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !    https://github.com/cockroachdb/cockroach/issues/new/choose
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !If you would rather not post publicly, please contact us directly at:
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !    support@cockroachlabs.com
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211111 11:35:20.567956 1 roachprod/install/cluster_synced.go:1777  [-] 2 !The Cockroach Labs team appreciates your feedback.
        Wraps: (2) exit status 7
        Error types: (1) *cluster.WithCommandDetails (2) *exec.ExitError

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ d16a755cfa43e10a85e4c9aa9400b5a147b65e69:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    cluster.go:1856,tpcc.go:1125,tpcc.go:1135,search.go:43,search.go:173,tpcc.go:1131,tpcc.go:905,test_runner.go:777: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3721766-1636735213-36-n12cpu4-geo:1-3,5-7,9-11 returned: exit status 7
        (1) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3721766-1636735213-36-n12cpu4-geo:1-3,5-7,9-11 returned
          | stderr:
          |
          | stdout:
          | <... some data truncated by circular buffer; go to artifacts for details ...>
          | nc9(0x1acda60, 0xc000500840, 0x1, 0x2, 0xc0006bfa10, 0xc0006bfa38)
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:457 +0xdf
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !main.wrap.func1(0x1acda60, 0xc000500840, 0x1, 0x2)
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:123 +0x6b
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !github.com/spf13/cobra.(*Command).execute(0x1acda60, 0xc000500820, 0x2, 0x2, 0x1acda60, 0xc000500820)
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856 +0x2c2
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !github.com/spf13/cobra.(*Command).ExecuteC(0x1acd7e0, 0x0, 0x0, 0xc00017c700)
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960 +0x375
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !github.com/spf13/cobra.(*Command).Execute(...)
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !main.main()
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1170 +0x26a5
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !****************************************************************************
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !This node experienced a fatal error (printed above), and as a result the
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !process is terminating.
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !Fatal errors can occur due to faulty hardware (disks, memory, clocks) or a
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !problem in CockroachDB. With your help, the support team at Cockroach Labs
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !will try to determine the root cause, recommend next steps, and we can
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !improve CockroachDB based on your report.
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !Please submit a crash report by following the instructions here:
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !    https://github.com/cockroachdb/cockroach/issues/new/choose
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !If you would rather not post publicly, please contact us directly at:
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !    support@cockroachlabs.com
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211112 20:53:41.063207 1 roachprod/install/cluster_synced.go:1777  [-] 2 !The Cockroach Labs team appreciates your feedback.
        Wraps: (2) exit status 7
        Error types: (1) *cluster.WithCommandDetails (2) *exec.ExitError

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ f82ff534856738d5385073167d048feafb0b4f3e:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    cluster.go:1856,tpcc.go:1125,tpcc.go:1135,search.go:43,search.go:173,tpcc.go:1131,tpcc.go:905,test_runner.go:777: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3724941-1636787793-34-n12cpu4-geo:1-3,5-7,9-11 returned: exit status 7
        (1) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3724941-1636787793-34-n12cpu4-geo:1-3,5-7,9-11 returned
          | stderr:
          |
          | stdout:
          | <... some data truncated by circular buffer; go to artifacts for details ...>
          | nc9(0x1acda60, 0xc00049d920, 0x1, 0x2, 0xc00063fa10, 0xc00063fa38)
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:457 +0xdf
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !main.wrap.func1(0x1acda60, 0xc00049d920, 0x1, 0x2)
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:123 +0x6b
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !github.com/spf13/cobra.(*Command).execute(0x1acda60, 0xc00049d900, 0x2, 0x2, 0x1acda60, 0xc00049d900)
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856 +0x2c2
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !github.com/spf13/cobra.(*Command).ExecuteC(0x1acd7e0, 0x0, 0x0, 0xc0000d8700)
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960 +0x375
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !github.com/spf13/cobra.(*Command).Execute(...)
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !main.main()
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1170 +0x26a5
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !****************************************************************************
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !This node experienced a fatal error (printed above), and as a result the
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !process is terminating.
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !Fatal errors can occur due to faulty hardware (disks, memory, clocks) or a
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !problem in CockroachDB. With your help, the support team at Cockroach Labs
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !will try to determine the root cause, recommend next steps, and we can
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !improve CockroachDB based on your report.
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !Please submit a crash report by following the instructions here:
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !    https://github.com/cockroachdb/cockroach/issues/new/choose
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !If you would rather not post publicly, please contact us directly at:
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !    support@cockroachlabs.com
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211113 11:23:59.305119 1 roachprod/install/cluster_synced.go:1777  [-] 2 !The Cockroach Labs team appreciates your feedback.
        Wraps: (2) exit status 7
        Error types: (1) *cluster.WithCommandDetails (2) *exec.ExitError
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]

This test on roachdash | Improve this report!

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 13669f9c9bd92a4c3b0378a558d7735f122c4e72:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    cluster.go:1856,tpcc.go:1125,tpcc.go:1135,search.go:43,search.go:173,tpcc.go:1131,tpcc.go:905,test_runner.go:777: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3727706-1636874060-34-n12cpu4-geo:1-3,5-7,9-11 returned: exit status 7
        (1) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3727706-1636874060-34-n12cpu4-geo:1-3,5-7,9-11 returned
          | stderr:
          |
          | stdout:
          | <... some data truncated by circular buffer; go to artifacts for details ...>
          | nc9(0x1acda60, 0xc0005d0520, 0x1, 0x2, 0xc00035fa10, 0xc00035fa38)
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:457 +0xdf
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !main.wrap.func1(0x1acda60, 0xc0005d0520, 0x1, 0x2)
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:123 +0x6b
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !github.com/spf13/cobra.(*Command).execute(0x1acda60, 0xc0005d0500, 0x2, 0x2, 0x1acda60, 0xc0005d0500)
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856 +0x2c2
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !github.com/spf13/cobra.(*Command).ExecuteC(0x1acd7e0, 0x0, 0x0, 0xc0000e8700)
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960 +0x375
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !github.com/spf13/cobra.(*Command).Execute(...)
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !main.main()
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1170 +0x26a5
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !****************************************************************************
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !This node experienced a fatal error (printed above), and as a result the
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !process is terminating.
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !Fatal errors can occur due to faulty hardware (disks, memory, clocks) or a
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !problem in CockroachDB. With your help, the support team at Cockroach Labs
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !will try to determine the root cause, recommend next steps, and we can
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !improve CockroachDB based on your report.
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !Please submit a crash report by following the instructions here:
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !    https://github.com/cockroachdb/cockroach/issues/new/choose
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !If you would rather not post publicly, please contact us directly at:
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !    support@cockroachlabs.com
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211114 11:24:37.514027 1 roachprod/install/cluster_synced.go:1777  [-] 2 !The Cockroach Labs team appreciates your feedback.
        Wraps: (2) exit status 7
        Error types: (1) *cluster.WithCommandDetails (2) *exec.ExitError
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 3aeb3756887fcb35dcd19c0cee6894a143228727:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    cluster.go:1856,tpcc.go:1125,tpcc.go:1135,search.go:43,search.go:173,tpcc.go:1131,tpcc.go:905,test_runner.go:777: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3730576-1636963510-35-n12cpu4-geo:1-3,5-7,9-11 returned: exit status 7
        (1) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3730576-1636963510-35-n12cpu4-geo:1-3,5-7,9-11 returned
          | stderr:
          |
          | stdout:
          | <... some data truncated by circular buffer; go to artifacts for details ...>
          | nc9(0x1acda60, 0xc000507820, 0x1, 0x2, 0xc000651a10, 0xc000651a38)
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:457 +0xdf
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !main.wrap.func1(0x1acda60, 0xc000507820, 0x1, 0x2)
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:123 +0x6b
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !github.com/spf13/cobra.(*Command).execute(0x1acda60, 0xc000507800, 0x2, 0x2, 0x1acda60, 0xc000507800)
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856 +0x2c2
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !github.com/spf13/cobra.(*Command).ExecuteC(0x1acd7e0, 0x0, 0x0, 0xc00017c700)
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960 +0x375
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !github.com/spf13/cobra.(*Command).Execute(...)
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !main.main()
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1170 +0x26a5
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !****************************************************************************
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !This node experienced a fatal error (printed above), and as a result the
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !process is terminating.
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !Fatal errors can occur due to faulty hardware (disks, memory, clocks) or a
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !problem in CockroachDB. With your help, the support team at Cockroach Labs
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !will try to determine the root cause, recommend next steps, and we can
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !improve CockroachDB based on your report.
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !Please submit a crash report by following the instructions here:
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !    https://github.com/cockroachdb/cockroach/issues/new/choose
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !If you would rather not post publicly, please contact us directly at:
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !    support@cockroachlabs.com
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !
          | F211115 12:23:31.163508 1 roachprod/install/cluster_synced.go:1777  [-] 2 !The Cockroach Labs team appreciates your feedback.
        Wraps: (2) exit status 7
        Error types: (1) *cluster.WithCommandDetails (2) *exec.ExitError
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ e921036c0640d833548363cd8f3fea78ae534bd1:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    cluster.go:1857,tpcc.go:1125,tpcc.go:1135,search.go:43,search.go:173,tpcc.go:1131,tpcc.go:905,test_runner.go:777: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3738783-1637046944-35-n12cpu4-geo:1-3,5-7,9-11 returned: exit status 7
        (1) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false teamcity-3738783-1637046944-35-n12cpu4-geo:1-3,5-7,9-11 returned
          | stderr:
          |
          | stdout:
          | <... some data truncated by circular buffer; go to artifacts for details ...>
          | nc9(0x1acca60, 0xc000423220, 0x1, 0x2, 0xc00063fa10, 0xc00063fa38)
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:461 +0xdf
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !main.wrap.func1(0x1acca60, 0xc000423220, 0x1, 0x2)
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:123 +0x6b
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !github.com/spf13/cobra.(*Command).execute(0x1acca60, 0xc000423200, 0x2, 0x2, 0x1acca60, 0xc000423200)
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856 +0x2c2
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !github.com/spf13/cobra.(*Command).ExecuteC(0x1acc7e0, 0x0, 0x0, 0xc0004f2700)
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960 +0x375
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !github.com/spf13/cobra.(*Command).Execute(...)
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !main.main()
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 ! /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1174 +0x26a5
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !****************************************************************************
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !This node experienced a fatal error (printed above), and as a result the
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !process is terminating.
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !Fatal errors can occur due to faulty hardware (disks, memory, clocks) or a
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !problem in CockroachDB. With your help, the support team at Cockroach Labs
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !will try to determine the root cause, recommend next steps, and we can
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !improve CockroachDB based on your report.
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !Please submit a crash report by following the instructions here:
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !    https://github.com/cockroachdb/cockroach/issues/new/choose
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !If you would rather not post publicly, please contact us directly at:
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !    support@cockroachlabs.com
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !
          | F211116 10:58:02.521271 1 roachprod/install/cluster_synced.go:1779  [-] 2 !The Cockroach Labs team appreciates your feedback.
        Wraps: (2) exit status 7
        Error types: (1) *cluster.WithCommandDetails (2) *exec.ExitError
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 40f11fead0a0453969634f8ddb0502c1f78b2806:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ b450fea83a7db1e06403b2563c13f38c9284b932:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 3b30a0e12f9a14b08ee8ad55b50299aca50c67a2:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 2c014c47c1a242f504f6d595bfd79c0edc20b90a:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]

AlexTalks commented 2 years ago

Recent failures seem due to the GCE disk space issue mentioned in #73204, #73205, #73222, and #68965.

nvanbenschoten commented 2 years ago

> So currently I am expecting that we will see that we happen to have a lot of groups catch up followers at once, overwhelming the system.

Drive-by comment, but one thing to note is that raft's two mechanisms to limit memory during its optimistic replication phase are MaxSizePerMsg and MaxInflightMsgs. We set these to 32KB and 128 msgs, respectively. So that should cap a single range at 4MB. However, the MaxSizePerMsg is not strict and can be exceeded for single entries that exceed the limit. We see sideloaded proposals that are around 8MB each, so even on a single range, we could grab 512MB of entries at a time.

And actually, that's per follower. Perhaps that has something to do with this. I can't recall whether this test uses the new multi-region abstractions. If it does, it will have non-voting replicas now, which could be increasing the replication fanout.
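The bound described above can be sketched as plain arithmetic. This is an illustration only, not CockroachDB or raft code: the function names and the follower count are hypothetical, "KB" and "MB" are treated as binary units, and the 32KB, 128-message, and 8MB figures are simply the ones quoted in the comment. Taken at face value, 128 in-flight messages of 8MB each come to about 1GB per follower, so the 512MB quoted would correspond to roughly 64 such entries in flight.

```go
package main

import "fmt"

const (
	kib = 1 << 10
	mib = 1 << 20
)

// effectiveMsgBytes models the non-strict size limit: a message always
// carries at least one entry, so a single entry larger than maxSizePerMsg
// is still sent whole. (Hypothetical helper, for illustration only.)
func effectiveMsgBytes(maxSizePerMsg, entryBytes int64) int64 {
	if entryBytes > maxSizePerMsg {
		return entryBytes
	}
	return maxSizePerMsg
}

// inflightBytes is the approximate volume of entries that can be in flight
// to one follower: maxInflightMsgs messages of msgBytes each.
func inflightBytes(msgBytes, maxInflightMsgs int64) int64 {
	return msgBytes * maxInflightMsgs
}

func main() {
	const (
		maxSizePerMsg   int64 = 32 * kib // value quoted in the comment
		maxInflightMsgs int64 = 128      // value quoted in the comment
		sideloadedEntry int64 = 8 * mib  // approximate size quoted in the comment
		followers       int64 = 2        // assumption: a range with two followers
	)

	nominal := inflightBytes(maxSizePerMsg, maxInflightMsgs)
	large := inflightBytes(effectiveMsgBytes(maxSizePerMsg, sideloadedEntry), maxInflightMsgs)

	fmt.Printf("nominal, per follower:        %d MiB\n", nominal/mib)
	fmt.Printf("large entries, per follower:  %d MiB\n", large/mib)
	fmt.Printf("large entries, %d followers:   %d MiB\n", followers, large*followers/mib)
}
```

Because the limit applies per follower, any increase in replication fanout (for example, added non-voting replicas) scales the last figure linearly.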

nvanbenschoten commented 2 years ago

After looking at the code, I don't think this test is using these abstractions yet. It's tough to trace down through the various layers though, so it would be worth confirming.

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 835ceca5d25d4a62233ddde4f493dbcf68302f1e:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    monitor.go:128,tpcc.go:1069,tpcc.go:905,test_runner.go:779: monitor failure: unexpected node event: 5: dead (exit status 137)
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:116
          | main.(*monitorImpl).Wait
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:124
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1069
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:905
          | main.(*testRunner).runTest.func2
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:779
          | runtime.goexit
          |     /usr/local/go/src/runtime/asm_amd64.s:1581
        Wraps: (2) monitor failure
        Wraps: (3) unexpected node event: 5: dead (exit status 137)
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

    cluster.go:1339,context.go:91,cluster.go:1329,test_runner.go:867: dead node detection: 5: dead (exit status 137)
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ c4c5ca2fdd5a641433a85a28d4dfd3bd4443015d:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    monitor.go:127,tpcc.go:1077,tpcc.go:911,test_runner.go:780: monitor failure: monitor command failure: unexpected node event: 1: dead (exit status 137)
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
          | main.(*monitorImpl).Wait
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:123
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1077
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:911
          | [...repeated from below...]
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func3
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:202
          | runtime.goexit
          |     /usr/local/go/src/runtime/asm_amd64.s:1581
        Wraps: (4) monitor command failure
        Wraps: (5) unexpected node event: 1: dead (exit status 137)
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *errors.errorString
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #73675 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.1]
- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot branch-release-21.2]

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ bbb473c8f304ac20fec51ff0a0d04e128383bcf6:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    monitor.go:127,tpcc.go:1088,tpcc.go:922,test_runner.go:779: monitor failure: monitor command failure: unexpected node event: 6: dead (exit status 137)
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     main/pkg/cmd/roachtest/monitor.go:115
          | main.(*monitorImpl).Wait
          |     main/pkg/cmd/roachtest/monitor.go:123
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1088
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:922
          | [...repeated from below...]
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func3
          |     main/pkg/cmd/roachtest/monitor.go:202
          | runtime.goexit
          |     GOROOT/src/runtime/asm_amd64.s:1581
        Wraps: (4) monitor command failure
        Wraps: (5) unexpected node event: 6: dead (exit status 137)
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *errors.errorString
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #73675 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot S-1 branch-release-21.1]
- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot S-1 branch-release-21.2]

tbg commented 2 years ago

Hello old friend: https://share.polarsignals.com/1e07cc0/

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 91dfbc8e740ba5792834534dd41af6fb85cee721:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    monitor.go:127,tpcc.go:1088,tpcc.go:922,test_runner.go:779: monitor failure: monitor command failure: unexpected node event: 6: dead (exit status 137)
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     main/pkg/cmd/roachtest/monitor.go:115
          | main.(*monitorImpl).Wait
          |     main/pkg/cmd/roachtest/monitor.go:123
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1088
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:922
          | [...repeated from below...]
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func3
          |     main/pkg/cmd/roachtest/monitor.go:202
          | runtime.goexit
          |     GOROOT/src/runtime/asm_amd64.s:1581
        Wraps: (4) monitor command failure
        Wraps: (5) unexpected node event: 6: dead (exit status 137)
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *errors.errorString
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #73675 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot S-1 branch-release-21.1]
- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot S-1 branch-release-21.2]

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 29716850b181718594663889ddb5f479fef7a305:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    cluster.go:1868,tpcc.go:1063,tpcc.go:931,test_runner.go:875: one or more parallel execution failure
        (1) attached stack trace
          -- stack trace:
          | github.com/cockroachdb/cockroach/pkg/roachprod/install.(*SyncedCluster).ParallelE
          |     github.com/cockroachdb/cockroach/pkg/roachprod/install/cluster_synced.go:2042
          | github.com/cockroachdb/cockroach/pkg/roachprod/install.(*SyncedCluster).Parallel
          |     github.com/cockroachdb/cockroach/pkg/roachprod/install/cluster_synced.go:1923
          | github.com/cockroachdb/cockroach/pkg/roachprod/install.(*SyncedCluster).Start
          |     github.com/cockroachdb/cockroach/pkg/roachprod/install/cockroach.go:167
          | github.com/cockroachdb/cockroach/pkg/roachprod.Start
          |     github.com/cockroachdb/cockroach/pkg/roachprod/roachprod.go:660
          | main.(*clusterImpl).StartE
          |     main/pkg/cmd/roachtest/cluster.go:1826
          | main.(*clusterImpl).Start
          |     main/pkg/cmd/roachtest/cluster.go:1867
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1063
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
          | main.(*testRunner).runTest.func2
          |     main/pkg/cmd/roachtest/test_runner.go:875
          | runtime.goexit
          |     GOROOT/src/runtime/asm_amd64.s:1581
        Wraps: (2) one or more parallel execution failure
        Error types: (1) *withstack.withStack (2) *errutil.leafError
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #73675 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot S-1 T-kv branch-release-21.1]
- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [sst raft oom] [C-test-failure O-roachtest O-robot S-1 T-kv branch-release-21.2]

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 771432d1099e516dbc11827c5458886c176e73e3:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:875: monitor failure: monitor command failure: unexpected node event: 11: dead (exit status 134)
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     main/pkg/cmd/roachtest/monitor.go:115
          | main.(*monitorImpl).Wait
          |     main/pkg/cmd/roachtest/monitor.go:123
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
          | [...repeated from below...]
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func3
          |     main/pkg/cmd/roachtest/monitor.go:202
          | runtime.goexit
          |     GOROOT/src/runtime/asm_amd64.s:1581
        Wraps: (4) monitor command failure
        Wraps: (5) unexpected node event: 11: dead (exit status 134)
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *errors.errorString
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #71802 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [sst raft oom] [C-test-failure O-roachtest O-robot S-1 T-kv branch-release-21.2]
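
For readers triaging these reports: the node exit statuses seen so far (137 and 134) both fall in the range shells use for "terminated by a signal". The sketch below decodes them under the common 128+N convention. The helper name `signalFromExit` is made up for illustration and is not part of roachtest or roachprod. Signal 9 is SIGKILL and signal 6 is SIGABRT on Linux. A SIGKILL is consistent with an OOM kill (the linked 21.2 issue is tagged "sst raft oom"), but the exit status alone does not prove it.

```go
package main

import "fmt"

// signalFromExit applies the common shell convention that a process
// terminated by signal N is reported with exit status 128+N.
// It returns the signal number and true when the status is in that range.
// (Hypothetical helper for illustration only.)
func signalFromExit(status int) (int, bool) {
	if status > 128 && status <= 128+64 {
		return status - 128, true
	}
	return 0, false
}

func main() {
	// Statuses taken from the reports in this thread.
	for _, status := range []int{137, 134, 1} {
		if sig, ok := signalFromExit(status); ok {
			fmt.Printf("exit status %d: terminated by signal %d\n", status, sig)
		} else {
			fmt.Printf("exit status %d: ordinary exit code\n", status)
		}
	}
}
```

Checking the kernel log on the affected node (for example for OOM-killer messages) would be the way to confirm what actually sent the signal.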

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 97e72bacbeb9574f09f7475a62ef45c3e228183e:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     main/pkg/cmd/roachtest/monitor.go:115
          | main.(*monitorImpl).Wait
          |     main/pkg/cmd/roachtest/monitor.go:123
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
          | [...repeated from below...]
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     main/pkg/cmd/roachtest/monitor.go:171
          | runtime.goexit
          |     GOROOT/src/runtime/asm_amd64.s:1581
        Wraps: (4) monitor task failed
        Wraps: (5) Non-zero exit code: 1
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ a2e1910f51593bd2ef72e1d7c615e08f95791186:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     main/pkg/cmd/roachtest/monitor.go:115
          | main.(*monitorImpl).Wait
          |     main/pkg/cmd/roachtest/monitor.go:123
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
          | [...repeated from below...]
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     main/pkg/cmd/roachtest/monitor.go:171
          | runtime.goexit
          |     GOROOT/src/runtime/asm_amd64.s:1581
        Wraps: (4) monitor task failed
        Wraps: (5) Non-zero exit code: 1
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ c9e0194b19a03d55c6be92572aad3fbafc256334:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     main/pkg/cmd/roachtest/monitor.go:115
          | main.(*monitorImpl).Wait
          |     main/pkg/cmd/roachtest/monitor.go:123
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
          | [...repeated from below...]
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     main/pkg/cmd/roachtest/monitor.go:171
          | runtime.goexit
          |     GOROOT/src/runtime/asm_amd64.s:1581
        Wraps: (4) monitor task failed
        Wraps: (5) Non-zero exit code: 1
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 7a9eb906ce86e2f75db637e29d46cd6604fca7b4:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     main/pkg/cmd/roachtest/monitor.go:115
          | main.(*monitorImpl).Wait
          |     main/pkg/cmd/roachtest/monitor.go:123
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
          | [...repeated from below...]
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     main/pkg/cmd/roachtest/monitor.go:171
          | runtime.goexit
          |     GOROOT/src/runtime/asm_amd64.s:1581
        Wraps: (4) monitor task failed
        Wraps: (5) Non-zero exit code: 1
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 01572daaf94f80f81f843723a8b58d80fe128990:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     main/pkg/cmd/roachtest/monitor.go:115
          | main.(*monitorImpl).Wait
          |     main/pkg/cmd/roachtest/monitor.go:123
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
          | [...repeated from below...]
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     main/pkg/cmd/roachtest/monitor.go:171
          | runtime.goexit
          |     GOROOT/src/runtime/asm_amd64.s:1581
        Wraps: (4) monitor task failed
        Wraps: (5) Non-zero exit code: 1
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #80856 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot T-bulkio T-kv branch-release-21.1 release-blocker]

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 587302906426907122deae6eae5f68630d57e900:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     main/pkg/cmd/roachtest/monitor.go:115
          | main.(*monitorImpl).Wait
          |     main/pkg/cmd/roachtest/monitor.go:123
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
          | [...repeated from below...]
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     main/pkg/cmd/roachtest/monitor.go:171
          | runtime.goexit
          |     GOROOT/src/runtime/asm_amd64.s:1581
        Wraps: (4) monitor task failed
        Wraps: (5) Non-zero exit code: 1
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #80856 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot T-bulkio T-kv branch-release-21.1]

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 98bdf3241028c9b1bdff429fb455e61870adc9d0:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     main/pkg/cmd/roachtest/monitor.go:115
          | main.(*monitorImpl).Wait
          |     main/pkg/cmd/roachtest/monitor.go:123
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
          | [...repeated from below...]
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     main/pkg/cmd/roachtest/monitor.go:171
          | runtime.goexit
          |     GOROOT/src/runtime/asm_amd64.s:1581
        Wraps: (4) monitor task failed
        Wraps: (5) Non-zero exit code: 1
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #80856 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot T-bulkio T-kv branch-release-21.1]

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ a86e6c9c4b82cb404e4a20fa70092823fd4a9439:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     main/pkg/cmd/roachtest/monitor.go:115
          | main.(*monitorImpl).Wait
          |     main/pkg/cmd/roachtest/monitor.go:123
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
          | [...repeated from below...]
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     main/pkg/cmd/roachtest/monitor.go:171
          | runtime.goexit
          |     GOROOT/src/runtime/asm_amd64.s:1581
        Wraps: (4) monitor task failed
        Wraps: (5) Non-zero exit code: 1
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #80856 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot T-bulkio T-kv branch-release-21.1]

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 60d0f95ba057dab8d4f50d2903c504fe275d061a:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     main/pkg/cmd/roachtest/monitor.go:115
          | main.(*monitorImpl).Wait
          |     main/pkg/cmd/roachtest/monitor.go:123
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
          | [...repeated from below...]
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     main/pkg/cmd/roachtest/monitor.go:171
          | runtime.goexit
          |     GOROOT/src/runtime/asm_amd64.s:1581
        Wraps: (4) monitor task failed
        Wraps: (5) Non-zero exit code: 1
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #80856 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot T-bulkio T-kv branch-release-21.1]

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 1a91d7cb7b93dfef5dcaf872125875cefa3e0190:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     main/pkg/cmd/roachtest/monitor.go:115
          | main.(*monitorImpl).Wait
          |     main/pkg/cmd/roachtest/monitor.go:123
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
          | [...repeated from below...]
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     main/pkg/cmd/roachtest/monitor.go:171
          | runtime.goexit
          |     GOROOT/src/runtime/asm_amd64.s:1581
        Wraps: (4) monitor task failed
        Wraps: (5) Non-zero exit code: 1
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #80856 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot T-bulkio T-kv branch-release-21.1]

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 7f3c06f5f2c26bc84705430a3622f92ec1444e9d:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     main/pkg/cmd/roachtest/monitor.go:115
          | main.(*monitorImpl).Wait
          |     main/pkg/cmd/roachtest/monitor.go:123
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
          | [...repeated from below...]
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     main/pkg/cmd/roachtest/monitor.go:171
          | runtime.goexit
          |     GOROOT/src/runtime/asm_amd64.s:1581
        Wraps: (4) monitor task failed
        Wraps: (5) Non-zero exit code: 1
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #80856 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot T-bulkio T-kv branch-release-21.1]

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ bc1ee7c7c276984fce8ff5ba4fcfcdff335dde50:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     main/pkg/cmd/roachtest/monitor.go:115
          | main.(*monitorImpl).Wait
          |     main/pkg/cmd/roachtest/monitor.go:123
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
          | [...repeated from below...]
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     main/pkg/cmd/roachtest/monitor.go:171
          | runtime.goexit
          |     GOROOT/src/runtime/asm_amd64.s:1581
        Wraps: (4) monitor task failed
        Wraps: (5) Non-zero exit code: 1
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #80856 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot T-bulkio T-kv branch-release-21.1]

tbg commented 2 years ago

Weird failure:

run_075600.757340555_n4_cockroach_workload_run_tpcc: 07:56:00 cluster.go:2012: running ./cockroach workload run tpcc --warehouses=3000 --workers=3000 --max-rate=490 --wait=false --ramp=15m0s --duration=45m0s --scatter --tolerate-errors {pgurl:1-3,5-7,9-11} on nodes: :4
teamcity-5148992-1652332775-35-n12cpu4-geo: ./cockroach workload run tp...
run_075600.757340555_n4_cockroach_workload_run_tpcc: 09:07:57 cluster.go:2027: > Error for Node 4: Non-zero exit code: 1

No nodes crashed, and the workload didn't output anything (see above). This seems to be happening daily, which is annoying.


Another weird thing, probably unrelated, is that debug.zip isn't managing to get any CPU profiles:

[cluster] profiles generated
[cluster] profile for node 1...
[cluster] profile for node 1: last request failed: rpc error: code = Unknown desc = a CPU profile is already in process, try again later
[cluster] profile for node 1: creating error output: debug/nodes/1/cpu.pprof.err.txt...
[cluster] profile for node 1: done
[cluster] profile for node 2...
[cluster] profile for node 2: last request failed: rpc error: code = Unknown desc = a CPU profile is already in process, try again later
[cluster] profile for node 2: creating error output: debug/nodes/2/cpu.pprof.err.txt...
[cluster] profile for node 2: done
[cluster] profile for node 3...
[cluster] profile for node 3: last request failed: rpc error: code = Unknown desc = a CPU profile is already in process, try again later
[cluster] profile for node 3: creating error output: debug/nodes/3/cpu.pprof.err.txt...
[cluster] profile for node 3: done
[cluster] profile for node 4...
[cluster] profile for node 4: last request failed: rpc error: code = Unknown desc = a CPU profile is already in process, try again later
[cluster] profile for node 4: creating error output: debug/nodes/4/cpu.pprof.err.txt...
[cluster] profile for node 4: done
[cluster] profile for node 5...
[cluster] profile for node 5: last request failed: rpc error: code = Unknown desc = a CPU profile is already in process, try again later
[cluster] profile for node 5: creating error output: debug/nodes/5/cpu.pprof.err.txt...
[cluster] profile for node 5: done
[cluster] profile for node 6...
[cluster] profile for node 6: last request failed: rpc error: code = Unknown desc = a CPU profile is already in process, try again later
[cluster] profile for node 6: creating error output: debug/nodes/6/cpu.pprof.err.txt...
[cluster] profile for node 6: done
[cluster] profile for node 7...
[cluster] profile for node 7: last request failed: rpc error: code = Unknown desc = a CPU profile is already in process, try again later
[cluster] profile for node 7: creating error output: debug/nodes/7/cpu.pprof.err.txt...
[cluster] profile for node 7: done
[cluster] profile for node 8...
[cluster] profile for node 8: last request failed: rpc error: code = Unknown desc = a CPU profile is already in process, try again later
[cluster] profile for node 8: creating error output: debug/nodes/8/cpu.pprof.err.txt...
[cluster] profile for node 8: done
[cluster] profile for node 9...
[cluster] profile for node 9: last request failed: rpc error: code = Unknown desc = a CPU profile is already in process, try again later
[cluster] profile for node 9: creating error output: debug/nodes/9/cpu.pprof.err.txt...
[cluster] profile for node 9: done

My first guess was a bug where we'd accidentally request the profile 10x from the same node, but I checked the code and it seems fine. My second guess was that we were invoking debug.zip twice, but that isn't the case either.

I checked an earlier failure, and there we do see the profiles. (We don't need the profiles here, but I have no explanation for the errors above.)
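
For anyone who wants to double-check the "same node requested many times" theory against the log itself rather than the code, here is a minimal sketch that tallies the failed profile requests per node. The sample text is copied from the output above (nodes 1 and 2 only); the function name `failed_profiles_by_node` and the regex are made up for illustration and are not part of roachtest or `debug zip`.

```python
import re
from collections import Counter

# Sample lines copied from the debug zip output above (nodes 1 and 2 only).
SAMPLE = """\
[cluster] profile for node 1...
[cluster] profile for node 1: last request failed: rpc error: code = Unknown desc = a CPU profile is already in process, try again later
[cluster] profile for node 1: done
[cluster] profile for node 2...
[cluster] profile for node 2: last request failed: rpc error: code = Unknown desc = a CPU profile is already in process, try again later
[cluster] profile for node 2: done
"""

# Hypothetical pattern for the "last request failed" lines; it is based only
# on the log format visible in this issue.
FAILED = re.compile(r"^\[cluster\] profile for node (\d+): last request failed: (.*)$")


def failed_profiles_by_node(log_text):
    """Count 'last request failed' profile lines per node ID."""
    counts = Counter()
    for line in log_text.splitlines():
        match = FAILED.match(line.strip())
        if match:
            counts[int(match.group(1))] += 1
    return counts


if __name__ == "__main__":
    # One failure per distinct node means the requests were spread across
    # nodes; many failures on a single node would point at repeated requests.
    print(dict(failed_profiles_by_node(SAMPLE)))
```

Run over the full log, one failure per distinct node would be consistent with what the code check suggested: each node was asked once, and each one reported that a profile was already in progress.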

cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 9a179ea1aa8d6723fb12a988f2212ac8493e5dfc:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     main/pkg/cmd/roachtest/monitor.go:115
          | main.(*monitorImpl).Wait
          |     main/pkg/cmd/roachtest/monitor.go:123
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
          | [...repeated from below...]
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     main/pkg/cmd/roachtest/monitor.go:171
          | runtime.goexit
          |     GOROOT/src/runtime/asm_amd64.s:1581
        Wraps: (4) monitor task failed
        Wraps: (5) Non-zero exit code: 1
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md)

See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #80856 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot T-bulkio T-kv branch-release-21.1]


cockroach-teamcity commented 2 years ago

roachtest.tpccbench/nodes=9/cpu=4/multi-region failed with artifacts on master @ 16c484aa84d3718bfc82557f8e935ab78e6753b6:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /artifacts/tpccbench/nodes=9/cpu=4/multi-region/run_1
    monitor.go:127,tpcc.go:1096,tpcc.go:931,test_runner.go:876: monitor failure: monitor task failed: Non-zero exit code: 1
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     main/pkg/cmd/roachtest/monitor.go:115
          | main.(*monitorImpl).Wait
          |     main/pkg/cmd/roachtest/monitor.go:123
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runTPCCBench
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:1096
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerTPCCBenchSpec.func1
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/tpcc.go:931
          | [...repeated from below...]
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     main/pkg/cmd/roachtest/monitor.go:171
          | runtime.goexit
          |     GOROOT/src/runtime/asm_amd64.s:1581
        Wraps: (4) monitor task failed
        Wraps: (5) Non-zero exit code: 1
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *install.NonZeroExitCode
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md)

See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #80856 roachtest: tpccbench/nodes=9/cpu=4/multi-region failed [C-test-failure O-roachtest O-robot T-bulkio T-kv branch-release-21.1]


tbg commented 2 years ago

CI repro attempt with --debug on https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestStress/5169311?showRootCauses=true