cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachtest: tpccbench/nodes=9/cpu=4/chaos/partition failed #37923

Closed: cockroach-teamcity closed this issue 5 years ago

cockroach-teamcity commented 5 years ago

SHA: https://github.com/cockroachdb/cockroach/commits/61715f0f96f519d599eec6541bbee7394d63209a

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=9/cpu=4/chaos/partition PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1312952&tab=buildLog

The test failed on branch=release-19.1, cloud=gce:
    cluster.go:1482,tpcc.go:715,search.go:47,search.go:177,tpcc.go:711,cluster.go:1854,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod stop teamcity-1312952-tpccbench-nodes-9-cpu-4-chaos-partition:1-9 returned:
        stderr:

        stdout:
        etained in /root/.roachprod/debug/ssh_34.74.202.225_2019-05-29T17:55:55Z
        github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*remoteSession).errWithDebug
            /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/session.go:82
        github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*remoteSession).CombinedOutput
            /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/session.go:91
        github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).Stop.func1
            /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:196
        github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).Parallel.func1.1
            /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1467
        runtime.goexit
            /usr/local/go/src/runtime/asm_amd64.s:1333: Connection to 34.74.202.225 closed by remote host.

        I190529 17:57:55.939062 1 cluster_synced.go:1549  command failed
        : exit status 1
    cluster.go:1875,tpcc.go:828,tpcc.go:554,test.go:1251: Goexit() was called
    cluster.go:1038,context.go:89,cluster.go:1027,asm_amd64.s:522,panic.go:397,test.go:788,test.go:774,cluster.go:1875,tpcc.go:828,tpcc.go:554,test.go:1251: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1312952-tpccbench-nodes-9-cpu-4-chaos-partition --oneshot --ignore-empty-nodes: exit status 1 10: skipped
        6: dead
        5: dead
        7: dead
        9: dead
        3: dead
        2: dead
        1: dead
        4: dead
        8: 16301
        Error:  6: dead, 5: dead, 7: dead, 9: dead, 3: dead, 2: dead, 1: dead, 4: dead
cockroach-teamcity commented 5 years ago

SHA: https://github.com/cockroachdb/cockroach/commits/83e62d69214aaa0f7b976f764b97b0e21a41cde3

Parameters:

To repro, try:

# Don't forget to check out a clean suitable branch and experiment with the
# stress invocation until the desired results present themselves. For example,
# using stress instead of stressrace and passing the '-p' stressflag which
# controls concurrency.
./scripts/gceworker.sh start && ./scripts/gceworker.sh mosh
cd ~/go/src/github.com/cockroachdb/cockroach && \
stdbuf -oL -eL \
make stressrace TESTS=tpccbench/nodes=9/cpu=4/chaos/partition PKG=roachtest TESTTIMEOUT=5m STRESSFLAGS='-maxtime 20m -timeout 10m' 2>&1 | tee /tmp/stress.log

Failed test: https://teamcity.cockroachdb.com/viewLog.html?buildId=1318703&tab=buildLog

The test failed on branch=release-19.1, cloud=gce:
    cluster.go:1442,tpcc.go:716,search.go:47,search.go:177,tpcc.go:711,cluster.go:1854,errgroup.go:57: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod start --encrypt=false --racks=3 teamcity-1318703-tpccbench-nodes-9-cpu-4-chaos-partition:1-9 returned:
        stderr:

        stdout:
        thub.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).Parallel.func1.1
            /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1467
        runtime.goexit
            /usr/local/go/src/runtime/asm_amd64.s:1333
        ~ ./cockroach version
        Connection to 35.231.210.155 closed by remote host.

        github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.getCockroachVersion
            /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:95
        github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.Cockroach.Start.func7
            /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:289
        github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).Parallel.func1.1
            /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1467
        runtime.goexit
            /usr/local/go/src/runtime/asm_amd64.s:1333: 
        I190601 18:33:43.632572 1 cluster_synced.go:1549  command failed
        : exit status 1
    cluster.go:1875,tpcc.go:828,tpcc.go:554,test.go:1251: Goexit() was called
    cluster.go:1038,context.go:89,cluster.go:1027,asm_amd64.s:522,panic.go:397,test.go:788,test.go:774,cluster.go:1875,tpcc.go:828,tpcc.go:554,test.go:1251: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-1318703-tpccbench-nodes-9-cpu-4-chaos-partition --oneshot --ignore-empty-nodes: exit status 1 10: skipped
        8: 13198
        6: 13468
        7: 13266
        1: 14167
        9: dead
        3: 13140
        4: 14510
        2: 13083
        5: 14318
        Error:  9: dead
tbg commented 5 years ago

Likely SSH flukes in roachprod stop and roachprod start, respectively.

tbg commented 5 years ago

cc #36929