cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachtest: jepsen/monotonic/parts-start-kill-2 failed #92566

Closed: cockroach-teamcity closed this issue 1 year ago

cockroach-teamcity commented 1 year ago

roachtest.jepsen/monotonic/parts-start-kill-2 failed with artifacts on release-22.2.0 @ 77667a1b0101cd323090011f50cf910aaa933654:

        (1) attached stack trace
          -- stack trace:
          | main.(*clusterImpl).RunE
          |     main/pkg/cmd/roachtest/cluster.go:2018
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runJepsen.func1
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/jepsen.go:171
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runJepsen.func3
          |     github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/jepsen.go:209
          | runtime.goexit
          |     GOROOT/src/runtime/asm_amd64.s:1594
        Wraps: (2) output in run_125205.508345942_n6_bash
        Wraps: (3) bash -e -c "\
          | cd /mnt/data1/jepsen/cockroachdb && set -eo pipefail && \
          |  ~/lein run test \
          |    --tarball file://${PWD}/cockroach.tgz \
          |    --username ${USER} \
          |    --ssh-private-key ~/.ssh/id_rsa \
          |    --os ubuntu \
          |    --time-limit 300 \
          |    --concurrency 30 \
          |    --recovery-time 25 \
          |    --test-count 1 \
          |    -n 10.142.0.224 -n 10.142.1.73 -n 10.142.1.82 -n 10.142.1.212 -n 10.142.1.211 \
          |    --test monotonic --nemesis parts --nemesis2 start-kill-2 \
          | > invoke.log 2>&1 \
          | " returned
          | stderr:
          |
          | stdout:
        Wraps: (4) SSH_PROBLEM
        Wraps: (5) Node 6. Command with error:
          | ``````
          | bash -e -c "\
          | cd /mnt/data1/jepsen/cockroachdb && set -eo pipefail && \
          |  ~/lein run test \
          |    --tarball file://${PWD}/cockroach.tgz \
          |    --username ${USER} \
          |    --ssh-private-key ~/.ssh/id_rsa \
          |    --os ubuntu \
          |    --time-limit 300 \
          |    --concurrency 30 \
          |    --recovery-time 25 \
          |    --test-count 1 \
          |    -n 10.142.0.224 -n 10.142.1.73 -n 10.142.1.82 -n 10.142.1.212 -n 10.142.1.211 \
          |    --test monotonic --nemesis parts --nemesis2 start-kill-2 \
          | > invoke.log 2>&1 \
          | "
          | ``````
        Wraps: (6) exit status 255
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) errors.SSH (5) *hintdetail.withDetail (6) *exec.ExitError

Parameters: ROACHTEST_cloud=gce, ROACHTEST_cpu=4, ROACHTEST_encrypted=false, ROACHTEST_fs=ext4, ROACHTEST_localSSD=true, ROACHTEST_ssd=0

Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md)

See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

/cc @cockroachdb/test-eng


Jira issue: CRDB-21869

kvoli commented 1 year ago

Infra flake: an SSH issue occurred before the test began.

" returned: SSH_PROBLEM: exit status 255
13:14:05 jepsen.go:256: grabbing artifacts from controller. Tail of controller log:
13:14:06 cluster.go:2062: > tar -chj --ignore-failed-read -C /mnt/data1/jepsen/cockroachdb -f- store/latest invoke.log
13:14:07 jepsen.go:295: downloaded jepsen logs in failure-logs.tbz
13:14:07 test_impl.go:323: test failure:    jepsen.go:300,jepsen.go:361,test_runner.go:930: output in run_131347.126257192_n6_bash: bash -e -c "\: SSH_PROBLEM: exit status 255