cockroachdb / cockroach


roachtest: kv/splits/nodes=3/quiesce=true failed #68585

Closed: cockroach-teamcity closed this issue 3 years ago

cockroach-teamcity commented 3 years ago

roachtest.kv/splits/nodes=3/quiesce=true failed with artifacts on master @ 62ec88c61edcaa023a579199cc5b43d3ee951cef:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/kv/splits/nodes=3/quiesce=true/run_1
    cluster.go:1339,test_runner.go:881: operation "consistency check" timed out after 1m0s: driver: bad connection
        (1) operation "consistency check" timed out after 1m0s
        Wraps: (2) driver: bad connection
        Error types: (1) *contextutil.TimeoutError (2) *errors.errorString

Reproduce

See:
* [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)
* [CI job to stress roachtests](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestStress)

For the CI stress job, click the ellipsis (...) next to the Run button and fill in:
* Changes / Build branch: master
* Parameters / `env.TESTS`: `^kv/splits/nodes=3/quiesce=true$`
* Parameters / `env.COUNT`: <number of runs>
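
As a rough illustration of why the `env.TESTS` value above is anchored with `^` and `$`, here is a minimal Go sketch. It assumes the filter is evaluated as a Go regular expression against full test names; the `matchTests` helper and the list of names are hypothetical and only serve the example.

```go
package main

import (
	"fmt"
	"regexp"
)

// matchTests returns the names matched by the filter pattern.
// Hypothetical helper, not part of roachtest.
func matchTests(pattern string, names []string) []string {
	re := regexp.MustCompile(pattern)
	var out []string
	for _, n := range names {
		if re.MatchString(n) {
			out = append(out, n)
		}
	}
	return out
}

func main() {
	// Hypothetical test names, for illustration only.
	names := []string{
		"kv/splits/nodes=3/quiesce=true",
		"kv/splits/nodes=3/quiesce=false",
		"kv/splits/nodes=3/quiesce=true/extra",
	}

	// Anchored: selects only the exact test name.
	fmt.Println(matchTests(`^kv/splits/nodes=3/quiesce=true$`, names))

	// Unanchored: also selects any name that merely contains the pattern.
	fmt.Println(matchTests(`kv/splits/nodes=3/quiesce=true`, names))
}
```

With these made-up names, the anchored pattern selects one test, while the unanchored one also picks up the name with the extra suffix.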

/cc @cockroachdb/kv-triage


erikgrinaker commented 3 years ago

Looks like the consistency check failed during test teardown due to a borked client connection. I'm guessing this is just a shutdown race or some such.

teardown: 06:59:10 cluster.go:1319: running (fast) consistency checks on node 1
teardown: 07:02:09 test_impl.go:323: test failure:  cluster.go:1339,test_runner.go:881: operation "consistency check" timed out after 1m0s: driver: bad connection
        (1) operation "consistency check" timed out after 1m0s
        Wraps: (2) driver: bad connection
        Error types: (1) *contextutil.TimeoutError (2) *errors.errorString

There's also a secondary failure when we try to collect disk usage stats from the workload node and gather logs. Both fail because there is no logs directory.

teardown: 07:02:10 test_runner.go:986: failed to fetch disk uage summary: output in run_070209.812381838_n1-4_du: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-3275637-1628489663-14-n4cpu4:1-4 -- du -c /mnt/data1 --exclude lost+found >> logs/diskusage.txt returned: exit status 20
   4: ~ scp -r -C -o StrictHostKeyChecking=no -i /root/.ssh/id_rsa -i /root/.ssh/google_compute_engine ubuntu@34.138.199.151:logs /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/kv/splits/nodes=3/quiesce=true/run_1/logs/4.unredacted
scp: logs: No such file or directory: exit status 1
I210809 07:02:13.306959 1 (gostd) cluster_synced.go:1552  [-] 1  get logs failed
teardown: 07:02:13 cluster.go:1104: failed to fetch logs: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod get teamcity-3275637-1628489663-14-n4cpu4 logs /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/kv/splits/nodes=3/quiesce=true/run_1/logs/unredacted returned: exit status 1

erikgrinaker commented 3 years ago

Possibly caused by #68526, as also seen in #68574.

cockroach-teamcity commented 3 years ago

roachtest.kv/splits/nodes=3/quiesce=true failed with artifacts on master @ 847514dab6354d4cc4ccf7b2857487b32119fb37:

The test failed on branch=master, cloud=gce:
test artifacts and logs in: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/artifacts/kv/splits/nodes=3/quiesce=true/run_1
    cluster.go:1339,test_runner.go:881: operation "consistency check" timed out after 1m0s: driver: bad connection
        (1) operation "consistency check" timed out after 1m0s
        Wraps: (2) driver: bad connection
        Error types: (1) *contextutil.TimeoutError (2) *errors.errorString

Reproduce

See:
* [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)
* [CI job to stress roachtests](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestStress)

For the CI stress job, click the ellipsis (...) next to the Run button and fill in:
* Changes / Build branch: master
* Parameters / `env.TESTS`: `^kv/splits/nodes=3/quiesce=true$`
* Parameters / `env.COUNT`: <number of runs>

/cc @cockroachdb/kv-triage
