cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

roachtest: clearrange/zfs/checks=true failed #68303

Closed: cockroach-teamcity closed this issue 2 years ago.

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ 701b177d8f4b81d8654dfb4090a2cd3cf82e63a7:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)

/cc @cockroachdb/storage

This test on roachdash | Improve this report!

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ eef03a46f2e43ff70485dadf7d9ad445db05cab4:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)
See: [CI job to stress roachtests](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestStress)

For the CI stress job, click the ellipsis (...) next to the Run button and fill in:
* Changes / Build branch: master
* Parameters / `env.TESTS`: `^clearrange/zfs/checks=true$`
* Parameters / `env.COUNT`: <number of runs>
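The `env.TESTS` value is anchored with `^` and `$`. Assuming the stress job treats it as a Go regular expression matched against test names (the exact selection semantics are not spelled out in this thread), the anchors restrict the run to exactly `clearrange/zfs/checks=true`. An unanchored pattern such as `clearrange` would also select `clearrange/checks=true`, a variant that shows up later in this thread. The `loose` pattern below is hypothetical and exists only for contrast.

```go
package main

import (
	"fmt"
	"regexp"
)

// anchored mirrors the env.TESTS value from the stress-job instructions.
// loose is a hypothetical unanchored pattern, shown only for contrast.
var (
	anchored = regexp.MustCompile(`^clearrange/zfs/checks=true$`)
	loose    = regexp.MustCompile(`clearrange`)
)

func main() {
	// Test names that appear in this issue thread.
	names := []string{
		"clearrange/zfs/checks=true",
		"clearrange/checks=true",
	}
	for _, n := range names {
		// MatchString reports whether the pattern matches anywhere in n;
		// the ^ and $ anchors force the match to cover the whole name.
		fmt.Printf("%-28s anchored=%t loose=%t\n", n, anchored.MatchString(n), loose.MatchString(n))
	}
}
```

With the anchors, only the zfs variant matches; without them, both variants do.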

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ 6b8d59327add74cf1342345fb3eaffc3a3e765d2:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)
See: [CI job to stress roachtests](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestStress)

For the CI stress job, click the ellipsis (...) next to the Run button and fill in:
* Changes / Build branch: master
* Parameters / `env.TESTS`: `^clearrange/zfs/checks=true$`
* Parameters / `env.COUNT`: <number of runs>

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ 50ef2fc205baa65c5a740c2d614fe1de279367e9:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)
See: [CI job to stress roachtests](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestStress)

For the CI stress job, click the ellipsis (...) next to the Run button and fill in:
* Changes / Build branch: master
* Parameters / `env.TESTS`: `^clearrange/zfs/checks=true$`
* Parameters / `env.COUNT`: <number of runs>

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ cab185ff71f0924953d987fe6ffd14efdd32a3a0:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)
See: [CI job to stress roachtests](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestStress)

For the CI stress job, click the ellipsis (...) next to the Run button and fill in:
* Changes / Build branch: master
* Parameters / `env.TESTS`: `^clearrange/zfs/checks=true$`
* Parameters / `env.COUNT`: <number of runs>

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ 847514dab6354d4cc4ccf7b2857487b32119fb37:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)
See: [CI job to stress roachtests](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestStress)

For the CI stress job, click the ellipsis (...) next to the Run button and fill in:
* Changes / Build branch: master
* Parameters / `env.TESTS`: `^clearrange/zfs/checks=true$`
* Parameters / `env.COUNT`: <number of runs>

/cc @cockroachdb/storage

bananabrick commented 3 years ago

These are failing sporadically during the "import" workload. Looking into it.

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ 90809c048d05f923a67ce9b89597b2779fc73e32:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)
See: [CI job to stress roachtests](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestStress)

For the CI stress job, click the ellipsis (...) next to the Run button and fill in:
* Changes / Build branch: master
* Parameters / `env.TESTS`: `^clearrange/zfs/checks=true$`
* Parameters / `env.COUNT`: <number of runs>

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ 0880e83e30ee5eb9aab7bb2297324e098d028225:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)
See: [CI job to stress roachtests](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestStress)

For the CI stress job, click the ellipsis (...) next to the Run button and fill in:
* Changes / Build branch: master
* Parameters / `env.TESTS`: `^clearrange/zfs/checks=true$`
* Parameters / `env.COUNT`: <number of runs>

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ 7897f24246bef3cb94f9f4bfaed474ecaa9fdee6:

          | (1) attached stack trace
          |   -- stack trace:
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*crdbInstallHelper).startNode
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:412
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.Cockroach.Start.func1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:166
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).ParallelE.func1.1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1709
          |   | runtime.goexit
          |   |     /usr/local/go/src/runtime/asm_amd64.s:1371
          | Wraps: (2) ~ ./cockroach.sh
          |   | Job for cockroach.service failed because the control process exited with error code.
          |   | See "systemctl status cockroach.service" and "journalctl -xe" for details.
          | Wraps: (3) exit status 1
          | Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *exec.ExitError: 
          | 8: ~ ./cockroach.sh: exit status 1
          | (1) attached stack trace
          |   -- stack trace:
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*crdbInstallHelper).startNode
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:412
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.Cockroach.Start.func1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:166
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).ParallelE.func1.1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1709
          |   | runtime.goexit
          |   |     /usr/local/go/src/runtime/asm_amd64.s:1371
          | Wraps: (2) ~ ./cockroach.sh
          |   | Job for cockroach.service failed because the control process exited with error code.
          |   | See "systemctl status cockroach.service" and "journalctl -xe" for details.
          | Wraps: (3) exit status 1
          | Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *exec.ExitError: 
          | 9: ~ ./cockroach.sh: exit status 1
          | (1) attached stack trace
          |   -- stack trace:
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*crdbInstallHelper).startNode
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:412
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.Cockroach.Start.func1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:166
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).ParallelE.func1.1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1709
          |   | runtime.goexit
          |   |     /usr/local/go/src/runtime/asm_amd64.s:1371
          | Wraps: (2) ~ ./cockroach.sh
          |   | Job for cockroach.service failed because the control process exited with error code.
          |   | See "systemctl status cockroach.service" and "journalctl -xe" for details.
          | Wraps: (3) exit status 1
          | Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *exec.ExitError: 
          | I210820 08:22:08.352991 1 (gostd) cluster_synced.go:1677  [-] 1  command failed
        Wraps: (2) exit status 1
        Error types: (1) *cluster.WithCommandDetails (2) *exec.ExitError
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)
See: [CI job to stress roachtests](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestStress)

For the CI stress job, click the ellipsis (...) next to the Run button and fill in:
* Changes / Build branch: master
* Parameters / `env.TESTS`: `^clearrange/zfs/checks=true$`
* Parameters / `env.COUNT`: <number of runs>

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ 11e0a4da82124e70e772a009011ca7a4007bff85:

          | (1) attached stack trace
          |   -- stack trace:
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*crdbInstallHelper).startNode
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:412
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.Cockroach.Start.func1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:166
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).ParallelE.func1.1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1709
          |   | runtime.goexit
          |   |     /usr/local/go/src/runtime/asm_amd64.s:1371
          | Wraps: (2) ~ ./cockroach.sh
          |   | Job for cockroach.service failed because the control process exited with error code.
          |   | See "systemctl status cockroach.service" and "journalctl -xe" for details.
          | Wraps: (3) exit status 1
          | Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *exec.ExitError: 
          | 8: ~ ./cockroach.sh: exit status 1
          | (1) attached stack trace
          |   -- stack trace:
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*crdbInstallHelper).startNode
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:412
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.Cockroach.Start.func1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:166
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).ParallelE.func1.1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1709
          |   | runtime.goexit
          |   |     /usr/local/go/src/runtime/asm_amd64.s:1371
          | Wraps: (2) ~ ./cockroach.sh
          |   | Job for cockroach.service failed because the control process exited with error code.
          |   | See "systemctl status cockroach.service" and "journalctl -xe" for details.
          | Wraps: (3) exit status 1
          | Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *exec.ExitError: 
          | 9: ~ ./cockroach.sh: exit status 1
          | (1) attached stack trace
          |   -- stack trace:
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*crdbInstallHelper).startNode
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:412
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.Cockroach.Start.func1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:166
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).ParallelE.func1.1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1709
          |   | runtime.goexit
          |   |     /usr/local/go/src/runtime/asm_amd64.s:1371
          | Wraps: (2) ~ ./cockroach.sh
          |   | Job for cockroach.service failed because the control process exited with error code.
          |   | See "systemctl status cockroach.service" and "journalctl -xe" for details.
          | Wraps: (3) exit status 1
          | Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *exec.ExitError: 
          | I210821 08:06:06.679612 1 (gostd) cluster_synced.go:1677  [-] 1  command failed
        Wraps: (2) exit status 1
        Error types: (1) *cluster.WithCommandDetails (2) *exec.ExitError
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)
See: [CI job to stress roachtests](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestStress)

For the CI stress job, click the ellipsis (...) next to the Run button and fill in:
* Changes / Build branch: master
* Parameters / `env.TESTS`: `^clearrange/zfs/checks=true$`
* Parameters / `env.COUNT`: <number of runs>

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ d18da6c092bf1522e7a6478fe3973817e318c247:

          | (1) attached stack trace
          |   -- stack trace:
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*crdbInstallHelper).startNode
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:412
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.Cockroach.Start.func1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:166
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).ParallelE.func1.1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1709
          |   | runtime.goexit
          |   |     /usr/local/go/src/runtime/asm_amd64.s:1371
          | Wraps: (2) ~ ./cockroach.sh
          |   | Job for cockroach.service failed because the control process exited with error code.
          |   | See "systemctl status cockroach.service" and "journalctl -xe" for details.
          | Wraps: (3) exit status 1
          | Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *exec.ExitError: 
          | 8: ~ ./cockroach.sh: exit status 1
          | (1) attached stack trace
          |   -- stack trace:
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*crdbInstallHelper).startNode
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:412
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.Cockroach.Start.func1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:166
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).ParallelE.func1.1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1709
          |   | runtime.goexit
          |   |     /usr/local/go/src/runtime/asm_amd64.s:1371
          | Wraps: (2) ~ ./cockroach.sh
          |   | Job for cockroach.service failed because the control process exited with error code.
          |   | See "systemctl status cockroach.service" and "journalctl -xe" for details.
          | Wraps: (3) exit status 1
          | Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *exec.ExitError: 
          | 9: ~ ./cockroach.sh: exit status 1
          | (1) attached stack trace
          |   -- stack trace:
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*crdbInstallHelper).startNode
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:412
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.Cockroach.Start.func1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:166
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).ParallelE.func1.1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1709
          |   | runtime.goexit
          |   |     /usr/local/go/src/runtime/asm_amd64.s:1371
          | Wraps: (2) ~ ./cockroach.sh
          |   | Job for cockroach.service failed because the control process exited with error code.
          |   | See "systemctl status cockroach.service" and "journalctl -xe" for details.
          | Wraps: (3) exit status 1
          | Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *exec.ExitError: 
          | I210822 08:36:00.123979 1 (gostd) cluster_synced.go:1677  [-] 1  command failed
        Wraps: (2) exit status 1
        Error types: (1) *cluster.WithCommandDetails (2) *exec.ExitError
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)
See: [CI job to stress roachtests](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestStress)

For the CI stress job, click the ellipsis (...) next to the Run button and fill in:
* Changes / Build branch: master
* Parameters / `env.TESTS`: `^clearrange/zfs/checks=true$`
* Parameters / `env.COUNT`: <number of runs>

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ 61bd543ba7288c8f0eed6cddded7b219c9d1fcd4:

          | (1) attached stack trace
          |   -- stack trace:
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*crdbInstallHelper).startNode
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:412
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.Cockroach.Start.func1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:166
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).ParallelE.func1.1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1709
          |   | runtime.goexit
          |   |     /usr/local/go/src/runtime/asm_amd64.s:1371
          | Wraps: (2) ~ ./cockroach.sh
          |   | Job for cockroach.service failed because the control process exited with error code.
          |   | See "systemctl status cockroach.service" and "journalctl -xe" for details.
          | Wraps: (3) exit status 1
          | Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *exec.ExitError: 
          | 8: ~ ./cockroach.sh: exit status 1
          | (1) attached stack trace
          |   -- stack trace:
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*crdbInstallHelper).startNode
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:412
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.Cockroach.Start.func1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:166
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).ParallelE.func1.1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1709
          |   | runtime.goexit
          |   |     /usr/local/go/src/runtime/asm_amd64.s:1371
          | Wraps: (2) ~ ./cockroach.sh
          |   | Job for cockroach.service failed because the control process exited with error code.
          |   | See "systemctl status cockroach.service" and "journalctl -xe" for details.
          | Wraps: (3) exit status 1
          | Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *exec.ExitError: 
          | 9: ~ ./cockroach.sh: exit status 1
          | (1) attached stack trace
          |   -- stack trace:
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*crdbInstallHelper).startNode
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:412
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.Cockroach.Start.func1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cockroach.go:166
          |   | github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install.(*SyncedCluster).ParallelE.func1.1
          |   |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/install/cluster_synced.go:1709
          |   | runtime.goexit
          |   |     /usr/local/go/src/runtime/asm_amd64.s:1371
          | Wraps: (2) ~ ./cockroach.sh
          |   | Job for cockroach.service failed because the control process exited with error code.
          |   | See "systemctl status cockroach.service" and "journalctl -xe" for details.
          | Wraps: (3) exit status 1
          | Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *exec.ExitError: 
          | I210823 08:17:14.885081 1 (gostd) cluster_synced.go:1677  [-] 1  command failed
        Wraps: (2) exit status 1
        Error types: (1) *cluster.WithCommandDetails (2) *exec.ExitError
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/tree/master/pkg/cmd/roachtest)
See: [CI job to stress roachtests](https://teamcity.cockroachdb.com/buildConfiguration/Cockroach_Nightlies_RoachtestStress)

For the CI stress job, click the ellipsis (...) next to the Run button and fill in:
* Changes / Build branch: master
* Parameters / `env.TESTS`: `^clearrange/zfs/checks=true$`
* Parameters / `env.COUNT`: <number of runs>

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/checks=true failed with artifacts on master @ 8cae60f603ccc4d83137167b3b31cab09be9d41a:

          |  1358.0s        0         3557.8         5106.1      5.0     32.5     62.9    159.4 write
          |  1359.0s        0         3963.4         5105.3      5.0     24.1     79.7    121.6 write
          |  1360.0s        0         4129.4         5104.6      5.0     24.1     54.5     92.3 write
          | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
          |  1361.0s        0          953.5         5101.5      5.0     26.2     50.3     83.9 write
          |  1362.0s        0            0.0         5097.8      0.0      0.0      0.0      0.0 write
          |  1363.0s        0            0.0         5094.0      0.0      0.0      0.0      0.0 write
          |  1364.0s        0            0.0         5090.3      0.0      0.0      0.0      0.0 write
          |  1365.0s        0            0.0         5086.6      0.0      0.0      0.0      0.0 write
          |  1366.0s        0            0.0         5082.8      0.0      0.0      0.0      0.0 write
          |  1367.0s        0            0.0         5079.1      0.0      0.0      0.0      0.0 write
          |  1368.0s        0            0.0         5075.4      0.0      0.0      0.0      0.0 write
          |  1369.0s        0            0.0         5071.7      0.0      0.0      0.0      0.0 write
          |  1370.0s        0            0.0         5068.0      0.0      0.0      0.0      0.0 write
          | Error: unexpected EOF
          | COMMAND_PROBLEM: exit status 1
        Wraps: (4) exit status 20
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *exec.ExitError

    monitor.go:128,clearrange.go:207,clearrange.go:38,test_runner.go:777: monitor failure: monitor task failed: t.Fatal() was called
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:116
          | main.(*monitorImpl).Wait
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:124
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runClearRange
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/clearrange.go:207
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerClearRange.func1
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/clearrange.go:38
          | main.(*testRunner).runTest.func2
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:777
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:172
        Wraps: (4) monitor task failed
        Wraps: (5) attached stack trace
          -- stack trace:
          | main.init
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:81
          | runtime.doInit
          |     /usr/local/go/src/runtime/proc.go:6309
          | runtime.main
          |     /usr/local/go/src/runtime/proc.go:208
          | runtime.goexit
          |     /usr/local/go/src/runtime/asm_amd64.s:1371
        Wraps: (6) t.Fatal() was called
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md)

Same failure on other branches

- #65092 roachtest: clearrange/checks=true failed [C-test-failure O-roachtest O-robot T-storage branch-release-20.1 release-blocker]

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ 8cae60f603ccc4d83137167b3b31cab09be9d41a:

        Wraps: (2) output in run_090816.531265009_n1_cockroach_workload_fixtures_import_bank
        Wraps: (3) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-3366874-1630046181-41-n10cpu16:1 -- ./cockroach workload fixtures import bank --payload-bytes=10240 --ranges=10 --rows=65104166 --seed=4 --db=bigbank returned
          | stderr:
          | I210827 09:08:17.597020 1 ccl/workloadccl/fixture.go:345  [-] 1  starting import of 1 tables
          | Error: importing fixture: importing table bank: dial tcp 127.0.0.1:26257: connect: connection refused
          | Error: COMMAND_PROBLEM: exit status 1
          | (1) COMMAND_PROBLEM
          | Wraps: (2) Node 1. Command with error:
          |   | ``````
          |   | ./cockroach workload fixtures import bank --payload-bytes=10240 --ranges=10 --rows=65104166 --seed=4 --db=bigbank
          |   | ``````
          | Wraps: (3) exit status 1
          | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
          |
          | stdout:
        Wraps: (4) exit status 20
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *exec.ExitError

    cluster.go:1249,context.go:89,cluster.go:1237,test_runner.go:866: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-3366874-1630046181-41-n10cpu16 --oneshot --ignore-empty-nodes: exit status 1 1: dead (exit status 137)
        10: 13246
        4: 13917
        5: 14053
        2: 13877
        7: 13704
        8: 13512
        9: 13895
        3: 13794
        6: 13959
        Error: UNCLASSIFIED_PROBLEM: 1: dead (exit status 137)
        (1) UNCLASSIFIED_PROBLEM
        Wraps: (2) attached stack trace
          -- stack trace:
          | main.glob..func14
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1173
          | main.wrap.func1
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:281
          | github.com/spf13/cobra.(*Command).execute
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856
          | github.com/spf13/cobra.(*Command).ExecuteC
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960
          | github.com/spf13/cobra.(*Command).Execute
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
          | main.main
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:2107
          | runtime.main
          |     /usr/local/go/src/runtime/proc.go:225
          | runtime.goexit
          |     /usr/local/go/src/runtime/asm_amd64.s:1371
        Wraps: (3) 1: dead (exit status 137)
        Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError
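The `roachprod monitor` output above reports node 1 as `dead (exit status 137)`. A later report in this thread shows node 5 as `dead (exit status 134)`. Under the usual shell convention, a status above 128 means the process was terminated by signal `status - 128`. So 137 is 128 + 9 (SIGKILL), and 134 is 128 + 6 (SIGABRT). A SIGKILL often comes from the kernel OOM killer, but nothing in these logs confirms that. A dead node 1 would also be consistent with the `connection refused` error on 127.0.0.1:26257 during the fixture import. The sketch below assumes the status is passed through unchanged. `describeExitStatus` and `signalNames` are hypothetical helpers, not part of roachprod.

```go
package main

import "fmt"

// signalNames covers only the signals seen in this thread; the numbers
// are the standard Linux values (SIGABRT = 6, SIGKILL = 9).
var signalNames = map[int]string{
	6: "SIGABRT",
	9: "SIGKILL",
}

// describeExitStatus interprets a shell-style exit status. By convention a
// status above 128 means the process was terminated by signal (status-128);
// note that a program can also choose to exit with such a code itself.
func describeExitStatus(status int) string {
	if status <= 128 {
		return fmt.Sprintf("exited with code %d", status)
	}
	sig := status - 128
	if name, ok := signalNames[sig]; ok {
		return fmt.Sprintf("killed by signal %d (%s)", sig, name)
	}
	return fmt.Sprintf("killed by signal %d", sig)
}

func main() {
	// Statuses taken from the monitor output and workload errors in this thread.
	for _, s := range []int{137, 134, 1} {
		fmt.Printf("exit status %d: %s\n", s, describeExitStatus(s))
	}
}
```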
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md)

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ 44ea1fa0eba8fc78544700ef4afded62ab98a021:

        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:172
        Wraps: (4) monitor task failed
        Wraps: (5) attached stack trace
          -- stack trace:
          | main.init
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:81
          | runtime.doInit
          |     /usr/local/go/src/runtime/proc.go:6309
          | runtime.main
          |     /usr/local/go/src/runtime/proc.go:208
          | runtime.goexit
          |     /usr/local/go/src/runtime/asm_amd64.s:1371
        Wraps: (6) t.Fatal() was called
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

    cluster.go:1249,context.go:89,cluster.go:1237,test_runner.go:866: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-3373578-1630131756-44-n10cpu16 --oneshot --ignore-empty-nodes: exit status 1 5: dead (exit status 134)
        6: 1381369
        10: 1433487
        1: 1027658
        8: 942369
        7: 1337138
        2: 1209474
        3: 1737446
        4: 1008459
        9: 1325433
        Error: UNCLASSIFIED_PROBLEM: 5: dead (exit status 134)
        (1) UNCLASSIFIED_PROBLEM
        Wraps: (2) attached stack trace
          -- stack trace:
          | main.glob..func14
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1173
          | main.wrap.func1
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:281
          | github.com/spf13/cobra.(*Command).execute
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856
          | github.com/spf13/cobra.(*Command).ExecuteC
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960
          | github.com/spf13/cobra.(*Command).Execute
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
          | main.main
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:2107
          | runtime.main
          |     /usr/local/go/src/runtime/proc.go:225
          | runtime.goexit
          |     /usr/local/go/src/runtime/asm_amd64.s:1371
        Wraps: (3) 5: dead (exit status 134)
        Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md)

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ 0b57dc40deda1206d9a1c215ffdb219bbf182a39:

        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:172
        Wraps: (4) monitor task failed
        Wraps: (5) attached stack trace
          -- stack trace:
          | main.init
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:81
          | runtime.doInit
          |     /usr/local/go/src/runtime/proc.go:6309
          | runtime.main
          |     /usr/local/go/src/runtime/proc.go:208
          | runtime.goexit
          |     /usr/local/go/src/runtime/asm_amd64.s:1371
        Wraps: (6) t.Fatal() was called
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

    cluster.go:1249,context.go:89,cluster.go:1237,test_runner.go:866: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-3380119-1630304219-45-n10cpu16 --oneshot --ignore-empty-nodes: exit status 1 1: 1458935
        5: 1129317
        3: 1155285
        7: 1713730
        8: 1288440
        6: 1139808
        2: dead (exit status 134)
        10: 1327500
        9: 1181349
        4: 958629
        Error: UNCLASSIFIED_PROBLEM: 2: dead (exit status 134)
        (1) UNCLASSIFIED_PROBLEM
        Wraps: (2) attached stack trace
          -- stack trace:
          | main.glob..func14
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1173
          | main.wrap.func1
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:281
          | github.com/spf13/cobra.(*Command).execute
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856
          | github.com/spf13/cobra.(*Command).ExecuteC
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960
          | github.com/spf13/cobra.(*Command).Execute
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
          | main.main
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:2107
          | runtime.main
          |     /usr/local/go/src/runtime/proc.go:225
          | runtime.goexit
          |     /usr/local/go/src/runtime/asm_amd64.s:1371
        Wraps: (3) 2: dead (exit status 134)
        Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError
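Triaging many reports like this one is easier if the node list printed by `roachprod monitor --oneshot` is split into live nodes (`node: PID`) and dead nodes (`node: dead (exit status N)`). The sketch below assumes the exact format quoted in these logs. That format is inferred from the output, not from roachprod's source, and `parseMonitor` is a hypothetical helper. The regular expression is not anchored at the start because the first entry is printed on the same line as the command.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"strings"
)

// entryRE matches "<node>: <pid>" or "<node>: dead (exit status <code>)" at
// the end of a line. It is not anchored at the start because the first
// entry shares a line with the roachprod command.
var entryRE = regexp.MustCompile(`(\d+): (?:dead \(exit status (\d+)\)|(\d+))$`)

// parseMonitor splits monitor output into live nodes (node -> PID) and dead
// nodes (node -> exit status).
func parseMonitor(out string) (alive, dead map[int]int) {
	alive, dead = map[int]int{}, map[int]int{}
	for _, line := range strings.Split(out, "\n") {
		m := entryRE.FindStringSubmatch(strings.TrimSpace(line))
		if m == nil {
			continue
		}
		node, _ := strconv.Atoi(m[1])
		if m[2] != "" {
			code, _ := strconv.Atoi(m[2])
			dead[node] = code
		} else {
			pid, _ := strconv.Atoi(m[3])
			alive[node] = pid
		}
	}
	return alive, dead
}

func main() {
	// Abbreviated from the report above.
	out := `roachprod monitor ... --oneshot --ignore-empty-nodes: exit status 1 1: 1458935
        5: 1129317
        2: dead (exit status 134)
        4: 958629`
	alive, dead := parseMonitor(out)
	fmt.Println("alive:", alive)
	fmt.Println("dead:", dead)
}
```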
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md)

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/checks=true failed with artifacts on master @ c1ef81f5f435b3cc5bdf8b218532e0779f03a6bf:

          |  1595.0s        0         2227.4         3986.7      6.8     17.8    159.4    209.7 write
          |  1596.0s        0         3167.1         3986.2      6.8     19.9    117.4    570.4 write
          |  1597.0s        0         3393.2         3985.8      6.8     26.2     56.6    142.6 write
          |  1598.0s        0         1478.8         3984.2      6.0     39.8    159.4    201.3 write
          |  1599.0s        0            0.0         3981.8      0.0      0.0      0.0      0.0 write
          |  1600.0s        0            0.0         3979.3      0.0      0.0      0.0      0.0 write
          | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
          |  1601.0s        0            0.0         3976.8      0.0      0.0      0.0      0.0 write
          |  1602.0s        0            0.0         3974.3      0.0      0.0      0.0      0.0 write
          |  1603.0s        0            0.0         3971.8      0.0      0.0      0.0      0.0 write
          |  1604.0s        0            0.0         3969.3      0.0      0.0      0.0      0.0 write
          |  1605.0s        0            0.0         3966.9      0.0      0.0      0.0      0.0 write
          |  1606.0s        0            0.0         3964.4      0.0      0.0      0.0      0.0 write
          |  1607.0s        0            0.0         3961.9      0.0      0.0      0.0      0.0 write
          | Error: ERROR: result is ambiguous (error=rpc error: code = Unavailable desc = error reading from server: read tcp 10.142.0.151:49272->10.142.0.148:26257: read: connection reset by peer [propagate]) (SQLSTATE 40003)
          | COMMAND_PROBLEM: exit status 1
        Wraps: (4) exit status 20
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *exec.ExitError
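In the workload histogram above, instantaneous throughput drops to `0.0` ops/sec at 1599.0s and stays there until the client receives `connection reset by peer`. This suggests that the gateway node stopped serving before the workload gave up. The hypothetical helper below finds the first zero-throughput interval. It assumes the column layout given in the header row (`_elapsed`, `errors`, `ops/sec(inst)`, ...) and the `|` prefix that the error wrapper adds to each line.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// firstStall returns the elapsed time, in seconds, of the first histogram
// row whose instantaneous throughput (third column, ops/sec(inst)) is zero.
func firstStall(output string) (float64, bool) {
	for _, line := range strings.Split(output, "\n") {
		// Drop the "|" prefix added by the error wrapper.
		line = strings.TrimPrefix(strings.TrimSpace(line), "|")
		f := strings.Fields(line)
		// Data rows start with an elapsed time such as "1599.0s"; the header
		// row and error lines are skipped.
		if len(f) < 3 || !strings.HasSuffix(f[0], "s") {
			continue
		}
		elapsed, err := strconv.ParseFloat(strings.TrimSuffix(f[0], "s"), 64)
		if err != nil {
			continue
		}
		inst, err := strconv.ParseFloat(f[2], 64)
		if err != nil {
			continue
		}
		if inst == 0 {
			return elapsed, true
		}
	}
	return 0, false
}

func main() {
	out := `  |  1597.0s        0         3393.2         3985.8      6.8     26.2     56.6    142.6 write
  |  1598.0s        0         1478.8         3984.2      6.0     39.8    159.4    201.3 write
  |  1599.0s        0            0.0         3981.8      0.0      0.0      0.0      0.0 write
  |  1600.0s        0            0.0         3979.3      0.0      0.0      0.0      0.0 write`
	if t, ok := firstStall(out); ok {
		fmt.Printf("throughput first hit zero at %.1fs\n", t)
	}
}
```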

    monitor.go:128,clearrange.go:207,clearrange.go:38,test_runner.go:777: monitor failure: monitor task failed: t.Fatal() was called
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:116
          | main.(*monitorImpl).Wait
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:124
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runClearRange
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/clearrange.go:207
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerClearRange.func1
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/clearrange.go:38
          | main.(*testRunner).runTest.func2
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:777
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:172
        Wraps: (4) monitor task failed
        Wraps: (5) attached stack trace
          -- stack trace:
          | main.init
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:81
          | runtime.doInit
          |     /usr/local/go/src/runtime/proc.go:6309
          | runtime.main
          |     /usr/local/go/src/runtime/proc.go:208
          | runtime.goexit
          |     /usr/local/go/src/runtime/asm_amd64.s:1371
        Wraps: (6) t.Fatal() was called
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError
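The workload error in this report is `result is ambiguous ... (SQLSTATE 40003)`. In PostgreSQL's error-code table, 40003 is `statement_completion_unknown`: the client cannot tell whether the write committed. By contrast, after 40001 (`serialization_failure`) the transaction definitely did not commit and can simply be retried. Below is a minimal sketch of how a client might branch on the code. `classify` and the `Action` values are hypothetical names.

```go
package main

import "fmt"

// Action is what a client might do after a statement fails.
type Action string

const (
	Fail   Action = "fail"
	Retry  Action = "retry"
	Verify Action = "verify-then-retry"
)

// classify maps a SQLSTATE code to an Action. Only the two codes discussed
// here are handled; everything else is treated as a hard failure.
func classify(sqlstate string) Action {
	switch sqlstate {
	case "40001": // serialization_failure: did not commit, safe to retry
		return Retry
	case "40003": // statement_completion_unknown: may or may not have committed
		return Verify
	default:
		return Fail
	}
}

func main() {
	for _, code := range []string{"40001", "40003", "23505"} {
		fmt.Printf("%s -> %s\n", code, classify(code))
	}
}
```

Retrying after 40003 without first checking the outcome is only safe if the write is idempotent.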
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md)

Same failure on other branches

- #65092 roachtest: clearrange/checks=true failed [C-test-failure O-roachtest O-robot T-storage branch-release-20.1 release-blocker]

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ 15b773c71f92d643795e34c922717fde0447f9cd:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md)

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ 42e5f9492d0d8d93638241303bca984fe78baae3:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md)

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ 941011c4e582807b40dd03bbcbb8d05385c0638d:

        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:172
        Wraps: (4) monitor task failed
        Wraps: (5) attached stack trace
          -- stack trace:
          | main.init
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:81
          | runtime.doInit
          |     /usr/local/go/src/runtime/proc.go:6309
          | runtime.main
          |     /usr/local/go/src/runtime/proc.go:208
          | runtime.goexit
          |     /usr/local/go/src/runtime/asm_amd64.s:1371
        Wraps: (6) t.Fatal() was called
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

    cluster.go:1249,context.go:89,cluster.go:1237,test_runner.go:866: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-3427568-1631341029-39-n10cpu16 --oneshot --ignore-empty-nodes: exit status 1 10: dead (exit status 134)
        7: 886221
        5: 732100
        6: 1008139
        2: 899034
        4: 722529
        3: 1118969
        1: 1007587
        9: 928962
        8: 954688
        Error: UNCLASSIFIED_PROBLEM: 10: dead (exit status 134)
        (1) UNCLASSIFIED_PROBLEM
        Wraps: (2) attached stack trace
          -- stack trace:
          | main.glob..func14
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1173
          | main.wrap.func1
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:281
          | github.com/spf13/cobra.(*Command).execute
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856
          | github.com/spf13/cobra.(*Command).ExecuteC
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960
          | github.com/spf13/cobra.(*Command).Execute
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
          | main.main
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:2107
          | runtime.main
          |     /usr/local/go/src/runtime/proc.go:225
          | runtime.goexit
          |     /usr/local/go/src/runtime/asm_amd64.s:1371
        Wraps: (3) 10: dead (exit status 134)
        Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md)

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ 189259e803eca715307bfe0545c84189486a36c4:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md)

Same failure on other branches

- #70306 roachtest: clearrange/zfs/checks=true failed [C-test-failure O-roachtest O-robot T-storage branch-release-21.2 release-blocker]

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ d49fadb6f1c67d99ce91b719bac44b5640fa8e01:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md)

Same failure on other branches

- #70306 roachtest: clearrange/zfs/checks=true failed [C-test-failure O-roachtest O-robot T-storage branch-release-21.2 release-blocker]

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ 1dbea6a3e4c93ba7c4a43ad34369c438558eba8a:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md)

Same failure on other branches

- #70306 roachtest: clearrange/zfs/checks=true failed [C-test-failure O-roachtest O-robot T-storage branch-release-21.2]

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ cc6296c24ddb048215dabe5cc41339f306db4f41:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md)

Same failure on other branches

- #70306 roachtest: clearrange/zfs/checks=true failed [C-test-failure O-roachtest O-robot T-storage branch-release-21.2]

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ 0984f873c6170ab34afe6fee4661fc5f76ac0dee:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md)

Same failure on other branches

- #70306 roachtest: clearrange/zfs/checks=true failed [C-test-failure O-roachtest O-robot T-storage branch-release-21.2]

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ b3af96b0686773c78325d2b8b0623a8fcd3e9bf2:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Reproduce

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md)

Same failure on other branches

- #70306 roachtest: clearrange/zfs/checks=true failed [C-test-failure O-roachtest O-robot T-storage branch-release-21.2]

/cc @cockroachdb/storage

cockroach-teamcity commented 3 years ago

roachtest.clearrange/zfs/checks=true failed with artifacts on master @ 31ccb1ce864ebea41131fc94e28ff3c398c9a5fe:

        Wraps: (2) output in run_095554.692565404_n1_cockroach_workload_fixtures_import_bank
        Wraps: (3) /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod run teamcity-3625151-1634969608-39-n10cpu16:1 -- ./cockroach workload fixtures import bank --payload-bytes=10240 --ranges=10 --rows=65104166 --seed=4 --db=bigbank returned
          | stderr:
          | I211023 09:55:55.784956 1 ccl/workloadccl/fixture.go:345  [-] 1  starting import of 1 tables
          | Error: importing fixture: importing table bank: dial tcp 127.0.0.1:26257: connect: connection refused
          | Error: COMMAND_PROBLEM: exit status 1
          | (1) COMMAND_PROBLEM
          | Wraps: (2) Node 1. Command with error:
          |   | ``````
          |   | ./cockroach workload fixtures import bank --payload-bytes=10240 --ranges=10 --rows=65104166 --seed=4 --db=bigbank
          |   | ``````
          | Wraps: (3) exit status 1
          | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
          |
          | stdout:
        Wraps: (4) exit status 20
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *exec.ExitError
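To give a sense of scale, the fixture import in this report requests `--rows=65104166` with `--payload-bytes=10240`. That comes to roughly 2/3 of a terabyte of raw payload. The figure excludes key and MVCC overhead and compression, and it counts only one copy of the data, whereas CockroachDB's default replication factor is 3. The arithmetic, as a small Go check (`payloadTotal` is a hypothetical helper):

```go
package main

import "fmt"

// payloadTotal returns the raw payload size of the bank fixture in bytes:
// rows * payload bytes per row. Keys, MVCC metadata, compression and
// replication are deliberately ignored.
func payloadTotal(rows, payloadBytes int64) int64 {
	return rows * payloadBytes
}

func main() {
	// Values from the fixtures import command in this report.
	total := payloadTotal(65104166, 10240)
	fmt.Println(total)                               // 666666659840
	fmt.Printf("%.1f GiB\n", float64(total)/(1<<30)) // 620.9 GiB
	fmt.Println(3 * total)                           // 1999999979520, with 3 replicas
}
```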

    cluster.go:1300,context.go:91,cluster.go:1288,test_runner.go:866: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-3625151-1634969608-39-n10cpu16 --oneshot --ignore-empty-nodes: exit status 1 1: dead (exit status 137)
        8: 13747
        5: 13982
        3: 13668
        2: 14179
        9: 14063
        4: 13893
        6: 14074
        10: 13687
        7: 13611
        Error: UNCLASSIFIED_PROBLEM: 1: dead (exit status 137)
        (1) UNCLASSIFIED_PROBLEM
        Wraps: (2) attached stack trace
          -- stack trace:
          | main.glob..func14
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:1175
          | main.wrap.func1
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:281
          | github.com/spf13/cobra.(*Command).execute
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856
          | github.com/spf13/cobra.(*Command).ExecuteC
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960
          | github.com/spf13/cobra.(*Command).Execute
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
          | main.main
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:2104
          | runtime.main
          |     /usr/local/go/src/runtime/proc.go:225
          | runtime.goexit
          |     /usr/local/go/src/runtime/asm_amd64.s:1371
        Wraps: (3) 1: dead (exit status 137)
        Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) | See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #70306 roachtest: clearrange/zfs/checks=true failed [C-test-failure O-roachtest O-robot T-storage branch-release-21.2]

/cc @cockroachdb/storage

cockroach-teamcity commented 2 years ago

roachtest.clearrange/checks=true failed with artifacts on master @ a132b15e6c9705a6922c2e476936597c6670e072:

          | main.(*monitorImpl).WaitE
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:116
          | main.(*monitorImpl).Wait
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:124
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runClearRange
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/clearrange.go:208
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerClearRange.func1
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/clearrange.go:38
          | main.(*testRunner).runTest.func2
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:778
          | runtime.goexit
          |     /usr/local/go/src/runtime/asm_amd64.s:1581
        Wraps: (2) monitor failure
        Wraps: (3) unexpected node event: 4: dead (exit status 10)
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *errors.errorString

    cluster.go:1308,context.go:91,cluster.go:1296,test_runner.go:867: dead node detection: /home/agent/work/.go/src/github.com/cockroachdb/cockroach/bin/roachprod monitor teamcity-3788144-1637737747-32-n10cpu16 --oneshot --ignore-empty-nodes: exit status 1 9: 12691
        10: 12286
        5: 11999
        3: 11806
        8: 11907
        4: dead (exit status 10)
        7: 11935
        6: 11887
        1: 12137
        2: 12141
        Error: UNCLASSIFIED_PROBLEM: 4: dead (exit status 10)
        (1) UNCLASSIFIED_PROBLEM
        Wraps: (2) attached stack trace
          -- stack trace:
          | github.com/cockroachdb/cockroach/pkg/roachprod.Monitor
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/roachprod/roachprod.go:568
          | main.glob..func14
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:518
          | main.wrap.func1
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:69
          | github.com/spf13/cobra.(*Command).execute
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:856
          | github.com/spf13/cobra.(*Command).ExecuteC
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:960
          | github.com/spf13/cobra.(*Command).Execute
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/vendor/github.com/spf13/cobra/command.go:897
          | main.main
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachprod/main.go:909
          | runtime.main
          |     /usr/local/go/src/runtime/proc.go:255
          | runtime.goexit
          |     /usr/local/go/src/runtime/asm_amd64.s:1581
        Wraps: (3) 4: dead (exit status 10)
        Error types: (1) errors.Unclassified (2) *withstack.withStack (3) *errutil.leafError
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) | See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #73013 roachtest: clearrange/checks=true failed [C-test-failure O-roachtest O-robot T-storage branch-release-21.1 release-blocker]
- #70306 roachtest: clearrange/zfs/checks=true failed [C-test-failure O-roachtest O-robot T-storage branch-release-21.2]

cockroach-teamcity commented 2 years ago

roachtest.clearrange/checks=true failed with artifacts on master @ 40f11fead0a0453969634f8ddb0502c1f78b2806:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) | See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #73013 roachtest: clearrange/checks=true failed [C-test-failure O-roachtest O-robot T-storage branch-release-21.1 release-blocker]
- #70306 roachtest: clearrange/zfs/checks=true failed [C-test-failure O-roachtest O-robot T-storage branch-release-21.2]

cockroach-teamcity commented 2 years ago

roachtest.clearrange/checks=true failed with artifacts on master @ b450fea83a7db1e06403b2563c13f38c9284b932:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) | See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #73013 roachtest: clearrange/checks=true failed [C-test-failure O-roachtest O-robot T-storage branch-release-21.1 release-blocker]
- #70306 roachtest: clearrange/zfs/checks=true failed [C-test-failure O-roachtest O-robot T-storage branch-release-21.2]

cockroach-teamcity commented 2 years ago

roachtest.clearrange/checks=true failed with artifacts on master @ 3b30a0e12f9a14b08ee8ad55b50299aca50c67a2:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) | See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #73013 roachtest: clearrange/checks=true failed [C-test-failure O-roachtest O-robot T-storage branch-release-21.1 release-blocker]
- #70306 roachtest: clearrange/zfs/checks=true failed [C-test-failure O-roachtest O-robot T-storage branch-release-21.2]

cockroach-teamcity commented 2 years ago

roachtest.clearrange/checks=true failed with artifacts on master @ 2c014c47c1a242f504f6d595bfd79c0edc20b90a:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) | See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #73013 roachtest: clearrange/checks=true failed [C-test-failure O-roachtest O-robot T-storage branch-release-21.1 release-blocker]
- #70306 roachtest: clearrange/zfs/checks=true failed [C-test-failure O-roachtest O-robot T-storage branch-release-21.2]

cockroach-teamcity commented 2 years ago

roachtest.clearrange/checks=true failed with artifacts on master @ 78419450178335b31f542bd1b14fefdf4ecee0e8:

          |    3: 
          | UNCLASSIFIED_PROBLEM: context canceled
          |    4: 
          | UNCLASSIFIED_PROBLEM: context canceled
          |    5: 
          | UNCLASSIFIED_PROBLEM: context canceled
          |    6: 
          | UNCLASSIFIED_PROBLEM: context canceled
          |    7: 
          | UNCLASSIFIED_PROBLEM: context canceled
          |    8: 
          | UNCLASSIFIED_PROBLEM: context canceled
          |    9: 
          | UNCLASSIFIED_PROBLEM: context canceled
          |   10: 
          | UNCLASSIFIED_PROBLEM: context canceled
        Wraps: (4) secondary error attachment
          | COMMAND_PROBLEM: exit status 1
          | (1) COMMAND_PROBLEM
          | Wraps: (2) Node 2. Command with error:
          |   | ``````
          |   | ./cockroach workload run kv --concurrency=32 --duration=1h
          |   | ``````
          | Wraps: (3) exit status 1
          | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
        Wraps: (5) context canceled
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

    monitor.go:127,clearrange.go:207,clearrange.go:39,test_runner.go:780: monitor failure: monitor command failure: unexpected node event: 2: dead (exit status 10)
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
          | main.(*monitorImpl).Wait
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:123
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runClearRange
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/clearrange.go:207
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerClearRange.func1
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/clearrange.go:39
          | [...repeated from below...]
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func3
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:202
          | runtime.goexit
          |     /usr/local/go/src/runtime/asm_amd64.s:1581
        Wraps: (4) monitor command failure
        Wraps: (5) unexpected node event: 2: dead (exit status 10)
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *errors.errorString
Help

See: [roachtest README](https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/README.md) | See: [How To Investigate \(internal\)](https://cockroachlabs.atlassian.net/l/c/SSSBr8c7)

Same failure on other branches

- #73013 roachtest: clearrange/checks=true failed [C-test-failure O-roachtest O-robot T-storage branch-release-21.1]
- #70306 roachtest: clearrange/zfs/checks=true failed [C-test-failure O-roachtest O-robot T-storage branch-release-21.2]

This test on roachdash | Improve this report!

cockroach-teamcity commented 2 years ago

roachtest.clearrange/checks=true failed with artifacts on master @ 5ad21e3896ee809e9c3ebc28bb22166f1275acca:

          |    3: 
          | UNCLASSIFIED_PROBLEM: context canceled
          |    4: 
          | UNCLASSIFIED_PROBLEM: context canceled
          |    5: 
          | UNCLASSIFIED_PROBLEM: context canceled
          |    6: 
          | UNCLASSIFIED_PROBLEM: context canceled
          |    7: 
          | UNCLASSIFIED_PROBLEM: context canceled
          |    8: 
          | UNCLASSIFIED_PROBLEM: context canceled
          |    9: 
          | UNCLASSIFIED_PROBLEM: context canceled
          |   10: 
          | UNCLASSIFIED_PROBLEM: context canceled
        Wraps: (4) secondary error attachment
          | COMMAND_PROBLEM: exit status 1
          | (1) COMMAND_PROBLEM
          | Wraps: (2) Node 2. Command with error:
          |   | ``````
          |   | ./cockroach workload run kv --concurrency=32 --duration=1h
          |   | ``````
          | Wraps: (3) exit status 1
          | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
        Wraps: (5) context canceled
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

    monitor.go:127,clearrange.go:207,clearrange.go:39,test_runner.go:780: monitor failure: monitor command failure: unexpected node event: 2: dead (exit status 10)
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
          | main.(*monitorImpl).Wait
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:123
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runClearRange
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/clearrange.go:207
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerClearRange.func1
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/clearrange.go:39
          | [...repeated from below...]
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func3
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:202
          | runtime.goexit
          |     /usr/local/go/src/runtime/asm_amd64.s:1581
        Wraps: (4) monitor command failure
        Wraps: (5) unexpected node event: 2: dead (exit status 10)
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *errors.errorString

cockroach-teamcity commented 2 years ago

roachtest.clearrange/checks=true failed with artifacts on master @ 4b41789120e019ab015e6dbb924df763897ebadb:

          |    3: 
          | UNCLASSIFIED_PROBLEM: context canceled
          |    4: 
          | UNCLASSIFIED_PROBLEM: context canceled
          |    5: 
          | UNCLASSIFIED_PROBLEM: context canceled
          |    6: 
          | UNCLASSIFIED_PROBLEM: context canceled
          |    7: 
          | UNCLASSIFIED_PROBLEM: context canceled
          |    8: 
          | UNCLASSIFIED_PROBLEM: context canceled
          |    9: 
          | UNCLASSIFIED_PROBLEM: context canceled
          |   10: 
          | UNCLASSIFIED_PROBLEM: context canceled
        Wraps: (4) secondary error attachment
          | COMMAND_PROBLEM: exit status 1
          | (1) COMMAND_PROBLEM
          | Wraps: (2) Node 2. Command with error:
          |   | ``````
          |   | ./cockroach workload run kv --concurrency=32 --duration=1h
          |   | ``````
          | Wraps: (3) exit status 1
          | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
        Wraps: (5) context canceled
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

    monitor.go:127,clearrange.go:207,clearrange.go:39,test_runner.go:780: monitor failure: monitor command failure: unexpected node event: 2: dead (exit status 137)
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
          | main.(*monitorImpl).Wait
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:123
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runClearRange
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/clearrange.go:207
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerClearRange.func1
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/clearrange.go:39
          | [...repeated from below...]
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func3
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:202
          | runtime.goexit
          |     /usr/local/go/src/runtime/asm_amd64.s:1581
        Wraps: (4) monitor command failure
        Wraps: (5) unexpected node event: 2: dead (exit status 137)
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *errors.errorString

cockroach-teamcity commented 2 years ago

roachtest.clearrange/checks=true failed with artifacts on master @ 365b4da8bd02c06ee59d2130a56dec74ffc9ce21:

The test failed on branch=master, cloud=gce:
test timed out (see artifacts for details)

nicktrav commented 2 years ago

This has failed the last four runs. Looking at the logs, it's not the ZFS tests that are failing, despite the ticket name.

From the latest failure:

        Wraps: (3) ./workload run tpcc --warehouses=1000 --duration=120m  {pgurl:1-3}  returned
          | stderr:
          | <... some data truncated by circular buffer; go to artifacts for details ...>
          | i=99.99999995 epo=25 ts=1642496980.413563058,1 min=1642496422.247633514,0 seq=23} lock=true stat=PENDING rts=1642496965.322893186,2 wto=false gul=1642496422.747633514,0 (SQLSTATE 40001) sql:RELEASE SAVEPOINT cockroach_restart]
          | I220118 09:09:44.502889 25903 workload/pgx_helpers.go:72  [-] 34677  pgx logger [error]: Exec logParams=map[args:[] err:ERROR: restart transaction: TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE): "sql txn" meta={id=fcb16409 key=/Table/57/1/682/10/0 pri=99.99999995 epo=29 ts=1642496982.414824247,1 min=1642496383.144839075,0 seq=33} lock=true stat=PENDING rts=1642496968.139990199,2 wto=false gul=1642496383.644839075,0 (SQLSTATE 40001) sql:RELEASE SAVEPOINT cockroach_restart]
          | I220118 09:09:44.503988 25183 workload/pgx_helpers.go:72  [-] 34678  pgx logger [error]: Query logParams=map[args:[4020.15 962 7] err:ERROR: restart transaction: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_PUSHER_ABORTED): "sql txn" meta={id=311fffb9 key=/Table/56/1/962/0 pri=99.99999991 epo=0 ts=1642496983.047738087,0 min=1642496959.972611509,0 seq=1} lock=true stat=ABORTED rts=1642496959.972611509,0 wto=false gul=1642496960.472611509,0 (SQLSTATE 40001) sql:
          | I220118 09:09:44.503988 25183 workload/pgx_helpers.go:72  [-] 34678 +       UPDATE district
          | I220118 09:09:44.503988 25183 workload/pgx_helpers.go:72  [-] 34678 +       SET d_ytd = d_ytd + $1
          | I220118 09:09:44.503988 25183 workload/pgx_helpers.go:72  [-] 34678 +       WHERE d_w_id = $2 AND d_id = $3
          | I220118 09:09:44.503988 25183 workload/pgx_helpers.go:72  [-] 34678 +       RETURNING d_name, d_street_1, d_street_2, d_city, d_state, d_zip]
          | I220118 09:09:44.504141 24928 workload/pgx_helpers.go:72  [-] 34679  pgx logger [error]: Query logParams=map[args:[854.57 707 7] err:ERROR: restart transaction: TransactionRetryWithProtoRefreshError: TransactionAbortedError(ABORT_REASON_PUSHER_ABORTED): "sql txn" meta={id=421861eb key=/Table/56/1/707/0 pri=99.99999991 epo=0 ts=1642496982.815885173,0 min=1642496965.813377126,0 seq=1} lock=true stat=ABORTED rts=1642496965.813377126,0 wto=false gul=1642496966.313377126,0 (SQLSTATE 40001) sql:
          | I220118 09:09:44.504141 24928 workload/pgx_helpers.go:72  [-] 34679 +       UPDATE district
          | I220118 09:09:44.504141 24928 workload/pgx_helpers.go:72  [-] 34679 +       SET d_ytd = d_ytd + $1
          | I220118 09:09:44.504141 24928 workload/pgx_helpers.go:72  [-] 34679 +       WHERE d_w_id = $2 AND d_id = $3
          | I220118 09:09:44.504141 24928 workload/pgx_helpers.go:72  [-] 34679 +       RETURNING d_name, d_street_1, d_street_2, d_city, d_state, d_zip]
          | I220118 09:09:44.517952 21094 workload/pgx_helpers.go:72  [-] 34680  pgx logger [error]: Exec logParams=map[args:[] err:ERROR: restart transaction: TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE - failed preemptive refresh due to a conflict: intent on key /Table/56/1/877/0): "sql txn" meta={id=2caaeaa6 key=/Table/57/1/877/1/0 pri=99.99999995 epo=4 ts=1642496972.970579320,2 min=1642496869.866784989,0 seq=26} lock=true stat=PENDING rts=1642496962.882809957,0 wto=false gul=1642496870.366784989,0 (SQLSTATE 40001) sql:RELEASE SAVEPOINT cockroach_restart]
          | I220118 09:09:44.526525 16702 workload/pgx_helpers.go:72  [-] 34681  pgx logger [error]: Exec logParams=map[args:[] err:ERROR: restart transaction: TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE): "sql txn" meta={id=18378deb key=/Table/57/1/481/5/0 pri=99.99999995 epo=50 ts=1642496978.709635783,1 min=1642496181.627713324,0 seq=33} lock=true stat=PENDING rts=1642496973.037520891,0 wto=false gul=1642496182.127713324,0 (SQLSTATE 40001) sql:RELEASE SAVEPOINT cockroach_restart]
          | Error: error in newOrder: ERROR: restart transaction: TransactionRetryWithProtoRefreshError: TransactionRetryError: retry txn (RETRY_SERIALIZABLE): "sql txn" meta={id=18378deb key=/Table/57/1/481/5/0 pri=99.99999995 epo=50 ts=1642496978.709635783,1 min=1642496181.627713324,0 seq=33} lock=true stat=PENDING rts=1642496973.037520891,0 wto=false gul=1642496182.127713324,0 (SQLSTATE 40001)
          |
          | stdout:
          | <... some data truncated by circular buffer; go to artifacts for details ...>
          |      0           50.9           97.8   4563.4 103079.2 103079.2 103079.2 payment
          |  1169.0s        0            3.0            9.8  15032.4 103079.2 103079.2 103079.2 stockLevel
          |  1170.0s        0            1.0            9.8   7247.8   7247.8   7247.8   7247.8 delivery
          |  1170.0s        0           44.0           97.1   2684.4 103079.2 103079.2 103079.2 newOrder
          |  1170.0s        0            4.0            9.8    142.6 103079.2 103079.2 103079.2 orderStatus
          |  1170.0s        0           48.0           97.7   5637.1 103079.2 103079.2 103079.2 payment
          |  1170.0s        0            3.0            9.7   9126.8 103079.2 103079.2 103079.2 stockLevel
          |  1171.0s        0            3.0            9.8  12884.9 103079.2 103079.2 103079.2 delivery
          |  1171.0s        0           42.0           97.1   5100.3 103079.2 103079.2 103079.2 newOrder
          |  1171.0s        0            5.0            9.8    436.2   3489.7   3489.7   3489.7 orderStatus
          |  1171.0s        0           52.0           97.7   5637.1 103079.2 103079.2 103079.2 payment
          |  1171.0s        0            2.0            9.7    352.3   3355.4   3355.4   3355.4 stockLevel
          |  1172.0s        0            4.0            9.8  11811.2  68719.5  68719.5  68719.5 delivery
          |  1172.0s        0           41.0           97.0   4563.4 103079.2 103079.2 103079.2 newOrder
          |  1172.0s        0            4.0            9.8    335.5 103079.2 103079.2 103079.2 orderStatus
          |  1172.0s        0           43.0           97.6   2415.9 103079.2 103079.2 103079.2 payment
          |  1172.0s        0            0.0            9.7      0.0      0.0      0.0      0.0 stockLevel
          | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
          |  1173.0s        0            4.0            9.8  36507.2 103079.2 103079.2 103079.2 delivery
          |  1173.0s        0           54.1           97.0   5637.1 103079.2 103079.2 103079.2 newOrder
          |  1173.0s        0            5.0            9.8     35.7   3087.0   3087.0   3087.0 orderStatus
          |  1173.0s        0           44.1           97.6   3355.4 103079.2 103079.2 103079.2 payment
          |  1173.0s        0            4.0            9.7    469.8 103079.2 103079.2 103079.2 stockLevel
          |  1174.0s        0            4.0            9.8   1879.0   9126.8   9126.8   9126.8 delivery
          |  1174.0s        0           56.0           96.9   4563.4 103079.2 103079.2 103079.2 newOrder
          |  1174.0s        0            2.0            9.8    234.9  66572.0  66572.0  66572.0 orderStatus
          |  1174.0s        0           49.0           97.5   3355.4 103079.2 103079.2 103079.2 payment
          |  1174.0s        0            2.0            9.7    113.2  26843.5  26843.5  26843.5 stockLevel
          |  1175.0s        0            6.0            9.8  18253.6  73014.4  73014.4  73014.4 delivery
          |  1175.0s        0           53.0           96.9   3489.7 103079.2 103079.2 103079.2 newOrder
          |  1175.0s        0            5.0            9.8    134.2  26843.5  26843.5  26843.5 orderStatus
          |  1175.0s        0           40.0           97.5   3489.7 103079.2 103079.2 103079.2 payment
          |  1175.0s        0            5.0            9.7   7516.2 103079.2 103079.2 103079.2 stockLevel
          |  1176.0s        0            6.0            9.8  18253.6 103079.2 103079.2 103079.2 delivery
          |  1176.0s        0           53.0           96.9   5905.6 103079.2 103079.2 103079.2 newOrder
          |  1176.0s        0            3.0            9.8  94489.3 103079.2 103079.2 103079.2 orderStatus
          |  1176.0s        0           53.0           97.4   5368.7 103079.2 103079.2 103079.2 payment
          |  1176.0s        0            4.0            9.7     96.5 103079.2 103079.2 103079.2 stockLevel
          | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
          |  1177.0s        0            8.0            9.8  94489.3 103079.2 103079.2 103079.2 delivery
          |  1177.0s        0           50.0           96.8   6442.5 103079.2 103079.2 103079.2 newOrder
          |  1177.0s        0            3.0            9.8   5100.3  94489.3  94489.3  94489.3 orderStatus
          |  1177.0s        0           33.0           97.4   4160.7 103079.2 103079.2 103079.2 payment
          |  1177.0s        0            5.0            9.7    130.0 103079.2 103079.2 103079.2 stockLevel

nicktrav commented 2 years ago

The previous three runs failed with a mixture of:

monitor.go:127,clearrange.go:207,clearrange.go:39,test_runner.go:780: monitor failure: monitor command failure: unexpected node event: 7: dead (exit status 10)

and

Error: ERROR: result is ambiguous (error=rpc error: code = Unavailable desc = error reading from server: EOF [exhausted]) (SQLSTATE 40003)

nicktrav commented 2 years ago

Note to self: exit status 10 is disk full.
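For triage, the two statuses reported by the monitor in this thread can be decoded mechanically. A minimal sketch follows; describeExit and signalNames are hypothetical helpers, not part of roachtest. It assumes the note above (10 = disk full) and the usual shell convention that a status of 128+n means the process was killed by signal n, so 137 is SIGKILL (9). The OOM killer is a common source of SIGKILL, but not the only one.

```go
package main

import "fmt"

// signalNames covers a few Linux signal numbers that commonly show up as
// "exit status 128+n" when a process is killed (hypothetical helper table).
var signalNames = map[int]string{
	6:  "SIGABRT",
	9:  "SIGKILL",
	15: "SIGTERM",
}

// describeExit is a hypothetical triage helper (not part of roachtest) that
// turns the status in "unexpected node event: N: dead (exit status S)" into
// a short human-readable description.
func describeExit(status int) string {
	switch {
	case status == 10:
		// Per the note in this thread: a cockroach node exiting with 10 means disk full.
		return "disk full"
	case status > 128:
		// Shell convention: 128+n means the process was terminated by signal n.
		sig := status - 128
		if name, ok := signalNames[sig]; ok {
			return fmt.Sprintf("killed by signal %d (%s)", sig, name)
		}
		return fmt.Sprintf("killed by signal %d", sig)
	default:
		return fmt.Sprintf("exited with status %d", status)
	}
}

func main() {
	// The two statuses seen in the monitor failures above.
	for _, s := range []int{10, 137} {
		fmt.Printf("exit status %d: %s\n", s, describeExit(s))
	}
}
```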

cockroach-teamcity commented 2 years ago

roachtest.clearrange/checks=true failed with artifacts on master @ 912964e02ddd951c77d4f71981ae18b3894e9084:

          | I220119 10:24:17.452567 343 workload/pgx_helpers.go:72  [-] 26  pgx logger [error]: Prepare failed logParams=map[err:unexpected EOF name:kv-1 sql:SELECT k, v FROM kv WHERE k IN ($1)]
          | I220119 10:24:17.437554 341 workload/pgx_helpers.go:72  [-] 27  pgx logger [error]: Exec logParams=map[args:[-1615074411872331196 ff] err:unexpected EOF sql:kv-2]
          | I220119 10:24:17.437571 342 workload/pgx_helpers.go:72  [-] 28  pgx logger [error]: Exec logParams=map[args:[137929059605095429 8b] err:unexpected EOF sql:kv-2]
          | I220119 10:24:17.437588 358 workload/pgx_helpers.go:72  [-] 29  pgx logger [error]: Exec logParams=map[args:[7541957276719205478 30] err:unexpected EOF sql:kv-2]
          | I220119 10:24:17.437607 367 workload/pgx_helpers.go:72  [-] 30  pgx logger [error]: Exec logParams=map[args:[1896697154141689576 25] err:unexpected EOF sql:kv-2]
          | I220119 10:24:17.437623 354 workload/pgx_helpers.go:72  [-] 31  pgx logger [error]: Exec logParams=map[args:[864750750582181352 39] err:unexpected EOF sql:kv-2]
          | I220119 10:24:17.437638 347 workload/pgx_helpers.go:72  [-] 32  pgx logger [error]: Exec logParams=map[args:[-6228323690426786486 f2] err:unexpected EOF sql:kv-2]
          | I220119 10:24:17.437652 345 workload/pgx_helpers.go:72  [-] 33  pgx logger [error]: Exec logParams=map[args:[331400927896966799 6e] err:unexpected EOF sql:kv-2]
          | I220119 10:24:17.437671 366 workload/pgx_helpers.go:72  [-] 34  pgx logger [error]: Exec logParams=map[args:[-5989122398152602669 f5] err:unexpected EOF sql:kv-2]
          | I220119 10:24:17.437686 348 workload/pgx_helpers.go:72  [-] 35  pgx logger [error]: Exec logParams=map[args:[-5284345074687460982 cf] err:unexpected EOF sql:kv-2]
          | I220119 10:24:17.434674 363 workload/pgx_helpers.go:72  [-] 17  pgx logger [error]: Exec logParams=map[args:[7035044378350194797 01] err:unexpected EOF sql:kv-2]
          | W220119 10:24:17.452621 343 workload/pgx_helpers.go:116  [-] 36  error preparing statement. name=kv-1 sql=SELECT k, v FROM kv WHERE k IN ($1) unexpected EOF
          | Error: unexpected EOF
          | COMMAND_PROBLEM: exit status 1
          |   10: 
          | UNCLASSIFIED_PROBLEM: context canceled
        Wraps: (4) secondary error attachment
          | COMMAND_PROBLEM: exit status 1
          | (1) COMMAND_PROBLEM
          | Wraps: (2) Node 9. Command with error:
          |   | ``````
          |   | ./cockroach workload run kv --concurrency=32 --duration=1h
          |   | ``````
          | Wraps: (3) exit status 1
          | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
        Wraps: (5) context canceled
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

    monitor.go:127,clearrange.go:207,clearrange.go:39,test_runner.go:780: monitor failure: monitor command failure: unexpected node event: 9: dead (exit status 137)
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
          | main.(*monitorImpl).Wait
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:123
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runClearRange
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/clearrange.go:207
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerClearRange.func1
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/clearrange.go:39
          | [...repeated from below...]
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func3
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:202
          | runtime.goexit
          |     /usr/local/go/src/runtime/asm_amd64.s:1581
        Wraps: (4) monitor command failure
        Wraps: (5) unexpected node event: 9: dead (exit status 137)
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *errors.errorString

nicktrav commented 2 years ago

I'm currently running a bisect on clearrange/checks=true between the last known good test run and the first failing test to see if I can find anything.

jbowens commented 2 years ago

I ran a clearrange/checks=false this morning, hoping to observe some benefit from d4f0a8d4 in #75150, but ran into this issue.

The top graph is available capacity per node and the bottom graph is live bytes per node.

[Screenshot, 2022-01-19 12:43 PM, not reproduced]

I tried aggregating sstable properties over the out-of-disk node's store and observed 8.1 K range deletions in L6:

ubuntu@jackson-1642610772-01-n10cpu16-0005:/mnt/data1$ ~/cockroach debug pebble db properties cockroach/
                  L0     L1     L2     L3        L4        L5        L6         TOTAL
count             0      0      0      5         275       2010      4569       6859
seq num
  smallest        0      0      0      233261    6951      4286      2830       2830
  largest         0      0      0      337619    338509    338373    338373     338509
size
  data            0 B    0 B    0 B    10 K      3.1 G     27 G      143 G      172 G
    blocks        0      0      0      5         107909    928061    4968442    6004417
  index           0 B    0 B    0 B    145 B     3.1 M     27 M      193 M      223 M
    blocks        0      0      0      5         275       2010      4571       6861
    top-level     0 B    0 B    0 B    0 B       0 B       0 B       59 B       59 B
  filter          0 B    0 B    0 B    69 B      413 K     3.5 M     2.4 M      6.2 M
  raw-key         0 B    0 B    0 B    131 B     7.9 M     68 M      367 M      444 M
  raw-value       0 B    0 B    0 B    10 K      3.1 G     27 G      142 G      172 G
records
  set             0      0      0      1         324 K     2.8 M     15 M       18 M
  delete          0      0      0      7         262       9.9 K     38 K       48 K
  range-delete    0      0      0      7         20        91        8.1 K      8.2 K
  merge           0      0      0      0         0         0         0          0

I wonder if something changed to hold open a snapshot for too long, preventing reclamation of disk space. cc @cockroachdb/kv.

nicktrav commented 2 years ago

I've started a bisect between 7841945017 (first bad) and d6b99e92bf (last known good). There are ~17 commits in that range.

Current progress (will continue to update):

  1. 7841945017 bad - logs
  2. d6b99e92bf good - logs
  3. 9dc76f064a bad - logs
  4. afb8dbe096 bad - logs
  5. 20eaf0b415 good - logs
  6. cd1093d5f7 bad - logs
  7. 6664d0c34d bad - logs
nicktrav commented 2 years ago

I have a feeling that 6664d0c34df0fea61de4fff1e97987b7de609b9e is the commit that is breaking things for us. I was able to bisect cleanly down to this commit.

cc: @tbg

tbg commented 2 years ago

Interesting. Will take a look. On the face of it this strikes me as unlikely, since that commit does not actually enable the circuit breakers, but let's see what I can find. Thanks for the bisect!

tbg commented 2 years ago

Re-running 3x on 6664d0c: https://teamcity.cockroachdb.com/viewQueued.html?itemId=4162603
Re-running 3x on 20eaf0b415f1df361246804e5d1d80c7a20a8eb6, the preceding merge (i.e. what should be the last good commit): https://teamcity.cockroachdb.com/viewLog.html?buildId=4163070& (this replaces the earlier, struck-through queue entry https://teamcity.cockroachdb.com/viewQueued.html?itemId=4162607)

cockroach-teamcity commented 2 years ago

roachtest.clearrange/checks=true failed with artifacts on master @ da01e4c0545f191a0573e1d097ff0366769e0d6b:

          |  1000.0s        0            0.0         3545.0      0.0      0.0      0.0      0.0 write
          | _elapsed___errors__ops/sec(inst)___ops/sec(cum)__p50(ms)__p95(ms)__p99(ms)_pMax(ms)
          |  1001.0s        0            0.0         3541.5      0.0      0.0      0.0      0.0 write
          | I220120 10:24:40.457186 362 workload/pgx_helpers.go:72  [-] 4  pgx logger [error]: Exec logParams=map[args:[4165652832793744639 1e] err:ERROR: result is ambiguous (error=rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout [propagate]) (SQLSTATE 40003) sql:kv-2]
          | I220120 10:24:40.457193 371 workload/pgx_helpers.go:72  [-] 3  pgx logger [error]: Exec logParams=map[args:[4254738307175174905 6b] err:ERROR: result is ambiguous (error=rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout [propagate]) (SQLSTATE 40003) sql:kv-2]
          | Error: ERROR: result is ambiguous (error=rpc error: code = Unavailable desc = keepalive ping failed to receive ACK within timeout [propagate]) (SQLSTATE 40003)
          | COMMAND_PROBLEM: exit status 1
        Wraps: (4) secondary error attachment
          | COMMAND_PROBLEM: exit status 1
          | (1) COMMAND_PROBLEM
          | Wraps: (2) Node 2. Command with error:
          |   | ``````
          |   | ./cockroach workload run kv --concurrency=32 --duration=1h
          |   | ``````
          | Wraps: (3) exit status 1
          | Error types: (1) errors.Cmd (2) *hintdetail.withDetail (3) *exec.ExitError
        Wraps: (5) context canceled
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *cluster.WithCommandDetails (4) *secondary.withSecondaryError (5) *errors.errorString

    monitor.go:127,clearrange.go:207,clearrange.go:39,test_runner.go:780: monitor failure: monitor task failed: t.Fatal() was called
        (1) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).WaitE
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:115
          | main.(*monitorImpl).Wait
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:123
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.runClearRange
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/clearrange.go:207
          | github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerClearRange.func1
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/clearrange.go:39
          | main.(*testRunner).runTest.func2
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/test_runner.go:780
        Wraps: (2) monitor failure
        Wraps: (3) attached stack trace
          -- stack trace:
          | main.(*monitorImpl).wait.func2
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:171
        Wraps: (4) monitor task failed
        Wraps: (5) attached stack trace
          -- stack trace:
          | main.init
          |     /home/agent/work/.go/src/github.com/cockroachdb/cockroach/pkg/cmd/roachtest/monitor.go:80
          | runtime.doInit
          |     /usr/local/go/src/runtime/proc.go:6498
          | runtime.main
          |     /usr/local/go/src/runtime/proc.go:238
          | runtime.goexit
          |     /usr/local/go/src/runtime/asm_amd64.s:1581
        Wraps: (6) t.Fatal() was called
        Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *withstack.withStack (4) *errutil.withPrefix (5) *withstack.withStack (6) *errutil.leafError

nicktrav commented 2 years ago

FWIW, for good measure I did another (more manual) bisect over the topologically sorted commit log (git log --topo-order --oneline --no-merges).

The results seem the same: 6664d0c34d (bad) and ad59351e4b (good).

nicktrav commented 2 years ago

Also going to link #73013 and #75140 in here. They are clearrange failures on release-21.1, which seems weird, as the commit identified here is not present on that branch.

tbg commented 2 years ago

Poking a bit at run_1 in https://teamcity.cockroachdb.com/viewLog.html?buildId=4162603&buildTypeId=Cockroach_Nightlies_RoachtestStress&tab=artifacts&branch_Cockroach_Nightlies=%3Cdefault%3E#%2Fclearrange%2Fchecks%3Dtrue%2Frun_1 to get a feel for things.

I'm not seeing any "slow proposals", which is the first thing I looked for, since I did make changes in that area in 6664d0c. But I'm seeing slow latches on r10100, leaseholder n2. The range status looks innocuous; the only thing that is maybe weird is this:

 "quiescent": true,
  "read_latches": 2,
  "top_k_locks_by_wait_queue_waiters": null

But note that this is from the debug.zip, i.e. captured after the workload had stopped. A quiesced range is not ticked, and I moved some of the slow-proposal handling into the tick loop, so maybe the two are interacting in weird ways? Just speculating here; it's late, and I will look more tomorrow and also try to get a live repro.