roachtest.c2c/disconnect failed with artifacts on release-24.2 @ 7a32a78a1f7a691f32a131d79f6ae00a19e20e86:
| | github.com/cockroachdb/cockroach/pkg/roachprod/install/session.go:143
| | github.com/cockroachdb/cockroach/pkg/roachprod/install.(*remoteSession).CombinedOutput.func1
| | github.com/cockroachdb/cockroach/pkg/roachprod/install/session.go:156
| | runtime.goexit
| | src/runtime/asm_amd64.s:1695
| Wraps: (3) _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/c2c/disconnect/cpu_arch=arm64/run_1/ssh/ssh_095013.962408842_n4_cd-nodeexporter-sudo.log)
| Wraps: (4) TRANSIENT_ERROR(ssh_problem)
| Wraps: (5) exit status 255
| Error types: (1) *hintdetail.withDetail (2) *withstack.withStack (3) *errutil.withPrefix (4) errors.TransientError (5) *exec.ExitError
Wraps: (7) secondary error attachment
| _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/c2c/disconnect/cpu_arch=arm64/run_1/ssh/ssh_094645.665410474_n4_cd-nodeexporter-sudo.log): TRANSIENT_ERROR(ssh_problem): exit status 255
| (1) Node 4. Command with error:
| | ``````
| | cd node_exporter &&
| | sudo systemd-run --unit node_exporter --same-dir ./node_exporter
| | ``````
| | <no output>
| Wraps: (2) attached stack trace
| -- stack trace:
| | github.com/cockroachdb/cockroach/pkg/roachprod/install.(*remoteSession).errWithDebug
| | github.com/cockroachdb/cockroach/pkg/roachprod/install/session.go:143
| | github.com/cockroachdb/cockroach/pkg/roachprod/install.(*remoteSession).CombinedOutput.func1
| | github.com/cockroachdb/cockroach/pkg/roachprod/install/session.go:156
| | runtime.goexit
| | src/runtime/asm_amd64.s:1695
| Wraps: (3) _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/c2c/disconnect/cpu_arch=arm64/run_1/ssh/ssh_094645.665410474_n4_cd-nodeexporter-sudo.log)
| Wraps: (4) TRANSIENT_ERROR(ssh_problem)
| Wraps: (5) exit status 255
| Error types: (1) *hintdetail.withDetail (2) *withstack.withStack (3) *errutil.withPrefix (4) errors.TransientError (5) *exec.ExitError
Wraps: (8) Node 4. Command with error:
| ``````
| cd node_exporter &&
| sudo systemd-run --unit node_exporter --same-dir ./node_exporter
| ``````
| <no output>
Wraps: (9) attached stack trace
-- stack trace:
| github.com/cockroachdb/cockroach/pkg/roachprod/install.(*remoteSession).errWithDebug
| github.com/cockroachdb/cockroach/pkg/roachprod/install/session.go:143
| github.com/cockroachdb/cockroach/pkg/roachprod/install.(*remoteSession).CombinedOutput.func1
| github.com/cockroachdb/cockroach/pkg/roachprod/install/session.go:156
| runtime.goexit
| src/runtime/asm_amd64.s:1695
Wraps: (10) _potential_ SSH flake (``ssh -vvv`` log retained in /artifacts/c2c/disconnect/cpu_arch=arm64/run_1/ssh/ssh_094340.277696995_n4_cd-nodeexporter-sudo.log)
Wraps: (11) TRANSIENT_ERROR(ssh_problem)
Wraps: (12) exit status 255
Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *markers.withMark (4) *withstack.withStack (5) *errutil.withPrefix (6) *secondary.withSecondaryError (7) *secondary.withSecondaryError (8) *hintdetail.withDetail (9) *withstack.withStack (10) *errutil.withPrefix (11) errors.TransientError (12) *exec.ExitError
Test: c2c/disconnect
(require.go:1357).NoError: FailNow called
test artifacts and logs in: /artifacts/c2c/disconnect/cpu_arch=arm64/run_1
Parameters:
ROACHTEST_arch=arm64
ROACHTEST_cloud=gce
ROACHTEST_coverageBuild=false
ROACHTEST_cpu=4
ROACHTEST_encrypted=false
ROACHTEST_fs=ext4
ROACHTEST_localSSD=false
ROACHTEST_metamorphicBuild=false
ROACHTEST_ssd=0
See: roachtest README
See: How To Investigate (internal)
See: Grafana
Hrm, in the failure from a week ago, setting this cluster setting failed after the replication completed, because the read hit a connection reset:
error executing query="ALTER TENANT $1 SET CLUSTER SETTING sql.zone_configs.allow_for_secondary_tenant.enabled=true" args=["destination-tenant"]: read tcp 172.17.0.3:48366 -> 34.71.101.190:26257: read: connection reset by peer
(1) attached stack trace
-- stack trace:
| github.com/cockroachdb/cockroach/pkg/testutils/sqlutils.(*SQLRunner).ExecWithMessage
| github.com/cockroachdb/cockroach/pkg/testutils/sqlutils/sql_runner.go:99
| github.com/cockroachdb/cockroach/pkg/testutils/sqlutils.(*SQLRunner).Exec
| github.com/cockroachdb/cockroach/pkg/testutils/sqlutils/sql_runner.go:88
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.deprecatedStartInMemoryTenant
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/multitenant_utils.go:366
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.(*replicationDriver).main
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cluster_to_cluster.go:1025
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests.registerClusterReplicationDisconnect.func1.2
| github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cluster_to_cluster.go:1721
| main.(*monitorImpl).Go.func1
| main/pkg/cmd/roachtest/monitor.go:120
| golang.org/x/sync/errgroup.(*Group).Go.func1
| golang.org/x/sync/errgroup/external/org_golang_x_sync/errgroup/errgroup.go:78
| runtime.goexit
| src/runtime/asm_amd64.s:1695
Wraps: (2) secondary error attachment
| read tcp 172.17.0.3:48366 -> 34.71.101.190:26257: read: connection reset by peer
| (1) read tcp 172.17.0.3:48366 -> 34.71.101.190:26257
| Wraps: (2) read
| Wraps: (3) connection reset by peer
| Error types: (1) *net.OpError (2) *os.SyscallError (3) syscall.Errno
Wraps: (3) error executing query="ALTER TENANT $1 SET CLUSTER SETTING sql.zone_configs.allow_for_secondary_tenant.enabled=true" args=["destination-tenant"]: read tcp 172.17.0.3:48366 -> 34.71.101.190:26257: read: connection reset by peer
Error types: (1) *withstack.withStack (2) *secondary.withSecondaryError (3) *errutil.leafError
Oh no, this stream never replanned because stream_replication.lag_check_frequency doesn't actually do anything. I'll deal with this.
Without frequent replanning, the lag climbed to 10 minutes.
The latest failure looks like an infra flake while starting Grafana; I'll send this to test eng. https://github.com/cockroachdb/cockroach/blob/master/pkg/cmd/roachtest/tests/cluster_to_cluster.go#L619
2024/09/17 09:53:29 test_impl.go:420: test failure #1: full stack retained in failure_1.log: (assertions.go:363).Fail:
Error Trace: github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cluster_to_cluster.go:619
github.com/cockroachdb/cockroach/pkg/cmd/roachtest/tests/cluster_to_cluster.go:1711
main/pkg/cmd/roachtest/test_runner.go:1255
src/runtime/asm_amd64.s:1695
Error: Received unexpected error:
grafana-start currently cannot run on darwin: error persisted after 3 attempts: _potential_ SSH flake (`ssh -vvv` log retained in /artifacts/c2c/disconnect/cpu_arch=arm64/run_1/ssh/ssh_094340.277696995_n4_cd-nodeexporter-sudo.log): TRANSIENT_ERROR(ssh_problem): exit status 255
cc @cockroachdb/test-eng
Instance of #131094, closing.
roachtest.c2c/disconnect failed with artifacts on release-24.2 @ 90b634dc4a9c7da1d37b2d845272b19b3ff10f44:
Parameters:
ROACHTEST_arch=arm64
ROACHTEST_cloud=gce
ROACHTEST_coverageBuild=false
ROACHTEST_cpu=4
ROACHTEST_encrypted=false
ROACHTEST_fs=ext4
ROACHTEST_localSSD=false
ROACHTEST_metamorphicBuild=false
ROACHTEST_ssd=0
Help
See: roachtest README
See: How To Investigate (internal)
See: Grafana
/cc @cockroachdb/disaster-recovery
This test on roachdash | Improve this report!
Jira issue: CRDB-42071