cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

testutils/testcluster: TestRestart failed #118622

Status: Open. cockroach-teamcity opened this issue 8 months ago.

cockroach-teamcity commented 8 months ago

testutils/testcluster.TestRestart failed on master @ fce4d4723519bc4ca6e9ef5da0ae19960c84752c:

                              | github.com/cockroachdb/cockroach/pkg/server.(*serverController).getExpectedRunningTenants
                              |     github.com/cockroachdb/cockroach/pkg/server/server_controller_orchestration.go:206
                              | github.com/cockroachdb/cockroach/pkg/server.(*serverController).startInitialSecondaryTenantServers.func1
                              |     github.com/cockroachdb/cockroach/pkg/server/server_controller_orchestration.go:99
                              | github.com/cockroachdb/cockroach/pkg/util/startup.RunIdempotentWithRetryEx[...]
                              |     github.com/cockroachdb/cockroach/pkg/util/startup/retry.go:142
                              | github.com/cockroachdb/cockroach/pkg/server.(*serverController).startInitialSecondaryTenantServers
                              |     github.com/cockroachdb/cockroach/pkg/server/server_controller_orchestration.go:95
                              | github.com/cockroachdb/cockroach/pkg/server.(*serverController).start
                              |     github.com/cockroachdb/cockroach/pkg/server/server_controller_orchestration.go:43
                              | github.com/cockroachdb/cockroach/pkg/server.(*topLevelServer).PreStart
                              |     github.com/cockroachdb/cockroach/pkg/server/server.go:2187
                              | github.com/cockroachdb/cockroach/pkg/server.(*testServer).PreStart
                              |     github.com/cockroachdb/cockroach/pkg/server/testserver.go:807
                              | github.com/cockroachdb/cockroach/pkg/server.(*testServer).Start
                              |     github.com/cockroachdb/cockroach/pkg/server/testserver.go:883
                              | github.com/cockroachdb/cockroach/pkg/testutils/serverutils.(*wrap).Start
                              |     github.com/cockroachdb/cockroach/bazel-out/k8-fastbuild/bin/pkg/testutils/serverutils/ts_control_forwarder_generated.go:15
                              | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).RestartServerWithInspect.func1
                              |     github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1766
                              | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).RestartServerWithInspect
                              |     github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1776
                              | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).RestartServer
                              |     github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1690
                              | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).Restart
                              |     github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1680
                              | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.TestRestart
                              |     github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster_test.go:369
                              | [...repeated from below...]
                            Wraps: (5) list-tenants
                            Wraps: (6)
                            Wraps: (7) candidate pg code: 57014
                            Wraps: (8) attached stack trace
                              -- stack trace:
                              | github.com/cockroachdb/cockroach/pkg/util/cancelchecker.init
                              |     github.com/cockroachdb/cockroach/pkg/util/cancelchecker/cancel_checker.go:80
                              | runtime.doInit1
                              |     GOROOT/src/runtime/proc.go:6757
                              | runtime.doInit
                              |     GOROOT/src/runtime/proc.go:6724
                              | runtime.main
                              |     GOROOT/src/runtime/proc.go:249
                              | runtime.goexit
                              |     src/runtime/asm_amd64.s:1650
                            Wraps: (9) query execution canceled
                            Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *secondary.withSecondaryError (4) *withstack.withStack (5) *errutil.withPrefix (6) *colexecerror.notInternalError (7) *pgerror.withCandidateCode (8) *withstack.withStack (9) *errutil.leafError
            Test:           TestRestart
    panic.go:523: -- test log scope end --
test logs left over in: outputs.zip/logTestRestart2226094273
--- FAIL: TestRestart (18.25s)

Parameters:

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/test-eng

This test on roachdash | Improve this report!

Jira issue: CRDB-35810

renatolabs commented 8 months ago
failed to start the server controller: list-tenants: query execution canceled

@herkolategan any chance you have an idea of what could be going on here?

herkolategan commented 8 months ago
> failed to start the server controller: list-tenants: query execution canceled
>
> @herkolategan any chance you have an idea of what could be going on here?

Rings no specific bells, but I can take a look. Seems like some race condition, maybe?

renatolabs commented 8 months ago

I was actually able to reproduce this failure with the simple command below on current master (2e8d6ea0187):

% ./dev test ./pkg/testutils/testcluster/ -f TestRestart --stress --ignore-cache

I'm surprised we haven't seen this fail more often in CI.

The timeout (context cancelation) happens while processing this query:

https://github.com/cockroachdb/cockroach/blob/6a84e2e951947929af8a59fe4132bbfcfd4c6570/pkg/server/server_controller_orchestration.go#L191-L199

I'm unsure whether Test Eng can do anything concrete here, since we neither own nor wrote any of this code. I'll reassign this to multitenant, but I don't think we have a staffed team to handle these issues either 🤷

cockroach-teamcity commented 4 months ago

testutils/testcluster.TestRestart failed on master @ 2ce93809d799bacc078f8a1d8e24a710dbff4d66:

                              | github.com/cockroachdb/cockroach/pkg/server.(*serverController).getExpectedRunningTenants
                              |     github.com/cockroachdb/cockroach/pkg/server/server_controller.go:347
                              | github.com/cockroachdb/cockroach/pkg/server.(*serverController).startInitialSecondaryTenantServers.func1
                              |     github.com/cockroachdb/cockroach/pkg/server/server_controller.go:240
                              | github.com/cockroachdb/cockroach/pkg/util/startup.RunIdempotentWithRetryEx[...]
                              |     github.com/cockroachdb/cockroach/pkg/util/startup/retry.go:142
                              | github.com/cockroachdb/cockroach/pkg/server.(*serverController).startInitialSecondaryTenantServers
                              |     github.com/cockroachdb/cockroach/pkg/server/server_controller.go:236
                              | github.com/cockroachdb/cockroach/pkg/server.(*serverController).start
                              |     github.com/cockroachdb/cockroach/pkg/server/server_controller.go:184
                              | github.com/cockroachdb/cockroach/pkg/server.(*topLevelServer).PreStart
                              |     github.com/cockroachdb/cockroach/pkg/server/server.go:2202
                              | github.com/cockroachdb/cockroach/pkg/server.(*testServer).PreStart
                              |     github.com/cockroachdb/cockroach/pkg/server/testserver.go:790
                              | github.com/cockroachdb/cockroach/pkg/server.(*testServer).Start
                              |     github.com/cockroachdb/cockroach/pkg/server/testserver.go:866
                              | github.com/cockroachdb/cockroach/pkg/testutils/serverutils.(*wrap).Start
                              |     github.com/cockroachdb/cockroach/bazel-out/k8-fastbuild/bin/pkg/testutils/serverutils/ts_control_forwarder_generated.go:15
                              | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).RestartServerWithInspect.func1
                              |     github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1758
                              | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).RestartServerWithInspect
                              |     github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1768
                              | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).RestartServer
                              |     github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1682
                              | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).Restart
                              |     github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:1672
                              | github.com/cockroachdb/cockroach/pkg/testutils/testcluster.TestRestart
                              |     github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster_test.go:370
                              | [...repeated from below...]
                            Wraps: (5) list-tenants
                            Wraps: (6)
                            Wraps: (7) candidate pg code: 57014
                            Wraps: (8) attached stack trace
                              -- stack trace:
                              | github.com/cockroachdb/cockroach/pkg/util/cancelchecker.init
                              |     github.com/cockroachdb/cockroach/pkg/util/cancelchecker/cancel_checker.go:80
                              | runtime.doInit1
                              |     GOROOT/src/runtime/proc.go:7206
                              | runtime.doInit
                              |     GOROOT/src/runtime/proc.go:7173
                              | runtime.main
                              |     GOROOT/src/runtime/proc.go:253
                              | runtime.goexit
                              |     src/runtime/asm_amd64.s:1695
                            Wraps: (9) query execution canceled
                            Error types: (1) *withstack.withStack (2) *errutil.withPrefix (3) *secondary.withSecondaryError (4) *withstack.withStack (5) *errutil.withPrefix (6) *colexecerror.notInternalError (7) *pgerror.withCandidateCode (8) *withstack.withStack (9) *errutil.leafError
            Test:           TestRestart
    panic.go:626: -- test log scope end --
test logs left over in: outputs.zip/logTestRestart1909538353
--- FAIL: TestRestart (18.06s)

Parameters:

See also: How To Investigate a Go Test Failure (internal)

This test on roachdash | Improve this report!

stevendanna commented 3 months ago

We should try to stress this to see if it is still failing. If not, close it out.