cockroachdb / cockroach

pkg/util/tracing/collector/collector_test: TestClusterInflightTraces failed #109469

Closed: cockroach-teamcity closed this issue 1 year ago

cockroach-teamcity commented 1 year ago

pkg/util/tracing/collector/collector_test.TestClusterInflightTraces failed with artifacts on master @ 7164cadbe45f479fb9e6f3296f287fa7492804f0:

      github.com/cockroachdb/cockroach/pkg/server/server_controller_orchestration.go:231 +0xc4
  github.com/cockroachdb/cockroach/pkg/server.(*channelOrchestrator).startControlledServer.func5.2()
      github.com/cockroachdb/cockroach/pkg/server/server_controller_channel_orchestrator.go:362 +0x12b
  github.com/cockroachdb/cockroach/pkg/server.(*channelOrchestrator).startControlledServer.func5()
      github.com/cockroachdb/cockroach/pkg/server/server_controller_channel_orchestrator.go:386 +0x56a
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:484 +0x1f6

Previous write at 0x00c000d75080 by goroutine 6249:
  runtime.mapassign()
      GOROOT/src/runtime/map.go:578 +0x0
  github.com/cockroachdb/cockroach/pkg/server.(*testServer).StartSharedProcessTenant()
      github.com/cockroachdb/cockroach/pkg/server/testserver.go:1088 +0xfb
  pkg/util/tracing/collector/collector_test_test.TestClusterInflightTraces.func1()
      pkg/util/tracing/collector/collector_test_test/pkg/util/tracing/collector/collector_test.go:259 +0x15c4
  github.com/cockroachdb/cockroach/pkg/testutils/testcluster.StartTestCluster()
      github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:236 +0x8b
  pkg/util/tracing/collector/collector_test_test.TestClusterInflightTraces.func1()
      pkg/util/tracing/collector/collector_test_test/pkg/util/tracing/collector/collector_test.go:227 +0xe8
  testing.tRunner()
      GOROOT/src/testing/testing.go:1446 +0x216
  testing.(*T).Run.func1()
      GOROOT/src/testing/testing.go:1493 +0x47

Goroutine 9484 (running) created at:
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:475 +0x619
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:346 +0xf4c
  github.com/cockroachdb/cockroach/pkg/server.(*channelOrchestrator).startControlledServer()
      github.com/cockroachdb/cockroach/pkg/server/server_controller_channel_orchestrator.go:295 +0x29
  github.com/cockroachdb/cockroach/pkg/server.(*serverController).createServerEntryLocked()
      github.com/cockroachdb/cockroach/pkg/server/server_controller_orchestration.go:151 +0x2b0
  github.com/cockroachdb/cockroach/pkg/server.(*serverController).scanTenantsForRunnableServices()
      github.com/cockroachdb/cockroach/pkg/server/server_controller_orchestration.go:112 +0x2f8
  github.com/cockroachdb/cockroach/pkg/server.(*serverController).start.func1()
      github.com/cockroachdb/cockroach/pkg/server/server_controller_orchestration.go:60 +0x21a
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:484 +0x1f6

Goroutine 6249 (running) created at:
  testing.(*T).Run()
      GOROOT/src/testing/testing.go:1493 +0x75d
  pkg/util/tracing/collector/collector_test_test.TestClusterInflightTraces()
      pkg/util/tracing/collector/collector_test_test/pkg/util/tracing/collector/collector_test.go:226 +0x4f8
  testing.tRunner()
      GOROOT/src/testing/testing.go:1446 +0x216
  testing.(*T).Run.func1()
      GOROOT/src/testing/testing.go:1493 +0x47
==================

Parameters: TAGS=bazel,gss,race, stress=true

Help

See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)

/cc @cockroachdb/obs-inf-prs


Jira issue: CRDB-30939

abarganier commented 1 year ago

This appears unrelated to the test itself. The race is in the serverController's handling of testArgs, a map that isn't protected by a mutex: https://github.com/cockroachdb/cockroach/blob/d9f6b1a99554ce2f9df111732b8de39947c7e988/pkg/server/server_controller.go#L75-L76

The TestServer writes to this map when starting shared process tenants, and the server controller orchestrator reads from it when instantiating a new server.

It seems we can solve this by guarding the testArgs map with the serverController's existing mutex.
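
For illustration, here is a minimal, self-contained sketch of that pattern: a map that one goroutine writes and another reads, with both paths going through the same mutex. The `controller` and `testArgs` types and the `setTestArgs`/`getTestArgs` helpers are hypothetical stand-ins, not the actual serverController code, and the real fix may look different.

```go
package main

import (
	"fmt"
	"sync"
)

// testArgs is a hypothetical stand-in for the per-tenant test arguments.
type testArgs struct {
	Knob string
}

// controller is a hypothetical, simplified stand-in for serverController.
type controller struct {
	mu struct {
		sync.Mutex
		// testArgs is guarded by mu; all access goes through the
		// methods below so readers and writers never touch it unlocked.
		testArgs map[string]testArgs
	}
}

func newController() *controller {
	c := &controller{}
	c.mu.testArgs = make(map[string]testArgs)
	return c
}

// setTestArgs models the write performed when a test server starts a
// shared-process tenant.
func (c *controller) setTestArgs(tenant string, args testArgs) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.mu.testArgs[tenant] = args
}

// getTestArgs models the read performed by the orchestrator goroutine
// when it instantiates a new server.
func (c *controller) getTestArgs(tenant string) (testArgs, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	args, ok := c.mu.testArgs[tenant]
	return args, ok
}

func main() {
	c := newController()
	var wg sync.WaitGroup
	for i := 0; i < 8; i++ {
		name := fmt.Sprintf("tenant-%d", i)
		wg.Add(2)
		// Concurrent writer and reader for the same key.
		go func() { defer wg.Done(); c.setTestArgs(name, testArgs{Knob: name}) }()
		go func() { defer wg.Done(); c.getTestArgs(name) }()
	}
	wg.Wait()
	args, ok := c.getTestArgs("tenant-3")
	fmt.Println(args.Knob, ok)
}
```

Running this sketch with `go run -race` should produce no race report, because every map access holds the mutex. Removing the locking from either method would be expected to reintroduce a report like the one above.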