cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com

cli: TestCollectInfoFromMultipleStores failed #128958

Closed: cockroach-teamcity closed this issue 1 month ago

cockroach-teamcity commented 2 months ago

cli.TestCollectInfoFromMultipleStores failed on release-24.2 @ 191522a802ca853ed6776fc18d8c4d8f38599bbb:

Previous read at 0x00c0041b2db0 by goroutine 3356:
  github.com/cockroachdb/cockroach/pkg/storage/disk.(*Monitor).CumulativeStats()
      github.com/cockroachdb/cockroach/pkg/storage/disk/monitor.go:237 +0x84
  github.com/cockroachdb/cockroach/pkg/server.(*diskStatsMap).tryPopulateAdmissionDiskStats()
      github.com/cockroachdb/cockroach/pkg/server/node.go:1071 +0x2b9
  github.com/cockroachdb/cockroach/pkg/server.(*Node).GetPebbleMetrics()
      github.com/cockroachdb/cockroach/pkg/server/node.go:1153 +0xd1
  github.com/cockroachdb/cockroach/pkg/util/admission.(*StoreGrantCoordinators).SetPebbleMetricsProvider.func1()
      github.com/cockroachdb/cockroach/pkg/util/admission/grant_coordinator.go:118 +0x177

Goroutine 20 (running) created at:
  testing.(*T).Run()
      GOROOT/src/testing/testing.go:1742 +0x825
  testing.runTests.func1()
      GOROOT/src/testing/testing.go:2161 +0x85
  testing.tRunner()
      GOROOT/src/testing/testing.go:1689 +0x21e
  testing.runTests()
      GOROOT/src/testing/testing.go:2159 +0x8be
  testing.(*M).Run()
      GOROOT/src/testing/testing.go:2027 +0xf17
  github.com/cockroachdb/cockroach/pkg/cli.TestMain()
      github.com/cockroachdb/cockroach/pkg/cli/main_test.go:41 +0x1e5
  main.main()
      main/bazel-out/k8-fastbuild/bin/pkg/cli/cli_test_/testmain.go:384 +0x824
  runtime.main()
      GOROOT/src/runtime/proc.go:271 +0x29c
  github.com/cockroachdb/cockroach/pkg/util/parquet.box2DDecoder.decode()
      github.com/cockroachdb/cockroach/pkg/util/parquet/decoders.go:180 +0x3b
  github.com/cockroachdb/cockroach/pkg/util/parquet.init.0()
      github.com/cockroachdb/cockroach/pkg/util/parquet/decoders.go:407 +0x1d0

Goroutine 3356 (running) created at:
  github.com/cockroachdb/cockroach/pkg/util/admission.(*StoreGrantCoordinators).SetPebbleMetricsProvider()
      github.com/cockroachdb/cockroach/pkg/util/admission/grant_coordinator.go:107 +0x50c
  github.com/cockroachdb/cockroach/pkg/server.(*topLevelServer).PreStart()
      github.com/cockroachdb/cockroach/pkg/server/server.go:1983 +0x47f8
  github.com/cockroachdb/cockroach/pkg/server.(*testServer).PreStart()
      github.com/cockroachdb/cockroach/pkg/server/testserver.go:790 +0x104
  github.com/cockroachdb/cockroach/pkg/testutils/serverutils.(*wrap).PreStart()
      github.com/cockroachdb/cockroach/bazel-out/k8-fastbuild/bin/pkg/testutils/serverutils/ts_control_forwarder_generated.go:19 +0x72
  github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).Start()
      github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:415 +0x57e
  github.com/cockroachdb/cockroach/pkg/cli.TestCollectInfoFromMultipleStores()
      github.com/cockroachdb/cockroach/pkg/cli/debug_recover_loss_of_quorum_test.go:77 +0x6c4
  testing.tRunner()
      GOROOT/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      GOROOT/src/testing/testing.go:1742 +0x44
==================

Parameters:

See also: How To Investigate a Go Test Failure (internal)

/cc @cockroachdb/kv @cockroachdb/server


Jira issue: CRDB-41303

kvoli commented 2 months ago

Failed on a data race in storage:

WARNING: DATA RACE
Write at 0x00c0041b2db0 by goroutine 20:
  github.com/cockroachdb/cockroach/pkg/storage/disk.(*Monitor).Close()
      github.com/cockroachdb/cockroach/pkg/storage/disk/monitor.go:292 +0x8e
  github.com/cockroachdb/cockroach/pkg/server.(*diskStatsMap).closeDiskMonitors()
      github.com/cockroachdb/cockroach/pkg/server/node.go:1126 +0x113
  github.com/cockroachdb/cockroach/pkg/server.(*topLevelServer).PreStart.func9()
      github.com/cockroachdb/cockroach/pkg/server/server.go:1973 +0x1a
  github.com/cockroachdb/cockroach/pkg/util/stop.CloserFn.Close()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:108 +0x26
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Stop()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:550 +0x259
  github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).stopServerLocked()
      github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:226 +0xe4
  github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).stopServers()
      github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:163 +0x42d
  github.com/cockroachdb/cockroach/pkg/testutils/testcluster.(*TestCluster).Start.func3()
      github.com/cockroachdb/cockroach/pkg/testutils/testcluster/testcluster.go:453 +0x3c
  github.com/cockroachdb/cockroach/pkg/util/stop.CloserFn.Close()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:108 +0x26
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).Stop()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:550 +0x259
  github.com/cockroachdb/cockroach/pkg/cli.TestCollectInfoFromMultipleStores()
      github.com/cockroachdb/cockroach/pkg/cli/debug_recover_loss_of_quorum_test.go:82 +0x7f2
  testing.tRunner()
      GOROOT/src/testing/testing.go:1689 +0x21e
  testing.(*T).Run.gowrap1()
      GOROOT/src/testing/testing.go:1742 +0x44
Previous read at 0x00c0041b2db0 by goroutine 3356:
  github.com/cockroachdb/cockroach/pkg/storage/disk.(*Monitor).CumulativeStats()
      github.com/cockroachdb/cockroach/pkg/storage/disk/monitor.go:237 +0x84
  github.com/cockroachdb/cockroach/pkg/server.(*diskStatsMap).tryPopulateAdmissionDiskStats()
      github.com/cockroachdb/cockroach/pkg/server/node.go:1071 +0x2b9
  github.com/cockroachdb/cockroach/pkg/server.(*Node).GetPebbleMetrics()
      github.com/cockroachdb/cockroach/pkg/server/node.go:1153 +0xd1
  github.com/cockroachdb/cockroach/pkg/util/admission.(*StoreGrantCoordinators).SetPebbleMetricsProvider.func1()
      github.com/cockroachdb/cockroach/pkg/util/admission/grant_coordinator.go:118 +0x177

Re-assigning to @cockroachdb/storage

jbowens commented 2 months ago

This is a race on shutdown, where the stopper may close the disk.Monitor before admission control is done reading it (via Node.GetPebbleMetrics). Removing the release-blocker label.

I'm going to look at refactoring this so that admission control owns its own disk.Monitor. The Monitor is a lightweight type, so creating a second one just takes a separate ref on the underlying monitoredDisk.
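For illustration, here is a minimal sketch of that ownership idea. It is not the actual CockroachDB code. Only the names `Monitor`, `monitoredDisk`, `CumulativeStats`, and `Close` come from the trace and the comment above; the `NewRef` method, the struct fields, the return signature of `CumulativeStats`, and the locking scheme are all assumptions made for the example. The point is that each owner holds its own handle, so the stopper closing the server's handle cannot invalidate the one admission control is still reading through.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// monitoredDisk is a hypothetical stand-in for the shared per-disk state.
// The fields below are assumptions, not the real struct layout.
type monitoredDisk struct {
	mu     sync.Mutex
	refs   int    // number of live Monitor handles
	closed bool   // set once the last handle is released
	stats  uint64 // placeholder for cumulative disk stats
}

// Monitor is a lightweight handle. Each owner gets its own.
type Monitor struct {
	disk *monitoredDisk
}

func newMonitoredDisk() *monitoredDisk { return &monitoredDisk{} }

// NewRef (hypothetical) hands out an independent handle and takes a ref.
func (d *monitoredDisk) NewRef() *Monitor {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.refs++
	return &Monitor{disk: d}
}

// CumulativeStats reads the shared state under the lock.
func (m *Monitor) CumulativeStats() (uint64, error) {
	d := m.disk
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.closed {
		return 0, errors.New("monitored disk is closed")
	}
	return d.stats, nil
}

// Close releases this handle's ref. The shared state is only marked
// closed when the last ref goes away. Close never writes to the Monitor
// itself, so it cannot race with reads through another handle.
func (m *Monitor) Close() {
	d := m.disk
	d.mu.Lock()
	defer d.mu.Unlock()
	d.refs--
	if d.refs == 0 {
		d.closed = true
	}
}

func main() {
	disk := newMonitoredDisk()
	serverMon := disk.NewRef()    // closed by the stopper on shutdown
	admissionMon := disk.NewRef() // owned by admission control

	var readErr error
	var wg sync.WaitGroup
	wg.Add(1)
	go func() {
		defer wg.Done()
		defer admissionMon.Close() // admission control releases its own ref
		for i := 0; i < 1000; i++ {
			if _, err := admissionMon.CumulativeStats(); err != nil {
				readErr = err
				return
			}
		}
	}()

	serverMon.Close() // shutdown path; safe while the reader is still running
	wg.Wait()
	fmt.Println("read error:", readErr)
}
```

Running a program like this under `go run -race` is one way to check that the shutdown path and the reader no longer touch unsynchronized state. The real fix may use a different mechanism entirely.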