cockroachdb / cockroach

CockroachDB — the cloud native, distributed SQL database designed for high availability, effortless scale, and control over data placement.
https://www.cockroachlabs.com
Other
30.07k stars 3.8k forks source link

kv/kvserver: TestTenantRateLimiter failed #101901

Closed cockroach-teamcity closed 1 year ago

cockroach-teamcity commented 1 year ago

kv/kvserver.TestTenantRateLimiter failed with artifacts on release-23.1 @ ad6ce866ea3b5c5bb47ba9a0ac19b721a0c98add:

      github.com/cockroachdb/cockroach/pkg/server/node_http_router.go:63 +0x163
  github.com/cockroachdb/cockroach/pkg/server.(*httpServer).baseHandler()
      github.com/cockroachdb/cockroach/pkg/server/server_http.go:312 +0x3c4
  github.com/cockroachdb/cockroach/pkg/server.(*httpServer).baseHandler-fm()
      <autogenerated>:1 +0x57
  github.com/cockroachdb/cockroach/pkg/server.(*serverController).httpMux()
      github.com/cockroachdb/cockroach/pkg/server/server_controller_http.go:106 +0x6a5
  github.com/cockroachdb/cockroach/pkg/server.(*serverController).httpMux-fm()
      <autogenerated>:1 +0x57
  net/http.HandlerFunc.ServeHTTP()
      GOROOT/src/net/http/server.go:2109 +0x4d
  net/http.serverHandler.ServeHTTP()
      GOROOT/src/net/http/server.go:2947 +0x641
  net/http.(*conn).serve()
      GOROOT/src/net/http/server.go:1991 +0xbe4
  net/http.(*Server).Serve.func3()
      GOROOT/src/net/http/server.go:3102 +0x58

Goroutine 1264535 (running) created at:
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:461 +0x619
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTask()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:332 +0x1ae
  github.com/cockroachdb/cockroach/pkg/server.(*Node).startComputePeriodicMetrics()
      github.com/cockroachdb/cockroach/pkg/server/node.go:808 +0x26
  github.com/cockroachdb/cockroach/pkg/server.(*Node).start()
      github.com/cockroachdb/cockroach/pkg/server/node.go:580 +0xd16
  github.com/cockroachdb/cockroach/pkg/server.(*Server).PreStart()
      github.com/cockroachdb/cockroach/pkg/server/server.go:1699 +0x33ba
  github.com/cockroachdb/cockroach/pkg/server.(*TestServer).Start()
      github.com/cockroachdb/cockroach/pkg/server/testserver.go:614 +0x8f
  github.com/cockroachdb/cockroach/pkg/testutils/serverutils.StartServer()
      github.com/cockroachdb/cockroach/pkg/testutils/serverutils/test_server_shim.go:330 +0x1b6
  github.com/cockroachdb/cockroach/pkg/testutils/serverutils.StartServer()
      github.com/cockroachdb/cockroach/pkg/testutils/serverutils/test_server_shim.go:325 +0x96
  github.com/cockroachdb/cockroach/pkg/kv/kvserver_test.TestTenantRateLimiter()
      github.com/cockroachdb/cockroach/pkg/kv/kvserver_test/pkg/kv/kvserver/client_tenant_test.go:168 +0x6a9
  testing.tRunner()
      GOROOT/src/testing/testing.go:1446 +0x216
  testing.(*T).Run.func1()
      GOROOT/src/testing/testing.go:1493 +0x47

Goroutine 1269778 (running) created at:
  net/http.(*Server).Serve()
      GOROOT/src/net/http/server.go:3102 +0x837
  github.com/cockroachdb/cockroach/pkg/server.startHTTPService.func4()
      github.com/cockroachdb/cockroach/pkg/server/server_http.go:280 +0x4f
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:470 +0x1f6
==================

Parameters: TAGS=bazel,gss,race

Help

See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)

/cc @cockroachdb/kv

This test on roachdash | Improve this report!

Jira issue: CRDB-27180

erikgrinaker commented 1 year ago

Looks like a legit race. Conservatively marking as GA blocker.

WARNING: DATA RACE
Write at 0x00c00e278758 by goroutine 1264535:
  github.com/cockroachdb/cockroach/pkg/util/metric.(*ManualWindowHistogram).Update()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/metric.go:497 +0xe4
  github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).ComputeMetricsPeriodically()
      github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store.go:3203 +0x3e7
  github.com/cockroachdb/cockroach/pkg/server.(*Node).computeMetricsPeriodically.func1()
      github.com/cockroachdb/cockroach/pkg/server/node.go:832 +0xfa
  github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Stores).VisitStores.func1()
      github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/stores.go:150 +0x5d
  github.com/cockroachdb/cockroach/pkg/util/syncutil.(*IntMap).Range()
      github.com/cockroachdb/cockroach/pkg/util/syncutil/int_map.go:352 +0x330
  github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Stores).VisitStores()
      github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/stores.go:149 +0x75
  github.com/cockroachdb/cockroach/pkg/server.(*Node).computeMetricsPeriodically()
      github.com/cockroachdb/cockroach/pkg/server/node.go:831 +0xac
  github.com/cockroachdb/cockroach/pkg/server.(*Node).startComputePeriodicMetrics.func1()
      github.com/cockroachdb/cockroach/pkg/server/node.go:816 +0x23b
  github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
      github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:470 +0x1f6

Previous read at 0x00c00e278758 by goroutine 1269778:
  github.com/cockroachdb/cockroach/pkg/util/metric.(*ManualWindowHistogram).ToPrometheusMetric()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/metric.go:577 +0x9a
  github.com/cockroachdb/cockroach/pkg/util/metric.(*PrometheusExporter).ScrapeRegistry.func1()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/prometheus_exporter.go:87 +0xa9
  github.com/cockroachdb/cockroach/pkg/util/metric.(*Registry).Each.func1()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/registry.go:167 +0x6d
  github.com/cockroachdb/cockroach/pkg/util/metric.(*ManualWindowHistogram).Inspect()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/metric.go:566 +0x3d
  github.com/cockroachdb/cockroach/pkg/util/metric.(*Registry).Each()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/registry.go:166 +0x141
  github.com/cockroachdb/cockroach/pkg/util/metric.(*PrometheusExporter).ScrapeRegistry()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/prometheus_exporter.go:105 +0x16d
  github.com/cockroachdb/cockroach/pkg/server/status.(*MetricsRecorder).ScrapeIntoPrometheus()
      github.com/cockroachdb/cockroach/pkg/server/status/recorder.go:317 +0x2a7
  github.com/cockroachdb/cockroach/pkg/server/status.(*MetricsRecorder).ScrapeIntoPrometheus-fm()
      <autogenerated>:1 +0x44
  github.com/cockroachdb/cockroach/pkg/util/metric.(*PrometheusExporter).ScrapeAndPrintAsText()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/prometheus_exporter.go:134 +0xb3
  github.com/cockroachdb/cockroach/pkg/server/status.(*MetricsRecorder).PrintAsText()
      github.com/cockroachdb/cockroach/pkg/server/status/recorder.go:329 +0xb2
  github.com/cockroachdb/cockroach/pkg/server.varsHandler.handleVars()
      github.com/cockroachdb/cockroach/pkg/server/status.go:2024 +0x219
  github.com/cockroachdb/cockroach/pkg/server.varsHandler.handleVars-fm()
      <autogenerated>:1 +0x84
  net/http.HandlerFunc.ServeHTTP()
      GOROOT/src/net/http/server.go:2109 +0x4d
  net/http.(*ServeMux).ServeHTTP()
      GOROOT/src/net/http/server.go:2487 +0xc5
  net/http.(*ServeMux).ServeHTTP-fm()
      <autogenerated>:1 +0x57
  net/http.HandlerFunc.ServeHTTP()
      GOROOT/src/net/http/server.go:2109 +0x4d
  github.com/NYTimes/gziphandler.NewGzipLevelAndMinSize.func1.1()
      github.com/NYTimes/gziphandler/external/com_github_nytimes_gziphandler/gzip.go:253 +0x352
  net/http.HandlerFunc.ServeHTTP()
      GOROOT/src/net/http/server.go:2109 +0x4d
  net/http.Handler.ServeHTTP-fm()
      <autogenerated>:1 +0x75
  github.com/cockroachdb/cockroach/pkg/server.(*nodeProxy).nodeProxyHandler()
      github.com/cockroachdb/cockroach/pkg/server/node_http_router.go:63 +0x163
  github.com/cockroachdb/cockroach/pkg/server.(*httpServer).baseHandler()
      github.com/cockroachdb/cockroach/pkg/server/server_http.go:312 +0x3c4
  github.com/cockroachdb/cockroach/pkg/server.(*httpServer).baseHandler-fm()
      <autogenerated>:1 +0x57
  github.com/cockroachdb/cockroach/pkg/server.(*serverController).httpMux()
      github.com/cockroachdb/cockroach/pkg/server/server_controller_http.go:106 +0x6a5
  github.com/cockroachdb/cockroach/pkg/server.(*serverController).httpMux-fm()
      <autogenerated>:1 +0x57
  net/http.HandlerFunc.ServeHTTP()
      GOROOT/src/net/http/server.go:2109 +0x4d
  net/http.serverHandler.ServeHTTP()
      GOROOT/src/net/http/server.go:2947 +0x641
  net/http.(*conn).serve()
      GOROOT/src/net/http/server.go:1991 +0xbe4
  net/http.(*Server).Serve.func3()
      GOROOT/src/net/http/server.go:3102 +0x58
andrewbaptist commented 1 year ago

This was a result of the change to the histograms. @kvoli and @coolcom200 can you take a quick look at this. It seems like we just need a read lock in ToPrometheusMetric.

kvoli commented 1 year ago

This was a result of the change to the histograms. @kvoli and @coolcom200 can you take a quick look at this. It seems like we just need a read lock in ToPrometheusMetric.

You're right, we aren't locking when accessing the mwh.cum field in ToPrometheusMetric. This was added in 4a2d06a293412f62f5f05a3b0b799d30080592e7 but because we so rarely write to manual histograms, it was fairly unlikely we'd encounter a race.

I'll post a patch to resolve and backport.

andrewbaptist commented 1 year ago

It makes sense to also lock it in Inspect. Although I'm not sure if that is ever called, but if it was, it could hit the same race while iterating over the data.

kvoli commented 1 year ago

I don't think we are able to lock in Inspect without using a re-entrant lock - see the stack trace:

  github.com/cockroachdb/cockroach/pkg/util/metric.(*ManualWindowHistogram).ToPrometheusMetric()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/metric.go:577 +0x9a
  github.com/cockroachdb/cockroach/pkg/util/metric.(*PrometheusExporter).ScrapeRegistry.func1()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/prometheus_exporter.go:87 +0xa9
  github.com/cockroachdb/cockroach/pkg/util/metric.(*Registry).Each.func1()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/registry.go:167 +0x6d
  github.com/cockroachdb/cockroach/pkg/util/metric.(*ManualWindowHistogram).Inspect()
      github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/metric.go:566 +0x3d
kvoli commented 1 year ago

Resolved when backports merged.

kvoli commented 1 year ago

Closed on https://github.com/cockroachdb/cockroach/pull/102009