Closed cockroach-teamcity closed 1 year ago
Looks like a legit race. Conservatively marking as GA blocker.
WARNING: DATA RACE
Write at 0x00c00e278758 by goroutine 1264535:
github.com/cockroachdb/cockroach/pkg/util/metric.(*ManualWindowHistogram).Update()
github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/metric.go:497 +0xe4
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Store).ComputeMetricsPeriodically()
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/store.go:3203 +0x3e7
github.com/cockroachdb/cockroach/pkg/server.(*Node).computeMetricsPeriodically.func1()
github.com/cockroachdb/cockroach/pkg/server/node.go:832 +0xfa
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Stores).VisitStores.func1()
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/stores.go:150 +0x5d
github.com/cockroachdb/cockroach/pkg/util/syncutil.(*IntMap).Range()
github.com/cockroachdb/cockroach/pkg/util/syncutil/int_map.go:352 +0x330
github.com/cockroachdb/cockroach/pkg/kv/kvserver.(*Stores).VisitStores()
github.com/cockroachdb/cockroach/pkg/kv/kvserver/pkg/kv/kvserver/stores.go:149 +0x75
github.com/cockroachdb/cockroach/pkg/server.(*Node).computeMetricsPeriodically()
github.com/cockroachdb/cockroach/pkg/server/node.go:831 +0xac
github.com/cockroachdb/cockroach/pkg/server.(*Node).startComputePeriodicMetrics.func1()
github.com/cockroachdb/cockroach/pkg/server/node.go:816 +0x23b
github.com/cockroachdb/cockroach/pkg/util/stop.(*Stopper).RunAsyncTaskEx.func2()
github.com/cockroachdb/cockroach/pkg/util/stop/stopper.go:470 +0x1f6
Previous read at 0x00c00e278758 by goroutine 1269778:
github.com/cockroachdb/cockroach/pkg/util/metric.(*ManualWindowHistogram).ToPrometheusMetric()
github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/metric.go:577 +0x9a
github.com/cockroachdb/cockroach/pkg/util/metric.(*PrometheusExporter).ScrapeRegistry.func1()
github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/prometheus_exporter.go:87 +0xa9
github.com/cockroachdb/cockroach/pkg/util/metric.(*Registry).Each.func1()
github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/registry.go:167 +0x6d
github.com/cockroachdb/cockroach/pkg/util/metric.(*ManualWindowHistogram).Inspect()
github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/metric.go:566 +0x3d
github.com/cockroachdb/cockroach/pkg/util/metric.(*Registry).Each()
github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/registry.go:166 +0x141
github.com/cockroachdb/cockroach/pkg/util/metric.(*PrometheusExporter).ScrapeRegistry()
github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/prometheus_exporter.go:105 +0x16d
github.com/cockroachdb/cockroach/pkg/server/status.(*MetricsRecorder).ScrapeIntoPrometheus()
github.com/cockroachdb/cockroach/pkg/server/status/recorder.go:317 +0x2a7
github.com/cockroachdb/cockroach/pkg/server/status.(*MetricsRecorder).ScrapeIntoPrometheus-fm()
<autogenerated>:1 +0x44
github.com/cockroachdb/cockroach/pkg/util/metric.(*PrometheusExporter).ScrapeAndPrintAsText()
github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/prometheus_exporter.go:134 +0xb3
github.com/cockroachdb/cockroach/pkg/server/status.(*MetricsRecorder).PrintAsText()
github.com/cockroachdb/cockroach/pkg/server/status/recorder.go:329 +0xb2
github.com/cockroachdb/cockroach/pkg/server.varsHandler.handleVars()
github.com/cockroachdb/cockroach/pkg/server/status.go:2024 +0x219
github.com/cockroachdb/cockroach/pkg/server.varsHandler.handleVars-fm()
<autogenerated>:1 +0x84
net/http.HandlerFunc.ServeHTTP()
GOROOT/src/net/http/server.go:2109 +0x4d
net/http.(*ServeMux).ServeHTTP()
GOROOT/src/net/http/server.go:2487 +0xc5
net/http.(*ServeMux).ServeHTTP-fm()
<autogenerated>:1 +0x57
net/http.HandlerFunc.ServeHTTP()
GOROOT/src/net/http/server.go:2109 +0x4d
github.com/NYTimes/gziphandler.NewGzipLevelAndMinSize.func1.1()
github.com/NYTimes/gziphandler/external/com_github_nytimes_gziphandler/gzip.go:253 +0x352
net/http.HandlerFunc.ServeHTTP()
GOROOT/src/net/http/server.go:2109 +0x4d
net/http.Handler.ServeHTTP-fm()
<autogenerated>:1 +0x75
github.com/cockroachdb/cockroach/pkg/server.(*nodeProxy).nodeProxyHandler()
github.com/cockroachdb/cockroach/pkg/server/node_http_router.go:63 +0x163
github.com/cockroachdb/cockroach/pkg/server.(*httpServer).baseHandler()
github.com/cockroachdb/cockroach/pkg/server/server_http.go:312 +0x3c4
github.com/cockroachdb/cockroach/pkg/server.(*httpServer).baseHandler-fm()
<autogenerated>:1 +0x57
github.com/cockroachdb/cockroach/pkg/server.(*serverController).httpMux()
github.com/cockroachdb/cockroach/pkg/server/server_controller_http.go:106 +0x6a5
github.com/cockroachdb/cockroach/pkg/server.(*serverController).httpMux-fm()
<autogenerated>:1 +0x57
net/http.HandlerFunc.ServeHTTP()
GOROOT/src/net/http/server.go:2109 +0x4d
net/http.serverHandler.ServeHTTP()
GOROOT/src/net/http/server.go:2947 +0x641
net/http.(*conn).serve()
GOROOT/src/net/http/server.go:1991 +0xbe4
net/http.(*Server).Serve.func3()
GOROOT/src/net/http/server.go:3102 +0x58
This was a result of the change to the histograms. @kvoli and @coolcom200 can you take a quick look at this. It seems like we just need a read lock in ToPrometheusMetric
.
This was a result of the change to the histograms. @kvoli and @coolcom200 can you take a quick look at this. It seems like we just need a read lock in ToPrometheusMetric.
You're right, we aren't locking when accessing the mwh.cum
field in ToPrometheusMetric. This was added in 4a2d06a293412f62f5f05a3b0b799d30080592e7 but because we so rarely write to manual histograms, it was fairly unlikely we'd encounter a race.
I'll post a patch to resolve and backport.
It makes sense to also lock it in Inspect
. Although I'm not sure if that is ever called, but if it was, it could hit the same race while iterating over the data.
I don't think we are able to lock in Inspect
without using a re-entrant lock - see the stack trace:
github.com/cockroachdb/cockroach/pkg/util/metric.(*ManualWindowHistogram).ToPrometheusMetric()
github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/metric.go:577 +0x9a
github.com/cockroachdb/cockroach/pkg/util/metric.(*PrometheusExporter).ScrapeRegistry.func1()
github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/prometheus_exporter.go:87 +0xa9
github.com/cockroachdb/cockroach/pkg/util/metric.(*Registry).Each.func1()
github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/registry.go:167 +0x6d
github.com/cockroachdb/cockroach/pkg/util/metric.(*ManualWindowHistogram).Inspect()
github.com/cockroachdb/cockroach/pkg/util/metric/pkg/util/metric/metric.go:566 +0x3d
Resolved when backports merged.
kv/kvserver.TestTenantRateLimiter failed with artifacts on release-23.1 @ ad6ce866ea3b5c5bb47ba9a0ac19b721a0c98add:
Parameters:
TAGS=bazel,gss,race
Help
See also: [How To Investigate a Go Test Failure \(internal\)](https://cockroachlabs.atlassian.net/l/c/HgfXfJgM)
/cc @cockroachdb/kv
This test on roachdash | Improve this report!
Jira issue: CRDB-27180