dhiaayachi / temporal

Temporal service
https://docs.temporal.io
MIT License

test detected data race in cluster metadata #201

Open dhiaayachi opened 2 months ago

dhiaayachi commented 2 months ago

Expected Behavior

no races during tests

Actual Behavior


==================
  | WARNING: DATA RACE
  | Write at 0x00c0041c4b10 by goroutine 15520:
  | runtime.mapassign_faststr()
  | /usr/local/go/src/runtime/map_faststr.go:203 +0x0
  | go.temporal.io/server/common/cluster.(*metadataImpl).updateClusterInfoLocked()
  | /go/pkg/mod/go.temporal.io/server@v1.23.0-rc9/common/cluster/metadata.go:532 +0xc92
  | go.temporal.io/server/common/cluster.(*metadataImpl).refreshClusterMetadata()
  | /go/pkg/mod/go.temporal.io/server@v1.23.0-rc9/common/cluster/metadata.go:511 +0xa85
  | go.temporal.io/server/common/cluster.(*metadataImpl).refreshLoop()
  | /go/pkg/mod/go.temporal.io/server@v1.23.0-rc9/common/cluster/metadata.go:436 +0x1c6
  | go.temporal.io/server/common/cluster.(*metadataImpl).refreshLoop-fm()
  | <autogenerated>:1 +0x47
  | go.temporal.io/server/internal/goro.(*Handle).Go.func1()
  | /go/pkg/mod/go.temporal.io/server@v1.23.0-rc9/internal/goro/goro.go:64 +0xc1
  |  
  | Previous read at 0x00c0041c4b10 by goroutine 16167:
  | runtime.mapaccess2_faststr()
  | /usr/local/go/src/runtime/map_faststr.go:108 +0x0
  | go.temporal.io/server/common/cluster.(*metadataImpl).GetClusterID()
  | /go/pkg/mod/go.temporal.io/server@v1.23.0-rc9/common/cluster/metadata.go:303 +0xc8
  | go.temporal.io/server/service/history/replication.(*StreamReceiverMonitorImpl).generateOutboundStreamKeys()
  | /go/pkg/mod/go.temporal.io/server@v1.23.0-rc9/service/history/replication/stream_receiver_monitor.go:214 +0x90
  | go.temporal.io/server/service/history/replication.(*StreamReceiverMonitorImpl).reconcileOutboundStreams()
  | /go/pkg/mod/go.temporal.io/server@v1.23.0-rc9/service/history/replication/stream_receiver_monitor.go:170 +0x26
  | go.temporal.io/server/service/history/replication.(*StreamReceiverMonitorImpl).eventLoop()
  | /go/pkg/mod/go.temporal.io/server@v1.23.0-rc9/service/history/replication/stream_receiver_monitor.go:157 +0x3d1
  | go.temporal.io/server/service/history/replication.(*StreamReceiverMonitorImpl).Start.func1()
  | /go/pkg/mod/go.temporal.io/server@v1.23.0-rc9/service/history/replication/stream_receiver_monitor.go:92 +0x33
  |  
  | Goroutine 15520 (running) created at:
  | go.temporal.io/server/internal/goro.(*Handle).Go()
  | /go/pkg/mod/go.temporal.io/server@v1.23.0-rc9/internal/goro/goro.go:60 +0xd0
  | go.temporal.io/server/common/cluster.(*metadataImpl).Start()
  | /go/pkg/mod/go.temporal.io/server@v1.23.0-rc9/common/cluster/metadata.go:252 +0x6c4
  | go.temporal.io/server/common/cluster.MetadataLifetimeHooks.func1()
  | /go/pkg/mod/go.temporal.io/server@v1.23.0-rc9/common/cluster/fx.go:51 +0x39
  | go.uber.org/fx/internal/lifecycle.(*Lifecycle).runStartHook()
  | /go/pkg/mod/go.uber.org/fx@v1.20.0/internal/lifecycle/lifecycle.go:256 +0x2bd
  | go.uber.org/fx/internal/lifecycle.(*Lifecycle).Start()
  | /go/pkg/mod/go.uber.org/fx@v1.20.0/internal/lifecycle/lifecycle.go:216 +0x5ef
  | go.uber.org/fx.(*App).start-fm.(*App).start.func1()
  | /go/pkg/mod/go.uber.org/fx@v1.20.0/app.go:679 +0x70
  | go.uber.org/fx.(*App).withRollback()
  | /go/pkg/mod/go.uber.org/fx@v1.20.0/app.go:661 +0x63
  | go.uber.org/fx.(*App).start()
  | /go/pkg/mod/go.uber.org/fx@v1.20.0/app.go:678 +0x6b
  | go.uber.org/fx.(*App).start-fm()
  | <autogenerated>:1 +0x1f
  | go.uber.org/fx.withTimeout.func1()
  | /go/pkg/mod/go.uber.org/fx@v1.20.0/app.go:782 +0xd3
  |  
  | Goroutine 16167 (running) created at:
  | go.temporal.io/server/service/history/replication.(*StreamReceiverMonitorImpl).Start()
  | /go/pkg/mod/go.temporal.io/server@v1.23.0-rc9/service/history/replication/stream_receiver_monitor.go:92 +0xe4
  | go.temporal.io/server/service/history.(*Handler).Start()
  | /go/pkg/mod/go.temporal.io/server@v1.23.0-rc9/service/history/handler.go:174 +0xa1
  | go.temporal.io/server/service/history.(*Service).Start()
  | /go/pkg/mod/go.temporal.io/server@v1.23.0-rc9/service/history/service.go:94 +0xe4
  | go.temporal.io/server/service/history.(*Service).Start-fm()
  | <autogenerated>:1 +0x33
  | go.uber.org/fx/internal/lifecycle.Wrap[go.shape.func()].func1()
  | /go/pkg/mod/go.uber.org/fx@v1.20.0/internal/lifecycle/lifecycle.go:80 +0x2e
  | go.uber.org/fx/internal/lifecycle.(*Lifecycle).runStartHook()
  | /go/pkg/mod/go.uber.org/fx@v1.20.0/internal/lifecycle/lifecycle.go:256 +0x2bd
  | go.uber.org/fx/internal/lifecycle.(*Lifecycle).Start()
  | /go/pkg/mod/go.uber.org/fx@v1.20.0/internal/lifecycle/lifecycle.go:216 +0x5ef
  | go.uber.org/fx.(*App).start-fm.(*App).start.func1()
  | /go/pkg/mod/go.uber.org/fx@v1.20.0/app.go:679 +0x70
  | go.uber.org/fx.(*App).withRollback()
  | /go/pkg/mod/go.uber.org/fx@v1.20.0/app.go:661 +0x63
  | go.uber.org/fx.(*App).start()
  | /go/pkg/mod/go.uber.org/fx@v1.20.0/app.go:678 +0x6b
  | go.uber.org/fx.(*App).start-fm()
  | <autogenerated>:1 +0x1f
  | go.uber.org/fx.withTimeout.func1()
  | /go/pkg/mod/go.uber.org/fx@v1.20.0/app.go:782 +0xd3
  | ==================
 

Steps to Reproduce the Problem

Seen during internal testing on 1.23.0-rc9 tag.

Specifications

dhiaayachi commented 1 month ago

Thanks for reporting this!

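The race detector output shows two goroutines touching the same map in `common/cluster/metadata.go` with no synchronization ordering the two accesses. Goroutine 15520 writes to the map from `refreshLoop`, through `refreshClusterMetadata` and `updateClusterInfoLocked` (`metadata.go:532`). Goroutine 16167 reads it from `StreamReceiverMonitorImpl.generateOutboundStreamKeys`, through `GetClusterID` (`metadata.go:303`).

Calling `cluster.(*metadataImpl).refreshLoop` from tests would not work around this: the method is unexported, and it is the goroutine performing the racing write. Judging from the trace alone, the fix likely belongs in the server itself, with the read in `GetClusterID` and the write in `updateClusterInfoLocked` guarded by the same lock.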
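
For illustration, the pattern in the trace (one goroutine assigning into a map while another reads it) can be made race-free by guarding both paths with one `sync.RWMutex`. The sketch below is a standalone example, not Temporal's code: `clusterRegistry`, `update`, and `clusterID` are hypothetical names that only mirror the roles of the refresh path and the getter.

```go
package main

import (
	"fmt"
	"sync"
)

// clusterRegistry is a hypothetical stand-in for a metadata cache.
// It is not Temporal's metadataImpl.
type clusterRegistry struct {
	mu  sync.RWMutex
	ids map[string]int64 // cluster name -> cluster ID
}

func newClusterRegistry() *clusterRegistry {
	return &clusterRegistry{ids: make(map[string]int64)}
}

// update plays the role of the refresh path: it takes the write lock
// before assigning into the map.
func (r *clusterRegistry) update(name string, id int64) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.ids[name] = id
}

// clusterID plays the role of the getter: it takes the read lock, so it
// never reads the map while a write is in progress.
func (r *clusterRegistry) clusterID(name string) (int64, bool) {
	r.mu.RLock()
	defer r.mu.RUnlock()
	id, ok := r.ids[name]
	return id, ok
}

func main() {
	reg := newClusterRegistry()

	var wg sync.WaitGroup
	wg.Add(2)
	go func() { // writer, analogous to the refresh loop
		defer wg.Done()
		for i := int64(0); i < 1000; i++ {
			reg.update("active", i)
		}
	}()
	go func() { // reader, analogous to the replication monitor
		defer wg.Done()
		for i := 0; i < 1000; i++ {
			reg.clusterID("active")
		}
	}()
	wg.Wait()

	id, ok := reg.clusterID("active")
	fmt.Println(id, ok) // prints "999 true"
}
```

Running this with `go run -race` should report no race. Removing the `RLock`/`RUnlock` pair from `clusterID` reintroduces the same kind of unsynchronized read that the detector flagged above.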

dhiaayachi commented 1 month ago

Thanks for reporting the issue! Could you please provide the following information to help us further diagnose the issue?

This information will help us pinpoint the exact cause of the data race and offer a more tailored solution.