globalsign / mgo

The MongoDB driver for Go

Data race for socket #135

Closed: kirik closed this issue 6 years ago

kirik commented 6 years ago

Hello! I'm hitting a data race intermittently. Here is the stack:

WARNING: DATA RACE
Read at 0x00c4201381b8 by goroutine 94:
  github.com/globalsign/mgo.(*Session).acquireSocket()
      /.../github.com/globalsign/mgo/session.go:4822 +0x768
  github.com/globalsign/mgo.(*Database).Run()
      /.../github.com/globalsign/mgo/session.go:799 +0x6d
  github.com/globalsign/mgo.(*Session).Run()
      /.../github.com/globalsign/mgo/session.go:2270 +0xc3
  github.com/globalsign/mgo.(*mongoCluster).isMaster()
      /.../github.com/globalsign/mgo/cluster.go:182 +0x27c
  github.com/globalsign/mgo.(*mongoCluster).syncServer()
      /.../github.com/globalsign/mgo/cluster.go:231 +0x457
  github.com/globalsign/mgo.(*mongoCluster).syncServersIteration.func1.1()
      /.../github.com/globalsign/mgo/cluster.go:553 +0x200

Previous write at 0x00c4201381b8 by goroutine 33:
  github.com/globalsign/mgo.(*mongoSocket).kill()
      /.../github.com/globalsign/mgo/socket.go:351 +0x251
  github.com/globalsign/mgo.(*mongoSocket).resetNonce.func1()
      /.../github.com/globalsign/mgo/auth.go:123 +0x135
  github.com/globalsign/mgo.(*mongoSocket).readLoop()
      /.../github.com/globalsign/mgo/socket.go:628 +0x1101

Goroutine 94 (running) created at:
  github.com/globalsign/mgo.(*mongoCluster).syncServersIteration.func1()
      /.../github.com/globalsign/mgo/cluster.go:525 +0x174
  github.com/globalsign/mgo.(*mongoCluster).syncServersIteration()
      /.../github.com/globalsign/mgo/cluster.go:580 +0x54b
  github.com/globalsign/mgo.(*mongoCluster).syncServersLoop()
      /.../github.com/globalsign/mgo/cluster.go:390 +0x18e

Goroutine 33 (running) created at:
  github.com/globalsign/mgo.newSocket()
      /.../github.com/globalsign/mgo/socket.go:197 +0x392
  github.com/globalsign/mgo.(*mongoServer).Connect()
      /.../github.com/globalsign/mgo/server.go:183 +0x60c
  github.com/globalsign/mgo.(*mongoServer).AcquireSocket()
      /.../github.com/globalsign/mgo/server.go:129 +0x349
  github.com/globalsign/mgo.(*mongoCluster).syncServer()
      /.../github.com/globalsign/mgo/cluster.go:225 +0x2a5
  github.com/globalsign/mgo.(*mongoCluster).syncServersIteration.func1.1()
      /.../github.com/globalsign/mgo/cluster.go:553 +0x200
==================

I can't reproduce it reliably, but I believe we need to lock the mutex on s.masterSocket at /.../globalsign/mgo/session.go:4822 before reading its dead property. Am I right?
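To show what I mean, roughly (paraphrasing the code around that line from memory, so treat this only as a sketch, not a tested patch):

```go
// Sketch only: dead is written by (*mongoSocket).kill() while holding the
// socket's own mutex, so reading it here under just the session's RLock
// races. Taking the socket's lock around the read should fix that.
if s.masterSocket != nil {
	socket := s.masterSocket
	socket.Lock() // the same mutex kill() takes before setting dead
	dead := socket.dead
	socket.Unlock()
	if dead == nil {
		socket.Acquire()
		s.m.RUnlock()
		return socket, nil
	}
}
```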

My OS is Ubuntu 16.04 and my Go version is 1.9.2 linux/amd64.

Thank you in advance!

domodwyer commented 6 years ago

Hi @kirik

This is a bit weird - what commit are you using?

On the latest release (master / baa28fcb8e7d5dfab92026c0920cb6c9ae72faa2 / r2018.02.20) I see a read lock being taken on session.go:4813 (a few lines above the reported read) and an exclusive lock being taken on the write too (socket.go:344).

Dom

kirik commented 6 years ago

Hi @domodwyer, I'm using the latest release. Yes, there is an exclusive lock on the mongoSocket at socket.go:344, but at session.go:4813 the read lock is taken on the Session, not on masterSocket as it should be (I believe). I'm a Go newbie, so please don't blame me if I'm wrong :)

domodwyer commented 6 years ago

Ah, you're totally right. It's probably the case for the s.slaveSocket read in the if above, too.

It should be possible to take read locks without causing a deadlock, but the locking is really, really tricky to follow, so I can't say for sure without a test/proper review. I'm not entirely sure it's a major problem either: the caller would get an error when trying to use the returned socket. Still, it would be worth fixing and then profiling.
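Not a proper test against mgo itself, but here's a toy with the same locking shape (invented names, nothing from the real code) that `go run -race` flags; it might be a useful starting point for such a test:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// toySocket mimics the relevant shape of mongoSocket: a dead field that is
// only ever written while the struct's own mutex is held.
type toySocket struct {
	sync.Mutex
	dead error
}

// kill stands in for (*mongoSocket).kill: set dead under the lock.
func (s *toySocket) kill(err error) {
	s.Lock()
	s.dead = err
	s.Unlock()
}

// deadUnlocked is the kind of read acquireSocket does today.
func (s *toySocket) deadUnlocked() error { return s.dead }

// deadLocked is the fixed read: same data, but under the same mutex.
func (s *toySocket) deadLocked() error {
	s.Lock()
	defer s.Unlock()
	return s.dead
}

func main() {
	s := &toySocket{}
	done := make(chan struct{})
	go func() {
		s.kill(errors.New("connection reset by peer"))
		close(done)
	}()
	for i := 0; i < 1000000; i++ {
		_ = s.deadUnlocked() // go run -race reports this read
		// _ = s.deadLocked() // the locked accessor is clean
	}
	<-done
	fmt.Println("dead:", s.deadLocked())
}
```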

Thanks for the report!

Dom

KJTsanaktsidis commented 6 years ago

I might be able to have a look at this on Friday if nobody grabs it before then, since my brain has recently been in that code.

tadukurow commented 6 years ago

Seems like this issue is gaining some traction. The fix could be as easy as ensuring locks are taken on slaveSocket and masterSocket (perhaps by adding a func (s *mongoSocket) Dead() error accessor) whilst checking whether they are dead in Session.acquireSocket().
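Roughly something like this (untested, just to show the shape):

```go
// Dead reports whether the socket has been killed, reading the dead field
// under the socket's own mutex so callers never touch it unsynchronised.
func (socket *mongoSocket) Dead() error {
	socket.Lock()
	defer socket.Unlock()
	return socket.dead
}
```

acquireSocket would then call s.masterSocket.Dead() (and the same for slaveSocket) instead of reading the field directly.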

But as said above, the locking is tricky and needs some careful investigation. @KJTsanaktsidis, you're still welcome to have a stab at this; if not, maybe @eminano can help?

KJTsanaktsidis commented 6 years ago

I'll try and find some time for this over the weekend, but I don't want to stop someone else from taking a look at this too. Sorry, this kind of slipped off my radar :/

eminano commented 6 years ago

I'll have a look :)

eminano commented 6 years ago

@kirik, @exordium-frozen, @yavuzyurtbegendi: the fix for the data race has been merged into development; please let me know if the issue is still happening!

Thanks!

kirik commented 6 years ago

@eminano thank you! I'm not using mgo in production right now, but thank you for the fix anyway!

yavuzyurtbegendi commented 6 years ago

@eminano thank you :)

eminano commented 6 years ago

I'll close this one now, feel free to reopen if the issue reoccurs!

DejeroDaniel commented 3 years ago

Hey, for the record, I'm still getting a data race with the most recent version.

==================
WARNING: DATA RACE
Read at 0x00c0001ab1a0 by goroutine 44:
  github.com/globalsign/mgo.(*Session).acquireSocket()
      /go/src/priv.repo/backend/vendor/github.com/globalsign/mgo/session.go:5110 +0x7eb
  github.com/globalsign/mgo.(*Collection).writeOp()
      /go/src/priv.repo/backend/vendor/github.com/globalsign/mgo/session.go:5312 +0x106
  github.com/globalsign/mgo.(*Collection).Upsert()
      /go/src/priv.repo/backend/vendor/github.com/globalsign/mgo/session.go:3108 +0x1f6
  priv.repo/backend/api/controller.(*StatusController).Show()
      /go/src/priv.repo/backend/api/controller/status.go:33 +0x5be
  priv.repo/backend/generated/app.MountStatusController.func1()
      /go/src/priv.repo/backend/generated/app/controllers.go:1479 +0x156
  priv.repo/backend/generated/app.handleStatusOrigin.func1()
      /go/src/priv.repo/backend/generated/app/controllers.go:1493 +0x8db
  github.com/goadesign/goa.(*Controller).MuxHandler.func1.1.1()
      /go/src/priv.repo/backend/vendor/github.com/goadesign/goa/service.go:309 +0x113
  github.com/goadesign/goa/middleware.ErrorHandler.func1.1()
      /go/src/priv.repo/backend/vendor/github.com/goadesign/goa/middleware/error_handler.go:21 +0xb0
  priv.repo/backend/lib/logger.LogRequest.func1.1()
      /go/src/priv.repo/backend/lib/logger/logger.go:148 +0xee4
  github.com/goadesign/goa/middleware.RequestIDWithHeaderAndLengthLimit.func1.1()
      /go/src/priv.repo/backend/vendor/github.com/goadesign/goa/middleware/request_id.go:63 +0x188
  github.com/goadesign/goa.(*Controller).MuxHandler.func1()
      /go/src/priv.repo/backend/vendor/github.com/goadesign/goa/service.go:343 +0x312
  github.com/goadesign/goa.(*mux).Handle.func1()
      /go/src/priv.repo/backend/vendor/github.com/goadesign/goa/mux.go:69 +0x267
  github.com/dimfeld/httptreemux.(*TreeMux).ServeLookupResult()
      /go/src/priv.repo/backend/vendor/github.com/dimfeld/httptreemux/router.go:247 +0x19a
  github.com/dimfeld/httptreemux.(*TreeMux).ServeHTTP()
      /go/src/priv.repo/backend/vendor/github.com/dimfeld/httptreemux/router.go:268 +0x186
  github.com/goadesign/goa.(*mux).ServeHTTP()
      /go/src/priv.repo/backend/vendor/github.com/goadesign/goa/mux.go:100 +0x69
  net/http.serverHandler.ServeHTTP()
      /usr/local/go/src/net/http/server.go:2843 +0xca
  net/http.(*conn).serve()
      /usr/local/go/src/net/http/server.go:1925 +0x84c

Previous write at 0x00c0001ab1a0 by goroutine 20:
  github.com/globalsign/mgo.(*mongoSocket).kill()
      /go/src/priv.repo/backend/vendor/github.com/globalsign/mgo/socket.go:368 +0x1f9
  github.com/globalsign/mgo.(*mongoSocket).readLoop()
      /go/src/priv.repo/backend/vendor/github.com/globalsign/mgo/socket.go:602 +0x1171

Goroutine 44 (running) created at:
  net/http.(*Server).Serve()
      /usr/local/go/src/net/http/server.go:2969 +0x5d3
  net/http.(*Server).ListenAndServe()
      /usr/local/go/src/net/http/server.go:2866 +0x105
  github.com/goadesign/goa.(*Service).ListenAndServe()
      /go/src/priv.repo/backend/vendor/github.com/goadesign/goa/service.go:204 +0x219
  main.main.func4()
      /go/src/priv.repo/backend/cmd/rv-backend/main.go:116 +0x198
  github.com/spf13/cobra.(*Command).execute()
      /go/src/priv.repo/backend/vendor/github.com/spf13/cobra/command.go:782 +0x6dd
  github.com/spf13/cobra.(*Command).ExecuteC()
      /go/src/priv.repo/backend/vendor/github.com/spf13/cobra/command.go:852 +0x42f
  github.com/spf13/cobra.(*Command).Execute()
      /go/src/priv.repo/backend/vendor/github.com/spf13/cobra/command.go:800 +0x112a
  main.main()
      /go/src/priv.repo/backend/cmd/rv-backend/main.go:148 +0x1135

Goroutine 20 (finished) created at:
  github.com/globalsign/mgo.newSocket()
      /go/src/priv.repo/backend/vendor/github.com/globalsign/mgo/socket.go:200 +0x377
  github.com/globalsign/mgo.(*mongoServer).Connect()
      /go/src/priv.repo/backend/vendor/github.com/globalsign/mgo/server.go:259 +0x55c
  github.com/globalsign/mgo.(*mongoServer).acquireSocketInternal()
      /go/src/priv.repo/backend/vendor/github.com/globalsign/mgo/server.go:207 +0x584
  github.com/globalsign/mgo.(*mongoServer).AcquireSocket()
      /go/src/priv.repo/backend/vendor/github.com/globalsign/mgo/server.go:125 +0x28a
  github.com/globalsign/mgo.(*mongoCluster).syncServer()
      /go/src/priv.repo/backend/vendor/github.com/globalsign/mgo/cluster.go:209 +0x26f
  github.com/globalsign/mgo.(*mongoCluster).syncServersIteration.func1.1()
      /go/src/priv.repo/backend/vendor/github.com/globalsign/mgo/cluster.go:541 +0x228
==================

For context, I was testing what happens when I kill the MongoDB database container. It's reproducible with a simple upsert after the MongoDB container dies.
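Roughly what I was running when it fired (simplified; the database, collection and field names here are made up), built with -race and pointed at a local mongod container that I then kill:

```go
package main

import (
	"log"
	"time"

	"github.com/globalsign/mgo"
	"github.com/globalsign/mgo/bson"
)

func main() {
	session, err := mgo.Dial("localhost:27017")
	if err != nil {
		log.Fatal(err)
	}
	defer session.Close()

	c := session.DB("test").C("status")
	for {
		// Keep upserting; kill the mongod container while this loop runs and
		// the race detector flags the dead-socket check in acquireSocket.
		_, err := c.Upsert(bson.M{"_id": "probe"}, bson.M{"$set": bson.M{"ts": time.Now()}})
		if err != nil {
			log.Println("upsert error:", err)
		}
		time.Sleep(500 * time.Millisecond)
	}
}
```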