globalsign / mgo

The MongoDB driver for Go
Other
1.97k stars 230 forks source link

Data race on Dial #178

Closed seriousben closed 6 years ago

seriousben commented 6 years ago

Just saw a data race, on Dial.

The typical scenario where we have seen this happen is when multiple sessions are opened against the same mongo cluster/server. We know we should use session.Copy, but in tests we actually want a separate Dial.

Using revision: efe0945164a7e582241f37ae8983c075f8f2e870

WARNING: DATA RACE
Write at 0x00c42043b548 by goroutine 133:
  /vendor/github.com/globalsign/mgo.DialWithInfo()
      vendor/github.com/globalsign/mgo/session.go:602 +0x55e
  ...

Previous read at 0x00c42043b548 by goroutine 196:
  /vendor/github.com/globalsign/mgo.(*mongoCluster).server()
      vendor/github.com/globalsign/mgo/cluster.go:442 +0x83
  /vendor/github.com/globalsign/mgo.(*mongoCluster).syncServersIteration.func1.1()
      vendor/github.com/globalsign/mgo/cluster.go:556 +0x1da

Goroutine 133 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:824 +0x564
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:777 +0x16d

Goroutine 196 (running) created at:
  /vendor/github.com/globalsign/mgo.(*mongoCluster).syncServersIteration.func1()
      vendor/github.com/globalsign/mgo/cluster.go:529 +0x174
  /vendor/github.com/globalsign/mgo.(*mongoCluster).syncServersIteration()
      vendor/github.com/globalsign/mgo/cluster.go:584 +0x490
  /vendor/github.com/globalsign/mgo.(*mongoCluster).syncServersLoop()
      vendor/github.com/globalsign/mgo/cluster.go:392 +0x17b
seriousben commented 6 years ago

Just got multiple ones running our tests.

==================
WARNING: DATA RACE
Write at 0x00c42032d428 by goroutine 71:
  .../vendor/github.com/globalsign/mgo.DialWithInfo()
      vendor/github.com/globalsign/mgo/session.go:602 +0x55e
  .../vendor/github.com/globalsign/mgo.DialWithTimeout()
      vendor/github.com/globalsign/mgo/session.go:315 +0xc2
  .../vendor/github.com/globalsign/mgo.Dial()
      vendor/github.com/globalsign/mgo/session.go:295 +0x55
  ...
  ...
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:777 +0x16d

Previous read at 0x00c42032d428 by goroutine 73:
  .../vendor/github.com/globalsign/mgo.(*mongoCluster).server()
      vendor/github.com/globalsign/mgo/cluster.go:442 +0x83
  .../vendor/github.com/globalsign/mgo.(*mongoCluster).syncServersIteration.func1.1()
      vendor/github.com/globalsign/mgo/cluster.go:556 +0x1da

Goroutine 71 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:824 +0x564
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:1063 +0xa4
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:777 +0x16d
  testing.runTests()
      /usr/local/go/src/testing/testing.go:1061 +0x4e1
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:978 +0x2cd
  ...
  main.main()
      _testmain.go:74 +0x22a

Goroutine 73 (running) created at:
  .../vendor/github.com/globalsign/mgo.(*mongoCluster).syncServersIteration.func1()
      vendor/github.com/globalsign/mgo/cluster.go:529 +0x174
  .../vendor/github.com/globalsign/mgo.(*mongoCluster).syncServersIteration()
      vendor/github.com/globalsign/mgo/cluster.go:584 +0x490
  .../vendor/github.com/globalsign/mgo.(*mongoCluster).syncServersLoop()
      vendor/github.com/globalsign/mgo/cluster.go:392 +0x17b
==================
==================
WARNING: DATA RACE
Write at 0x00c42032d430 by goroutine 71:
  .../vendor/github.com/globalsign/mgo.DialWithInfo()
      vendor/github.com/globalsign/mgo/session.go:603 +0x5a4
  .../vendor/github.com/globalsign/mgo.DialWithTimeout()
      vendor/github.com/globalsign/mgo/session.go:315 +0xc2
  .../vendor/github.com/globalsign/mgo.Dial()
      vendor/github.com/globalsign/mgo/session.go:295 +0x55
  ...
  ...
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:777 +0x16d

Previous read at 0x00c42032d430 by goroutine 73:
  .../vendor/github.com/globalsign/mgo.(*mongoCluster).server()
      vendor/github.com/globalsign/mgo/cluster.go:443 +0xa4
  .../vendor/github.com/globalsign/mgo.(*mongoCluster).syncServersIteration.func1.1()
      vendor/github.com/globalsign/mgo/cluster.go:556 +0x1da

Goroutine 71 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:824 +0x564
  testing.runTests.func1()
      /usr/local/go/src/testing/testing.go:1063 +0xa4
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:777 +0x16d
  testing.runTests()
      /usr/local/go/src/testing/testing.go:1061 +0x4e1
  testing.(*M).Run()
      /usr/local/go/src/testing/testing.go:978 +0x2cd
  ...
  main.main()
      _testmain.go:74 +0x22a

Goroutine 73 (running) created at:
  .../vendor/github.com/globalsign/mgo.(*mongoCluster).syncServersIteration.func1()
      vendor/github.com/globalsign/mgo/cluster.go:529 +0x174
  .../vendor/github.com/globalsign/mgo.(*mongoCluster).syncServersIteration()
      vendor/github.com/globalsign/mgo/cluster.go:584 +0x490
  .../vendor/github.com/globalsign/mgo.(*mongoCluster).syncServersLoop()
      vendor/github.com/globalsign/mgo/cluster.go:392 +0x17b
==================
--- FAIL: TestCollectionsFindByID (0.01s)
    testing.go:730: race detected during execution of test
==================
WARNING: DATA RACE
Write at 0x00c4204b0468 by goroutine 127:
  .../vendor/github.com/globalsign/mgo.DialWithInfo()
      vendor/github.com/globalsign/mgo/session.go:602 +0x55e
  ...
  ...
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:777 +0x16d

Previous read at 0x00c4204b0468 by goroutine 203:
  .../vendor/github.com/globalsign/mgo.(*mongoCluster).server()
      vendor/github.com/globalsign/mgo/cluster.go:442 +0x83
  .../vendor/github.com/globalsign/mgo.(*mongoCluster).syncServersIteration.func1.1()
      vendor/github.com/globalsign/mgo/cluster.go:556 +0x1da

Goroutine 127 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:824 +0x564
  ...
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:777 +0x16d

Goroutine 203 (running) created at:
  .../vendor/github.com/globalsign/mgo.(*mongoCluster).syncServersIteration.func1()
      vendor/github.com/globalsign/mgo/cluster.go:529 +0x174
  .../vendor/github.com/globalsign/mgo.(*mongoCluster).syncServersIteration()
      vendor/github.com/globalsign/mgo/cluster.go:584 +0x490
  .../vendor/github.com/globalsign/mgo.(*mongoCluster).syncServersLoop()
      vendor/github.com/globalsign/mgo/cluster.go:392 +0x17b
==================
==================
WARNING: DATA RACE
Write at 0x00c4204b0470 by goroutine 127:
  .../vendor/github.com/globalsign/mgo.DialWithInfo()
      vendor/github.com/globalsign/mgo/session.go:603 +0x5a4
  ...
  ...
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:777 +0x16d

Previous read at 0x00c4204b0470 by goroutine 203:
  .../vendor/github.com/globalsign/mgo.(*mongoCluster).server()
      vendor/github.com/globalsign/mgo/cluster.go:443 +0xa4
  .../vendor/github.com/globalsign/mgo.(*mongoCluster).syncServersIteration.func1.1()
      vendor/github.com/globalsign/mgo/cluster.go:556 +0x1da

Goroutine 127 (running) created at:
  testing.(*T).Run()
      /usr/local/go/src/testing/testing.go:824 +0x564
  ...
  testing.tRunner()
      /usr/local/go/src/testing/testing.go:777 +0x16d

Goroutine 203 (running) created at:
  .../vendor/github.com/globalsign/mgo.(*mongoCluster).syncServersIteration.func1()
      vendor/github.com/globalsign/mgo/cluster.go:529 +0x174
  .../vendor/github.com/globalsign/mgo.(*mongoCluster).syncServersIteration()
      vendor/github.com/globalsign/mgo/cluster.go:584 +0x490
  .../vendor/github.com/globalsign/mgo.(*mongoCluster).syncServersLoop()
      vendor/github.com/globalsign/mgo/cluster.go:392 +0x17b
==================
seriousben commented 6 years ago

I suspect this is fixed in development. I suspect that https://github.com/globalsign/mgo/pull/161 changed the code so that syncServersIteration will never sync while DialWithInfo is changing cluster data.

seriousben commented 6 years ago

After a few retries we were not able to reproduce this when using the development branch.

For others, this particular error was happening very often in circleci.

domodwyer commented 6 years ago

Hi @seriousben

Sorry, I've been away for a few days but I just wanted to say thanks for taking the time to open an issue, thanks for checking if development fixes the issue, and thanks for confirming it does for others!

We'll cut a release in the next week or so.

Dom