globalsign / mgo

The MongoDB driver for Go
Other
1.97k stars 230 forks source link

Data Race When Server Has Gone #223

Closed pavelbrm closed 6 years ago

pavelbrm commented 6 years ago

Hi there!

Got race condition:

==================
WARNING: DATA RACE
Read at 0x00c4200bbe00 by goroutine 12:
  .../vendor/github.com/globalsign/mgo.(*Session).acquireSocket()
      .../vendor/github.com/globalsign/mgo/session.go:5110 +0x6e9
  .../vendor/github.com/globalsign/mgo.(*Database).Run()
      .../vendor/github.com/globalsign/mgo/session.go:992 +0x5d
  .../vendor/github.com/globalsign/mgo.(*Session).Run()
      .../vendor/github.com/globalsign/mgo/session.go:2489 +0xb9
  .../vendor/github.com/globalsign/mgo.(*Session).Ping()
      .../vendor/github.com/globalsign/mgo/session.go:2525 +0x5c

Previous write at 0x00c4200bbe00 by goroutine 63:
  .../vendor/github.com/globalsign/mgo.(*mongoSocket).kill()
      .../vendor/github.com/globalsign/mgo/socket.go:368 +0x253
  .../vendor/github.com/globalsign/mgo.(*mongoSocket).readLoop()
     .../vendor/github.com/globalsign/mgo/socket.go:602 +0x1166

Commit 113d396. Steps to reproduce:

  1. Connect using the following settings:
    
    ses, err := mgo.DialWithTimeout(url, timeout)
    if err != nil {
    return nil, errors.Wrapf(err, "mgo.DialWithTimeout: %s, %v", url, timeout)
    }

ses.SetMode(mgo.Monotonic, true)

2. Shutdown the server
3. Try to perform a call:
```golang
ses.Ping()
  1. Get a race.

This race is being reproduced in 100% cases.

Running on a system:

tadukurow commented 6 years ago

Thanks @exordium-frozen for reporting the issue.

We're currently a bit shorthanded because it is the vacation period. We should be able to have a closer look into this in a week or two.

tadukurow commented 6 years ago

I had a quick look and the race happens when acquiring a new socket, it is checked for liveness outside the socket lock. It looks like we should be able to get a fix out for this shortly.

tadukurow commented 6 years ago

After further investigation, this is a duplicate of #135. I'm closing this issue in favour of continuing the discussion on #135