golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.72k stars 17.62k forks source link

database/sql: a potential circular wait between db.openerCh and db.mu.Lock() #37178

Closed lzhfromustc closed 4 years ago

lzhfromustc commented 4 years ago

What version of Go are you using (go version)?

I believe this issue exists since go1.10 to the latest master branch

$ go version
go version go1.12.9 darwin/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

I believe this issue is not relevant to OS or processor.

go env Output
$ go env
GOARCH="amd64"
GOBIN=""
GOCACHE="/Users/me/Library/Caches/go-build"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="darwin"
GOOS="darwin"
GOPATH="/Users/me/warehouse/myfork/go_proj_Gerrit"
GOPROXY=""
GORACE=""
GOROOT="/usr/local/go"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/darwin_amd64"
GCCGO="gccgo"
CC="clang"
CXX="clang++"
CGO_ENABLED="1"
GOMOD=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fno-caret-diagnostics -Qunused-arguments -fmessage-length=0 -fdebug-prefix-map=/var/folders/3j/2hnb78gj2clc98lqrhd1s7_r0000gn/T/go-build966675892=/tmp/go-build -gno-record-gcc-switches -fno-common"

What did you see?

In /src/database/sql/sql.go, db.openerCh is used in the following way:

// This is the size of the connectionOpener request chan (DB.openerCh).
// This value should be larger than the maximum typical value
// used for db.maxOpen. If maxOpen is significantly larger than
// connectionRequestQueueSize then it is possible for ALL calls into the *DB
// to block until the connectionOpener can satisfy the backlog of requests.
var connectionRequestQueueSize = 1000000
db.openerCh := make(chan struct{}, connectionRequestQueueSize)

// Goroutine 1
db.mu.Lock()
for ... {
    db.openerCh <- struct{}{} // Block
}()
db.mu.Lock()

// Goroutine 2, a.k.a. connectionOpener()
for {
    select {
    case <-ctx.Done():
        return
    case <- db.openerCh:
        db.mu.Lock() // Block
        ...
        db.mu.Unlock()
        ...
        continue
}

When the buffer is full, db.openerCh <- struct{}{} can be blocked. In the annotation above, it is believed that this send will be unblocked when "the connectionOpener can satisfy the backlog of requests". However, connectionOpener (a.k.a. Goroutine 2) can also be blocked trying to acquire db.mu.Lock(), which is held by Goroutine 1. So a circular wait presents.

I suppose this problem is not noticed because the buffer size is big and hard to be full. However, as the annotation says, send can still be blocking. I am not a security guy, but maybe DOS attack can force this happen? I found an issue mentioning related code, but it seems that they are not talking about circular wait: #10886

To fix this problem, I have written a patch, which can make these goroutines unblockable no matter what happens, but it is quite ugly... Do you think this is a problem? If so, shall I open a PR?

josharian commented 4 years ago

cc @kardianos

kardianos commented 4 years ago

I don't think this is a problem. A connection pool with a million connections, or even a million active goroutines seems unlikely. If someone actually hits this in practice, or is even able to hit this issue in a contrived test case with a real database, let me know.