data race from unit test

kardianos commented 3 years ago

https://ci.appveyor.com/project/denisenkom/go-mssqldb/builds/39994729/job/nq0golv5qehgjeaq

go test -race -cpu 4 -coverprofile=coverage.txt -covermode=atomic
2021/07/14 19:29:23 got token tokenLoginAck
2021/07/14 19:29:23 got token tokenEnvChange
2021/07/14 19:29:23 got token tokenDone
2021/07/14 19:29:23 got DONE or DONEPROC status=0
2021/07/14 19:29:23 
SET XACT_ABORT ON; -- 16384
SET ANSI_NULLS ON; -- 32
SET ARITHIGNORE ON; -- 128
2021/07/14 19:29:23 got token tokenEnvChange
2021/07/14 19:29:23 got token tokenDone
2021/07/14 19:29:23 got DONE or DONEPROC status=1
2021/07/14 19:29:23 got token tokenDone
2021/07/14 19:29:23 got DONE or DONEPROC status=1
2021/07/14 19:29:23 got token tokenDone
2021/07/14 19:29:23 got DONE or DONEPROC status=0
2021/07/14 19:29:23 
select Options = @@OPTIONS;
2021/07/14 19:29:23 got token tokenColMetadata
2021/07/14 19:29:23 got token tokenRow
2021/07/14 19:29:23 got token tokenDone
2021/07/14 19:29:23 got DONE or DONEPROC status=16
2021/07/14 19:29:23 (1 row(s) affected)
2021/07/14 19:29:23 
SET XACT_ABORT ON; -- 16384
SET ANSI_NULLS ON; -- 32
SET ARITHIGNORE ON; -- 128
2021/07/14 19:29:23 got token tokenEnvChange
2021/07/14 19:29:23 got token tokenDone
2021/07/14 19:29:23 got DONE or DONEPROC status=1
2021/07/14 19:29:23 got token tokenDone
2021/07/14 19:29:23 got DONE or DONEPROC status=1
2021/07/14 19:29:23 got token tokenDone
2021/07/14 19:29:23 got DONE or DONEPROC status=0
==================
WARNING: DATA RACE
Read at 0x00c0000dc143 by goroutine 56:
  testing.(*common).logDepth()
      c:/go113/src/testing/testing.go:665 +0xa8
  testing.(*common).Logf()
      c:/go113/src/testing/testing.go:658 +0x96
  testing.(*T).Logf()
      <autogenerated>:1 +0x7c
  github.com/denisenkom/go-mssqldb.(*testLogger).Printf()
      C:/gopath/src/github.com/denisenkom/go-mssqldb/tds_test.go:253 +0xa2
  github.com/denisenkom/go-mssqldb.optionalLogger.Printf()
      C:/gopath/src/github.com/denisenkom/go-mssqldb/log.go:18 +0xb5
  github.com/denisenkom/go-mssqldb.processSingleResponse()
      C:/gopath/src/github.com/denisenkom/go-mssqldb/token.go:655 +0x2d5
Previous write at 0x00c0000dc143 by goroutine 45:
  testing.tRunner.func1()
      c:/go113/src/testing/testing.go:900 +0x35a
  testing.tRunner()
      c:/go113/src/testing/testing.go:913 +0x1c2
Goroutine 56 (running) created at:
  github.com/denisenkom/go-mssqldb.startReading()
      C:/gopath/src/github.com/denisenkom/go-mssqldb/token.go:772 +0xab
  github.com/denisenkom/go-mssqldb.(*Stmt).processExec()
      C:/gopath/src/github.com/denisenkom/go-mssqldb/mssql.go:695 +0x105
  github.com/denisenkom/go-mssqldb.(*Stmt).exec()
      C:/gopath/src/github.com/denisenkom/go-mssqldb/mssql.go:688 +0x125
  github.com/denisenkom/go-mssqldb.(*Stmt).ExecContext()
      C:/gopath/src/github.com/denisenkom/go-mssqldb/mssql.go:1022 +0x305
  database/sql.ctxDriverStmtExec()
      c:/go113/src/database/sql/ctxutil.go:65 +0x2c7
  database/sql.resultFromStatement()
      c:/go113/src/database/sql/sql.go:2435 +0x1cd
  database/sql.(*DB).execDC()
      c:/go113/src/database/sql/sql.go:1538 +0x33d
  database/sql.(*DB).exec()
      c:/go113/src/database/sql/sql.go:1499 +0x17b
  database/sql.(*DB).ExecContext()
      c:/go113/src/database/sql/sql.go:1477 +0xf5
  github.com/denisenkom/go-mssqldb.TestDisconnect1()
      C:/gopath/src/github.com/denisenkom/go-mssqldb/queries_test.go:2193 +0x60c
  testing.tRunner()
      c:/go113/src/testing/testing.go:909 +0x1a0
Goroutine 45 (running) created at:
  testing.(*T).Run()
      c:/go113/src/testing/testing.go:960 +0x658
  testing.runTests.func1()
      c:/go113/src/testing/testing.go:1202 +0xad
  testing.tRunner()
      c:/go113/src/testing/testing.go:909 +0x1a0
  testing.runTests()
      c:/go113/src/testing/testing.go:1200 +0x528
  testing.(*M).Run()
      c:/go113/src/testing/testing.go:1117 +0x306
  main.main()
      _testmain.go:410 +0x33e
==================
2021/07/14 19:29:34 [{Hello} {World} {TVP}]
FAIL
coverage: 77.2% of statements
exit status 1
FAIL    github.com/denisenkom/go-mssqldb    12.365s

kardianos commented 3 years ago

I think this is due to the testing code, in the testLogger that logs to a test on multiple goroutines, which is a not good I think. One way to fix this would be to remove goroutine use from reading results.

shueybubbles commented 3 years ago

Is each test function guaranteed to be running in a process where no other tests functions are competing for access to the driver instance at the same time and could be assigning different t instances to the instance-wide logger? I don't know exactly how go test manages concurrency at that level. Conceptually it seems bad to assign test-scoped object to a globally scoped logger, and just avoiding goroutine usage on read in the driver is a mitigation but not a fix. The tests need some other way to read connection logs that are specific to their connection instance. Won't go test pick up console output? Maybe testlogger can just write to stdout.

fineol commented 3 years ago

The race occurs on the "done" field of the testing structure. It starts out as false when the testing.T structure is instantiated:

https://github.com/golang/go/blob/fa6aa872225f8d33a90d936e7a81b64d2cea68e1/src/testing/testing.go#L392

    done        bool                 // Test is finished and all subtests have completed.

It is set to true in testing.tRunner by a deferred closure function after the test returns. As you can see, there is a comment about deliberately not protecting this field from races:

https://github.com/golang/go/blob/fa6aa872225f8d33a90d936e7a81b64d2cea68e1/src/testing/testing.go#L1178-L1180

    // Do not lock t.done to allow race detector to detect race in case
    // the user does not appropriately synchronizes a goroutine.
    t.done =

The races are detected in go-mssqldb's unit tests when testing.logDepth checks to see if logging is being performed after the test has finished:

https://github.com/golang/go/blob/fa6aa872225f8d33a90d936e7a81b64d2cea68e1/src/testing/testing.go#L753-L764

    if c.done {
        // This test has already finished. Try and log this message
        // with our parent. If we don't have a parent, panic.
        for parent := c.parent; parent != nil; parent = parent.parent {
            parent.mu.Lock()
            defer parent.mu.Unlock()
            if !parent.done {
                parent.output = append(parent.output, parent.decorate(s, depth+1)...)
                return
            }
        }
        panic("Log in goroutine after " + c.name + " has completed")

This code was added to the testing module in response to Go issue 1564. Essentially, this code and supporting changes made in Go 1.7 are designed to complain loudly when Fail is called on a test that has already completed. This makes good sense. You don't want to record that a test has passed when it completes and then quietly ignore a subsequent call to Fail that contradicts that status. which is what happened prior to Go 1.7.

But how does this explain go-mssqldb's problem? Its unit tests are not finishing with a report of success and subsequently calling Fail (at least I don't believe they are). The answer comes in Russ Cox's comment about the write to t.done added in testing: don't be silent if a test's goroutine fails a test after test exits:

The Lock/Unlock here is only to synchronize buggy tests that are racing the test finishing against another goroutine calling t.Fail. If you take the lock out, then tests run under the race detector will report the race, possibly more reliably. So probably take the Lock out.

They deliberately removed the synchronization from the write to t.done with the belief that it might more reliably detect late calls to testing.Fail. And although go-mssqldb's unit tests aren't calling Fail after the tests complete, some of the tests spawn goroutines that may call the test logger after the tests complete (e.g. connection cleanup). The test logger then ultimately calls testing.logDepth, and it checks c.done, causing the race with the unsynchronized write to t.done.

In other words, although testing.Log is safe for concurrent access during a test, it is deliberately not safe after a test has completed.

We could fix this by:

Writing to standard output rather than testing.Log, as shueybubbles suggests, but we lose the advantages of the test log (e.g. honoring the -v flag) if we do that.
Adding a synchronization element to our testlogger implementation to ensure it does not call testing.Log after tests finish. This would suppress any messages logged after a test finishes, which I think already happens in some cases. However, it would not bypass errors when Fail is called after a test finishes, which we wouldn't want to do.
Figure out a way to cleanly end all goroutines spawned in all tests before those tests return. That would be a lot of work and potentially require rewriting core pieces of go-mssqldb.

I think option two is the best choice. I could take a look at how much work is required. What do you two think?

shueybubbles commented 3 years ago

Can you provide code for a sample minimal test that shows what kind of synchronization mechanism you are considering? IE how would the code for a Test func change from just calling SetLogger(testLogger{t}) ?

Would it help at all if the tests just called SetLogger with a nil Logger before returning? I suppose the synchronization of the set to the new value still needs to be added, there's no synchronization in SetLogger today.

Otherwise, I don't see any public property of testing.T which would let the Printf implementation know the test is finished, so you'd need to add some kind of explicit signal from each test to the testLogger that it's done.

fineol commented 3 years ago

Would it help at all if the tests just called SetLogger with a nil Logger before returning?

That would be one way to approach it, but as you note, SetLogger is not currently synchronized, so we would have to add a synchronization mechanism and enforce it everywhere the logger is used.

My first reaction is that I'd rather introduce the cost and complexity of synchronization into the test code rather than the production code. As it stands, I think SetLogger is intended to be called once on startup and then never called again, and I suspect that is how people actually use it as well. But one could argue that SetLogger should be thread safe independent of the unit tests so that a user can safely change out the logger dynamically during run time.

Otherwise, I don't see any public property of testing.T which would let the Printf implementation know the test is finished, so you'd need to add some kind of explicit signal from each test to the testLogger that it's done.

Correct. There is no public property of testing.T that we could use, and the non-public property we would want to use is deliberately unsynchronized, causing us trouble in the first place. Therefore, we would need to add our own to testLogger.

I was thinking something like this for testLogger:

type testLogger struct {
    t    testing.TB
    mu   sync.Mutex
    done bool
}

func (l *testLogger) Printf(format string, v ...interface{}) {
    l.mu.Lock()
    if !l.done {
        l.t.Logf(format, v...)
    }
    l.mu.Unlock()
}

func (l *testLogger) Println(v ...interface{}) {
    l.mu.Lock()
    if !l.done {
        l.t.Log(v...)
    }
    l.mu.Unlock()
}

func (l *testLogger) StopLogging() {
    l.mu.Lock()
    l.done = true
    l.mu.Unlock()
}

And something like this for using it in unit tests:

func TestSomething(t *testing.T) {
    tl := testLogger{t: t}
    defer tl.StopLogging()
    SetLogger(&tl)

    // test something...
}

One potential drawback to this is that we would be introducing a synchronization mechanism to the tests that could subtly alter how different goroutines run and interact. But given that this proposal is mirroring in lock step the mutex already included in the standard library's logDepth function, I think the net effect will be simply to stop the end-of-test race condition that we (presumably) don't care about.

shueybubbles commented 3 years ago

it's one of those rare times having an auto-scoped destructor mechanism in the language would be nice.
The proposal looks fine to me. Is it possible to configure a scanner in golangci-lint to look for testLogger instantiations without a matching StopLogging call?

fineol commented 3 years ago

it's one of those rare times having an auto-scoped destructor mechanism in the language would be nice.

We could try to encourage proper use of testLogger by moving it to a separate package, keeping the struct non-public, and requiring instantiation via a function that returns both the logger and an associated Closer interface. Presumably returning a Closer along with the logger would give a good hint that the Closer should be called.

The package would look something like this (in part):

package tlog
...

func (l *testLogger) Close() error {
    l.mu.Lock()
    l.done = true
    l.mu.Unlock()
    return nil
}

func NewLogger(t *testing.T) (*testLogger, io.Closer) {
    tl := testLogger{t: t}
    return &tl, &tl
}

Usage would be something like this:

    logger, closer := tlog.NewLogger(t)
    SetLogger(logger)
    defer closer.Close()

Do you think such a scheme provides any real benefit?

One thing to keep in mind is that testLogger is not exposed to end users of go-mssqldb. It is only available for testing. Presumably the vetting of pull requests would help ensure that it is used properly.

shueybubbles commented 2 years ago

i think we can live without a separate closer function as it's likely tests will fail if the dev doesn't include a stopLogging call and we'll try to be good code reviewers.

One little thing - while you are touching the logging functions in the tests, add a t.Helper() call in them so the test traces don't use their line numbers as the trace origin.

fineol commented 2 years ago

I included the changes to resolve this issue in PR #691, because it contains other logging-related updates.

I ended up not including the t.Helper() call for two reasons:

Go 1.8 does not support t.Helper(), and this driver must build under Go 1.8. I could have separated out the testLogger implementation into its own set of files and applied appropriate build tags to get around this problem, however the next point argues against the effort.
The majority of the calls to testLogger.Printf/Println are not made directly from test functions but indirectly through the driver's optionalLogger Logger interface. Because of this, adding t.Helper() calls merely changes those log entries from tds_test.go:274 to log:70. That doesn't help much.

To compensate for the above deficiencies, I added a new field to the log message. This field contains the file and line number of the actual caller in the same form as the t.Log adds automatically, but appearing in square brackets after the timestamp. In the following example "tds_test.go:274" is added automatically by t.Log (there is nothing we can do about that) while [token.go:659] is the new field I included that gives the actual caller location:

tds_test.go:274: 2021-09-19 17:32:20.684711888 -0400 EDT m=+0.388791148 [token.go:659]: ERROR: BeginRead failed EOF

denisenkom / go-mssqldb

data race from unit test #680