denisenkom / go-mssqldb

Microsoft SQL server driver written in go language
BSD 3-Clause "New" or "Revised" License
1.82k stars 495 forks source link

TestQueryCancelHighLevel and TestQueryTimeout fail against remote servers #679

Open shueybubbles opened 3 years ago

shueybubbles commented 3 years ago

Describe the bug Set the server for testing to a networked SQL instance using Windows auth go test

Exception message:
--- FAIL: TestQueryCancelHighLevel (108.90s)
    tds_test.go:257: Starting SSPI login
    tds_test.go:253: got token tokenSSPI
    tds_test.go:253: ERROR: BeginRead failed read tcp <redacted>:1114-><redacted>:1433: wsarecv: An existing connection was
forcibly closed by the remote host.
    queries_test.go:1945: ExecContext expected to fail with Cancelled but it returned did not get cancellation confirmation from the server

--- FAIL: TestQueryTimeout (120.25s)
    tds_test.go:257: Starting SSPI login
    tds_test.go:253: got token tokenSSPI
    tds_test.go:253: ERROR: BeginRead failed read tcp <redacted>:1050-><redacted>:1433: wsarecv: An existing connection was
forcibly closed by the remote host.
    queries_test.go:1964: ExecContext expected to fail with DeadlineExceeded but it returned did not get cancellation confirmation
from the server

Further technical details

SQL Server version: I tried Sql 2019 and Sql 2017 Operating system: Windows 10

The client running the test is at my house, the servers are running as VMs in Azure and connection is through a VPN to our corporate network then through ExpressRoute to the servers. I have no problems using SSMS with these servers.

kardianos commented 3 years ago

Both are similar errors.

Does this work if you run go test -cpu 4 ? That would be unexpected, but interesting to know.

shueybubbles commented 3 years ago

they fail the same way with -cpu 4. It seems to happen while establishing the connection Running the tests individually fail as well, not just when run en masse. image

shueybubbles commented 3 years ago

could it be that 200 ms isn't long enough to establish the connection and the test is assuming that it is? If I bump up the wait to 1500 ms it passes. Maybe SQL closes the connection without a cancel confirmation if the login hasn't completed yet and token.go needs to be aware of this state.

kardianos commented 3 years ago

That could be. In your tests you describe a number of network hops and TDS is somewhat chatty.

shueybubbles commented 3 years ago

would you recommend we just increase the timeout in the test or is it worthwhile to revisit the logic in tokenProcessor.nextToken not to send attention if we aren't logged in yet? Ultimately I suspect most apps would behave ok with the current behavior but the error messages won't be accurate. EG nextToken could take another parameter telling it how to handle ctx->Done() and connect would tell it just to return

shueybubbles commented 3 years ago

Is the timeout parameter on the context.WithTimeout call also applied to the underlying network connection? If I remove the attention/cancel wait from this code path, the connect fails differently:

func TestQueryTimeout(t *testing.T) {
    conn := open(t)
    defer conn.Close()
    ctx, cancel := context.WithTimeout(context.Background(), 200*time.Millisecond)
    defer cancel()
    _, err := conn.ExecContext(ctx, "waitfor delay '00:00:03'")
    if err != context.DeadlineExceeded {
        t.Errorf("ExecContext expected to fail with DeadlineExceeded but it returned %v", err)
    }

    // connection should be usable after timeout
    row := conn.QueryRow("select 1")
    var val int64
    err = row.Scan(&val)
    if err != nil {
        t.Fatal("Scan failed with", err)
    }
}

error:

=== RUN   TestQueryTimeout
    i:\git\go-mssqldb\queries_test.go:1964: ExecContext expected to fail with DeadlineExceeded but it returned unable to open tcp connection with host 'hostname:1433': dial tcp <redacted>:1433: i/o timeout

My change to token.go is basically this, where connect sets t.noAttn to true:

    case <-t.ctx.Done():
        if t.noAttn {
            return nil, t.ctx.Err()
        }
shueybubbles commented 3 years ago

It looks like the dialer sets its own deadline to the Context's deadline and returns an OpError that wraps a net error I don't see a deterministic way to predict which error it's going to be if the context timeout is not always long enough to even establish the initial connection.

shueybubbles commented 3 years ago

One place where the driver transforms the network error to something less useful is in dialConnection:

    if conn == nil {
        f := "unable to open tcp connection with host '%v:%v': %v"
        return nil, fmt.Errorf(f, p.Host, resolveServerPort(p.Port), err.Error())
    }

Is it considered a breaking change to define a new error here that wraps the returned err instead of returning a generic fmt.Errorf that has the message? I'd expect a caller would like to use errors.Is to see if the error is one of the flavors of deadline exceeded or network timeout. Should definition of the appropriate error type here also be part of the change for #586 ?

The test that I am trying to write looks like this. I want to verify that if I remove the attention send during a timed out login that the returned error is a network timeout. I think I'd need a more complex version of this test as well, as I need to inject a delay after receiving the login ack but before completing the connection.

func TestLoginTimeout(t *testing.T) {
    conn := open(t)
    defer conn.Close()
    ctx, cancel := context.WithTimeout(context.Background(), 100*time.Nanosecond)
    defer cancel()
    _, err := conn.ExecContext(ctx, "waitfor delay '00:00:01'")
    if oe, ok := err.(*net.OpError); ok {
        if !oe.Timeout() {
            t.Fatalf("Got non-timeout error %s", oe.Error())
        }
    } else {
        t.Fatalf("wrong kind of error for login timeout: %+v", err)
    }

    // connection should be usable after timeout
    row := conn.QueryRow("select 1")
    var val int64
    err = row.Scan(&val)
    if err != nil {
        t.Fatal("Scan failed with", err)
    }
}