go-mysql-org / go-mysql

a powerful mysql toolset with Go
MIT License
4.52k stars 967 forks source link

Random connection failure in readInitialHandshake() #876

Closed damienbeaufils closed 2 months ago

damienbeaufils commented 2 months ago

Hello there,

First of all, thank you for your work! I'm currently using the latest commit of the driver (https://github.com/go-mysql-org/go-mysql/commit/eb2c6d100ad80877ec2b88026cad8cf005442f51) with Go 1.22.2, and I get an error randomly when I try to open a database connection. 4 times over 5, it works, but sometimes it ends up with this error: runtime error: slice bounds out of range [:9] with capacity 8. The MySQL server version is: 8.0.37. I'm using MySQL in a docker container (mysql:8.0 image).

I open the connection in a very old-fashion way:

    dsn := fmt.Sprintf("%s:%s@%s/%s?ssl=true", mySqlUsername, mySqlPassword, mySqlHost, mySqlSchema)
    db, err := sql.Open("mysql", dsn)
    if err != nil {
        t.Fatalf("failed to connect to mysql: %s", err)
    }
    defer db.Close()

and do a simple select count(1) query.

And here is the stacktrace I have 1 time over 5:

panic: runtime error: slice bounds out of range [:9] with capacity 8

goroutine 1542 [running]:
github.com/go-mysql-org/go-mysql/client.(*Conn).readInitialHandshake(0xc002bb8c30)
    /home/runner/go/pkg/mod/github.com/go-mysql-org/go-mysql@v1.7.1-0.20240508103851-eb2c6d100ad8/client/auth.go:57 +0x8c6
github.com/go-mysql-org/go-mysql/client.(*Conn).handshake(0xc002bb8c30)
    /home/runner/go/pkg/mod/github.com/go-mysql-org/go-mysql@v1.7.1-0.20240508103851-eb2c6d100ad8/client/conn.go:157 +0x1c
github.com/go-mysql-org/go-mysql/client.ConnectWithDialer({0xcad3f0, 0xc0002c3340}, {0x0, 0x0}, {0xc002cc53dd, 0xf}, {0xc002cc53c8, 0x7}, {0xc002cc53d0, 0xc}, ...)
    /home/runner/go/pkg/mod/github.com/go-mysql-org/go-mysql@v1.7.1-0.20240508103851-eb2c6d100ad8/client/conn.go:127 +0x545
github.com/go-mysql-org/go-mysql/client.ConnectWithContext({0xcad3f0?, 0xc0002c3340?}, {0xc002cc53dd?, 0x1?}, {0xc002cc53c8?, 0xf?}, {0xc002cc53d0?, 0x7?}, {0xc002cc53ed, 0x6}, ...)
    /home/runner/go/pkg/mod/github.com/go-mysql-org/go-mysql@v1.7.1-0.20240508103851-eb2c6d100ad8/client/conn.go:79 +0x105
github.com/go-mysql-org/go-mysql/client.Connect({0xc002cc53dd, 0xf}, {0xc002cc53c8, 0x7}, {0xc002cc53d0, 0xc}, {0xc002cc53ed, 0x6}, {0xc000051948, 0x1, ...})
    /home/runner/go/pkg/mod/github.com/go-mysql-org/go-mysql@v1.7.1-0.20240508103851-eb2c6d100ad8/client/conn.go:73 +0x11a
github.com/go-mysql-org/go-mysql/driver.driver.Open({}, {0xc002cc5380?, 0xc002e10001?})
    /home/runner/go/pkg/mod/github.com/go-mysql-org/go-mysql@v1.7.1-0.20240508103851-eb2c6d100ad8/driver/driver.go:102 +0x19b
database/sql.dsnConnector.Connect(...)
    /opt/hostedtoolcache/go/1.22.2/x64/src/database/sql/sql.go:791
database/sql.(*DB).conn(0xc002bb8b60, {0xcad2a0, 0x114e4c0}, 0x1)
    /opt/hostedtoolcache/go/1.22.2/x64/src/database/sql/sql.go:1415 +0x71e
database/sql.(*DB).query(0xc002bb8b60, {0xcad2a0, 0x114e4c0}, {0xb8f886, 0x10}, {0x0, 0x0, 0x0}, 0xa0?)
    /opt/hostedtoolcache/go/1.22.2/x64/src/database/sql/sql.go:1749 +0x57
database/sql.(*DB).QueryContext.func1(0xa0?)
    /opt/hostedtoolcache/go/1.22.2/x64/src/database/sql/sql.go:1732 +0x4f
database/sql.(*DB).retry(0xc000051d28?, 0xc000051cd0)
    /opt/hostedtoolcache/go/1.22.2/x64/src/database/sql/sql.go:1566 +0x42
database/sql.(*DB).QueryContext(0xb88d5d?, {0xcad2a0?, 0x114e4c0?}, {0xb8f886?, 0x3?}, {0x0?, 0x41097b?, 0x7f3a2a14c5b8?})
    /opt/hostedtoolcache/go/1.22.2/x64/src/database/sql/sql.go:1731 +0xc5
database/sql.(*DB).Query(...)
    /opt/hostedtoolcache/go/1.22.2/x64/src/database/sql/sql.go:1745

I managed to debug the readInitialHandshake() and capture the data value which cause this issue, and data contains only this: <0x0b><0x08><0x05><0x1a><0x00>: image

Any clue? I don't know if it's related to recent changes. Let me know if I can add some flags or verbosity to have much information to share.

Thanks!

lance6716 commented 2 months ago

Thanks for reporting, I'll take a look soon

lance6716 commented 2 months ago

As the document says, the first byte in data should be 10 https://dev.mysql.com/doc/dev/mysql-server/latest/page_protocol_connection_phase_packets_protocol_handshake_v10.html but in your case it's 11 🤔 I think I need more time to read the source code of MySQL server to locate it.

And this package should check it's 10 rather than >= 10

lance6716 commented 2 months ago

I also didn't find 11 in source code, can you repeatedly check the failed data is always <0x0b><0x08><0x05><0x1a><0x00>?

damienbeaufils commented 2 months ago

Hello @lance6716, Thank you for taking a look. I reproduced this issue 5 times, and I always had <0x0b><0x08><0x05><0x1a><0x00> in data variable.

lance6716 commented 2 months ago

Can you try https://github.com/go-sql-driver/mysql and check if the error is still there? I must say that driver is more popular if you don't need binlog replication and compression.

damienbeaufils commented 2 months ago

I was using this library at first, but I encountered some issues (maybe the same, but the error messages were not as explicit as yours), that's why I shifted. If it's difficult to tackle, I guess we can act that it will be a known issue / won't fix.

lance6716 commented 2 months ago

I'll try to reproduce it in my spare time. Please provide enough information for me to reproduce. Now my top suspect is your MySQL image 😂

damienbeaufils commented 2 months ago

Good point, I'm using the official latest image, but I'll try to downgrade the version and I'll let you know.

dveeden commented 2 months ago

The 'Classic' MySQL protocol always has version 10. There was a version 9, but that's really really old.

And X Protocol use version 11.

dvaneeden@dve-carbon:~$ echo | nc 127.0.0.1 3307 | hexdump -C
00000000  49 00 00 00 0a 38 2e 34  2e 30 00 09 00 00 00 74  |I....8.4.0.....t|
00000010  28 76 05 29 54 32 11 00  ff ff ff 02 00 ff df 15  |(v.)T2..........|
00000020  00 00 00 00 00 00 00 00  00 00 36 45 14 06 66 57  |..........6E..fW|
00000030  06 1d 4e 6e 0a 4f 00 63  61 63 68 69 6e 67 5f 73  |..Nn.O.caching_s|
00000040  68 61 32 5f 70 61 73 73  77 6f 72 64 00 33 00 00  |ha2_password.3..|
00000050  01 ff 86 04 23 30 38 53  30 31 47 6f 74 20 61 6e  |....#08S01Got an|
00000060  20 65 72 72 6f 72 20 72  65 61 64 69 6e 67 20 63  | error reading c|
00000070  6f 6d 6d 75 6e 69 63 61  74 69 6f 6e 20 70 61 63  |ommunication pac|
00000080  6b 65 74 73                                       |kets|
00000084
dvaneeden@dve-carbon:~$ echo | nc 127.0.0.1 33070 | hexdump -C
00000000  05 00 00 00 0b 08 05 1a  00                       |.........|
00000009

This is with:

podman run -p 3307:3306 -p 33070:33060 --env MYSQL_ALLOW_EMPTY_PASSWORD=1 -it mysql:8.4.0

Note that the classic protocol should be on port 3306 and X Protocol on 33060.

damienbeaufils commented 2 months ago

@dveeden From what I understand, I could mean that the issue I'm experiencing is due to the wrong port usage? In the code I'm working on, the host/port is loaded dynamically since it can change for every container, and I noticed that the method that handles the container connection returns the first exposed port (and not the explicit port that maps 3306). I guess sometimes 33060 could be exposed before 3306, which causes this issue. I'm gonna check this and get back to you.

damienbeaufils commented 2 months ago

Well, it seems that sometimes, randomly, the first exposed port by MySQL container is 33060, and not 3306. Explicitly using the container port that is mapped to the 3306 port fixes the issue. Thank you so much @lance6716 and @dveeden for helping me corner this 🙏
I'm closing this since there is no issue with the go-mysql client 👍

dveeden commented 2 months ago

I think we should make sure go-mysql returns a clear error instead of a "slice bounds out of range".