dolthub / dolt

Dolt – Git for Data
Apache License 2.0
17.83k stars 506 forks source link

Pasting into Datagrip TEXT field puts null #5735

Closed Hydrocharged closed 1 year ago

Hydrocharged commented 1 year ago

When connecting to a Dolt server using Datagrip (or any JetBrains product that has the same database functionality), attempting to paste into a SMALLTEXT/TEXT/MEDIUMTEXT field causes a null to be written instead. This does not occur with a standard MySQL instance.

We hypothesize that vitess is returning an incorrect packet containing the field information, as there seem to be discontinuities between what the Java MySQL connector expects, and what vitess sends.

https://github.com/mysql/mysql-connector-j/blob/release/8.0/src/main/protocol-impl/java/com/mysql/cj/protocol/a/ColumnDefinitionReader.java#L118

https://github.com/dolthub/vitess/blob/64f98b0cb75a3306caf49558186aa071cae13ce0/go/vt/proto/query/query.pb.go#L1241

This may or may not be the issue, and it’s also possible that vitess implements an older revision (such as 5.7).

Datagrip also has an error on their side regarding the null pasting, but we’re still exhibiting an issue where MySQL does not, so an older version of Datagrip may need to be used if the most recent version does not exhibit the bug.

https://youtrack.jetbrains.com/issue/DBE-17300/Pasting-string-into-LONGTEXT-field-MySQL-always-gets-null-value

fulghum commented 1 year ago

I'm able to repro the null paste issue in DataGrip with Dolt (and don't see it happen with MySQL). I've dug into the exact values MySQL and Dolt are sending back for the max response bytes field, and sure enough Dolt is sending back a different value from MySQL for MEDIUMTEXT (Dolt sends 67,108,860 and MySQL sends: 50,331,645).

I did a quick sanity test and hardcoded Dolt to send back the same value as MySQL, and with that change, I'm no longer able to trigger the null paste issue in DataGrip.

I don't yet understand exactly how MySQL is computing this value, so I still need to dig in a little bit more, but looks like we're close to getting this one fixed. I'll audit the other TEXT/BLOB types as well and come up with a way to test this better, too.

fulghum commented 1 year ago

After looking at this a little more closely... what seems to be different is how Dolt and MySQL are calculating the maximum character bytes for the character set. Dolt is looking at the charset on the table and using 4 as the modifier to calculate the type's maximum field length, but MySQL is using 3 as the modifier. If Dolt were to change to use 3 (i.e. as if for utf8mb3) then Dolt would send the exact same values as MySQL. That explains why we are seeing different values for TINYTEXT, TEXT, and MEDIUMTEXT, which seems to trigger the null paste bug in DataGrip.

In DataGrip's connection properties, characterSetResults defaults to utf8, which is a (deprecated) alias for utf8mb3 and is expected to change to reference utf8mb4 in the future. I tried explicitly setting this to utf8mb4, but the driver doesn't seem to allow that. In the DataGrip console for my local MySQL connection, I see these charset settings:

character_set_client,utf8mb4
character_set_connection,utf8mb4
character_set_database,utf8mb4
character_set_filesystem,binary
character_set_results,utf8mb3
character_set_server,utf8mb4
character_set_system,utf8mb3

In the DataGrip console for the local Dolt connection, these are the charset settings I see:

character_set_client,utf8mb4
character_set_connection,utf8mb4
character_set_database,utf8mb4
character_set_filesystem,binary
character_set_results,utf8
character_set_server,utf8mb4
character_set_system,utf8mb4

There are also some notes at the bottom of the MySQL Connector/J documentation that explain some tricky points around Connector/J's handling of utf8 charsets.

I'll keep digging into this one some more, but I suspect we need to pull in @Hydrocharged for additional context around charset support to make sure we fix this correctly.

fulghum commented 1 year ago

Quick update... we understand what's going on and the results Dolt needs to return; I'm just figuring out the cleanest way to change the code to make that happen. I chatted with Daylon yesterday afternoon about some of the charset details and have some ideas to try out today. My first test is looking successful so far (pasting works for mediumtext cols in DataGrip 🎉), but I still need to clean up a few things, test some more, and make sure this won't affect anything else. Initial test results this morning are looking promising though!

fulghum commented 1 year ago

Fix is merged into go-mysql-server; I'm working on getting it into Dolt right now and then will run a release.

fulghum commented 1 year ago

Fix was just released in Dolt 0.75.12. Thank you for reporting this one so we could fix it!