crystal-lang / crystal-mysql

MySQL connector for Crystal
MIT License
107 stars 36 forks source link

Connect to database with collation/character other than UTF8 #46

Closed kazzkiq closed 1 year ago

kazzkiq commented 7 years ago

In some cases other collations are needed in order to accept Emojis and/or non-latin characters.

It seems that as of today, this project do not provide a way to connect to MySQL database with any other option than UTF8, which leads to database errors when trying to add those "special" new characters.

With the wide use of smartphones, its basically impossible to build any application that do not support emojis. Its used anywhere, and its popularizing even more each year, so I believe the project should support some way to handle this.

crisward commented 7 years ago

There are some short codes sometimes used for emojis :smile: = :smile: but I also think this would be good.

metacortex commented 7 years ago

https://github.com/crystal-lang/crystal-mysql/blob/master/src/mysql/packets.cr#L75 Just replacing value 0x21u8 to 0x2du8 utf8mb4_general_ci works.

crisward commented 7 years ago

When a database is created, the 'Encoding' and 'Collation' are set. The driver should probably detect this and set it to be the same, instead of being hard coded. I'll have a quick look to see how this can be read.

crisward commented 7 years ago

Something like this give us the correct settings for the database.

SELECT DEFAULT_CHARACTER_SET_NAME, DEFAULT_COLLATION_NAME FROM INFORMATION_SCHEMA.SCHEMATA WHERE SCHEMA_NAME = 'db_name'

We just then need to convert the names to the correct bytes.

crisward commented 7 years ago

The full list for the connected database can be found with SHOW COLLATION, the id column just needs converting to a Unit8 based on the result from the above query.

crisward commented 7 years ago

It look like this info may already be retrieved during the handshake - https://github.com/crystal-lang/crystal-mysql/blob/master/src/mysql/packets.cr#L20 however I've done some manual testing and the handshake seems to return 33 / utf8_general_ci even if the database is utf8mb4_general_ci.

r3bo0t commented 6 years ago

@crisward @waj Is there any plan to support setting the in application database collation (utf8_general_ci or utf8mb4_general_ci or whatever one sets in their corresponding database). As I can see there is no closer to this issue.

ysbaddaden commented 6 years ago

BTW: utf8 in MySQL is invalid, the valid UTF-8 character set is utf8mb4 and maybe it should be the default.

https://medium.com/@adamhooper/in-mysql-never-use-utf8-use-utf8mb4-11761243e434

kazzkiq commented 5 years ago

Do we have any updates in this?

girng commented 5 years ago

@kazzkiq I don't think this is an issue with this repo, setting the collation is db specific https://stackoverflow.com/questions/38949115/how-to-change-the-connection-collation-of-mysql