davidskalinder / mpeds-coder

MPEDS Annotation Interface
MIT License

Change default character set to (full) utf8 #115

Open davidskalinder opened 3 years ago

davidskalinder commented 3 years ago

At the moment, nearly everything in the database seems to be latin-1. This of course causes problems with some characters, and I'm having to work around it in #113 in ways that will probably break compatibility with any non-MySQL engines.

I think the best option for this will be to handle it at the application level by specifying the character set when the database engine is created, which I think will then apply everywhere? Other options (that I'm not 100% sure will work) include specifying the character set when a new database is created and setting the character set in a my.cnf file for our deployment.
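A rough sketch of the application-level option, assuming the app builds its engine from a SQLAlchemy-style connection URL and uses the PyMySQL driver (both assumptions, as are the credentials and database name below). The helper `build_mysql_url` is hypothetical; it only shows where the `charset` parameter would go.

```python
from urllib.parse import quote_plus, urlencode


def build_mysql_url(user, password, host, database,
                    charset="utf8mb4", driver="pymysql"):
    """Build a MySQL connection URL that requests a specific character set.

    Hypothetical helper: the names and defaults are illustrative only.
    """
    # The charset travels as a query parameter on the connection URL.
    query = urlencode({"charset": charset})
    return (
        f"mysql+{driver}://{quote_plus(user)}:{quote_plus(password)}"
        f"@{host}/{database}?{query}"
    )


# Placeholder credentials, not the project's real settings.
url = build_mysql_url("coder", "s3cret!", "localhost", "mpeds")
print(url)

# With SQLAlchemy installed, the URL would then be handed to the engine:
#   from sqlalchemy import create_engine
#   engine = create_engine(url)
```

This only sets the character set for connections the application opens; it does not convert tables or columns that already exist as latin-1.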

NB that MySQL's default implementation of utf8 is broken since it only allows up to three bytes per character, so they created a new character set, utf8mb4, to handle real utf8 (which needs up to four). I can't figure out how well-supported that is in other engines.
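To make the three-byte limit concrete, here is a small sketch that flags text MySQL's three-byte utf8 could not store. The function name `needs_utf8mb4` is made up for illustration.

```python
def needs_utf8mb4(text):
    """Return True if any character in text takes more than 3 bytes in UTF-8.

    Such characters (emoji, some CJK extensions, etc.) do not fit in
    MySQL's three-byte utf8 character set.
    """
    return any(len(ch.encode("utf-8")) > 3 for ch in text)


samples = ["café", "€100", "thumbs up 👍"]
for s in samples:
    # Per-character byte widths show which characters are the problem.
    widths = [len(ch.encode("utf-8")) for ch in s]
    print(repr(s), max(widths), needs_utf8mb4(s))
```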

The trick to this is that it will require quite a bit of DB migration. In particular, any fields whose sizes currently run up against their byte limits (or indeed exceed a quarter of those limits, since utf8mb4 needs up to four bytes per character where latin-1 needs one) might need to have their sizes changed so that the same number of bigger characters will fit.
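The sizing concern is just integer arithmetic, sketched here. The 767-byte figure is only an example (it is the index key limit for older InnoDB row formats); the limits that actually apply to this schema would need to be checked, and the helper names are hypothetical.

```python
# Worst-case bytes MySQL reserves per character for each character set.
BYTES_PER_CHAR = {"latin1": 1, "utf8": 3, "utf8mb4": 4}


def max_chars(byte_limit, charset):
    """Largest character count that is guaranteed to fit in byte_limit."""
    return byte_limit // BYTES_PER_CHAR[charset]


def fits_after_migration(declared_chars, byte_limit, charset="utf8mb4"):
    """Check whether a column declared with declared_chars still fits."""
    return declared_chars <= max_chars(byte_limit, charset)


EXAMPLE_LIMIT = 767  # example byte limit only, see note above
for charset in BYTES_PER_CHAR:
    print(charset, max_chars(EXAMPLE_LIMIT, charset))

# A VARCHAR(255) that fit under latin1 no longer fits under utf8mb4.
print(fits_after_migration(255, EXAMPLE_LIMIT))
```

Under these assumptions, any column wider than a quarter of its byte limit is a candidate for resizing.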