So, you thought ahead when setting up your MySQL database and set your
character encoding to utf8
to make it easier to store international characters
and actually read them back out again.
But then someone sets their name to an emoji, and it isn't being read back
properly from the database. A little bit of digging reveals that utf8
on MySQL
is really just a subset of the full UTF-8 character set. What you
really wanted was utf8mb4. At this point, you have a few choices.
There are a number of resources for switching the character set and collation for your MySQL databases, tables and columns. The best write-up is by Mathais, but there's also useful info from Alon Diamant if you happen to be running in AWS.
Please read everything you can before proceeding!!! This script attempts to
safely and automagically convert utf8
(and latin1
, if you are daring), but
it may not work with your dataset. Backup before proceeding, run it in a test
environment if you can.
Before proceeding:
utf8mb4
.This app requires Node.js.
$ npm install -g https://github.com/building5/utf8mb4-converter.git
$ utf8mb4-converter [OPTIONS...]
utf8mb4-converter
and inspect the DDL it will execute to see what it
is going to do to your database.utf8mb4-converter --make-it-so
to execute the DDL on the server.If you have some data encoded as latin1
, and you are really, really sure
that it only has ASCII characters in it, then you can provide --force-latin1
to convert those databases/tables/columns to utf8mb4
. Any international
characters in those columns will probably be corrupted.
If you'd like to see more of what the script is doing, pass in --verbose
.
If there are some databases on your MySQL server you'd rather not convert, you
can pass them to --skip
to, well, skip them.
InnoDB has an index length limit of 767 bytes per column. For utf8mb3
, this
conveniently works out to 255 characters. But for utf8mb4
, this is only 191
characters. If you have any columns longer than that, you will either have to
limit the index to 191 characters of the column, or narrow the column to 191
characters.
ISC license. PRs welcome.