civicrm / civicrm-setup

Use utf8mb4 encoding to support 4-byte characters #16

Closed mfb closed 5 years ago

mfb commented 5 years ago

CiviCRM doesn't actually use utf8mb4 yet, but we can start using utf8mb4 for purposes of future-proofing / best practices.

Note: This would require MySQL 5.5.3+ and I'm not sure what minimum MySQL version this library needs to support.

totten commented 5 years ago

@mfb Is there some kind of conditional we could use for this future-proofing?

In principle, we can key off any information in Setup\Model. So... an evil example might be to say "if the text 'SET NAMES utf8' appears in {$model->srcPath}/CRM/Core/DAO.php, then use utf8. Otherwise, use utf8mb4." Maybe there's some cleaner flag?

(I can write/test a conditional while doing some other stuff on civicrm-setup -- but want some general ballpark for a flag that's likely to match future releases.)
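As a rough illustration of the text-matching idea above, here is a minimal Python sketch. It is not code from civicrm-setup (which is PHP); the function name `uses_legacy_utf8` is hypothetical, and it takes the file contents as a string rather than reading CRM/Core/DAO.php itself. One detail worth noting: a plain substring search for 'SET NAMES utf8' would also match 'SET NAMES utf8mb4', so the sketch uses a word boundary.

```python
import re

# Hypothetical helper: returns True when the given source text sets the
# connection charset to plain utf8 (and not utf8mb4).
# The \b word boundary prevents "SET NAMES utf8mb4" from matching,
# because "8" followed by "m" is not a word boundary.
LEGACY_PATTERN = re.compile(r"SET NAMES utf8\b")


def uses_legacy_utf8(source_text: str) -> bool:
    return LEGACY_PATTERN.search(source_text) is not None


def charset_from_source(source_text: str) -> str:
    # Mirrors the rule quoted above: utf8 if the legacy statement is
    # present, otherwise utf8mb4.
    return "utf8" if uses_legacy_utf8(source_text) else "utf8mb4"
```

A caller would read the file at the path quoted above and pass its contents to `charset_from_source`.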

mfb commented 5 years ago

Given that utf8 is a subset of utf8mb4, there shouldn't be anything wrong with preemptively using utf8mb4 for the database connection, regardless of whether utf8 or utf8mb4 is used elsewhere - unless of course versions of MySQL < 5.5.3 need to be supported.

mfb commented 5 years ago

Here's a demonstration: If a column has the utf8 character set, the same error is issued for 4-byte characters whether the connection uses utf8 or utf8mb4. The only change is that 4-byte characters can be stored in columns that have the utf8mb4 character set, provided the connection also uses utf8mb4.

mysql> create table foo (bar varchar(255)) default charset utf8;
mysql> set names utf8;
mysql> insert into foo values ('🚴');
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x9A\xB4' for column 'bar' at row 1
mysql> set names utf8mb4;
mysql> insert into foo values ('🚴');
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x9A\xB4' for column 'bar' at row 1
mysql> drop table foo;
mysql> create table foo (bar varchar(255)) default charset utf8mb4;
mysql> set names utf8;
mysql> insert into foo values ('🚴');
ERROR 1366 (HY000): Incorrect string value: '\xF0\x9F\x9A\xB4' for column 'bar' at row 1
mysql> set names utf8mb4;
mysql> insert into foo values ('🚴');
Query OK, 1 row affected (0.01 sec)
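
The bytes in the error messages above are the UTF-8 encoding of the 🚴 character, which is 4 bytes long, while MySQL's utf8 character set stores at most 3 bytes per character. The following Python sketch only illustrates that byte-length rule; the helper name `fits_mysql_utf8` is hypothetical and the check is an approximation of what the server enforces, not a replacement for it.

```python
# U+1F6B4 (the bicyclist emoji used in the demonstration).
emoji = "\U0001F6B4"
encoded = emoji.encode("utf-8")


def fits_mysql_utf8(text: str) -> bool:
    # MySQL's utf8 (3-byte) character set cannot hold any character
    # whose UTF-8 encoding needs more than 3 bytes.
    return all(len(ch.encode("utf-8")) <= 3 for ch in text)


print(encoded)                 # the same four bytes shown in ERROR 1366
print(len(encoded))            # byte length of the encoded character
print(fits_mysql_utf8(emoji))  # whether a utf8 column could store it
```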

totten commented 5 years ago

OK, that's cool. It is a bit of a stretch to say that we need to support MySQL < 5.5.3, but it's easier to throw in the conditional than to prove or disprove a claim about which versions are generally needed.

Merging with a tweak for discerning v5.5.3.
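
The merged change itself is not shown in this thread. Purely as an illustration of what a version conditional of this kind could look like, here is a Python sketch; the function name `choose_charset` and the decision to fall back to utf8 on an unparseable version string are assumptions, not the actual civicrm-setup code.

```python
import re

# utf8mb4 requires MySQL 5.5.3 or later, per the note earlier in the thread.
MIN_UTF8MB4_VERSION = (5, 5, 3)


def choose_charset(server_version: str) -> str:
    """Pick a connection charset from a server version string.

    Hypothetical helper: accepts strings such as "5.5.3" or "5.6.40-log"
    and compares only the leading major.minor.patch numbers.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", server_version)
    if match is None:
        # Assumption: when the version cannot be parsed, stay conservative.
        return "utf8"
    version = tuple(int(part) for part in match.groups())
    return "utf8mb4" if version >= MIN_UTF8MB4_VERSION else "utf8"
```

Comparing integer tuples avoids the pitfalls of string comparison, where "5.10.0" would sort before "5.5.3".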