CSNW / sql-bricks

Transparent, Schemaless SQL Generation
http://csnw.github.io/sql-bricks
MIT License
203 stars 25 forks source link

Feature request: support unicode characters. #102

Closed parro-it closed 6 years ago

parro-it commented 6 years ago

If I try to build an update that set a a field to a string containing unicode characters (in this example I put an emoji) I get this error:

(node:29099) UnhandledPromiseRejectionWarning: Error: ER_TRUNCATED_WRONG_VALUE_FOR_FIELD: Incorrect string value: '\xF0\x9F\x98\x98' for column 'title' at row 1

I'm using mysql, should I use an extension for mysql or something? I get the same error if I try to run the generated query directly in Sequel Pro, so I think the emoji symbol has to be escaped to unicode, eg. '\\u00F0\\u009F\\u0098\\u0098'

I'm getting this results using toParams method:

{ text: 'UPDATE recipes SET id = ?, title = ?, body = ? WHERE id = ?',
  values:
   [ '1',
     'fagioli alla messicana 😗',
     'Apri i fagioli, aggiungi la salsa e l\'aglio, cuoci. \r\nA fuoco lento o rovini tutto. \r\nPepa tanto.',
     '1' ] }
prust commented 6 years ago

@parro-it, sorry for the delay responding.

I'm not sure if this is the issue you're running into, but apparently MySQL's "utf8" encoding doesn't handle 4-byte characters (like your emoji example); to get it to support four-byte characters, you have to switch it from "utf8" to "utf8mb4" (which supports all utf8 characters): In MySQL, never use “utf8”. Use “utf8mb4”.

I don't think sql-bricks should escape unicode characters, since it's often unnecessary and it's DB/connection-specific when it is, so it should usually be done in the DB driver (node-mysql, etc), or if not there, in application code.

If you do end up needing to do it in application code and you need a centralized, convenient place, one option would be for your application to wrap SQLBrick's toString() or toParams(). If you need help figuring out how to do that, or if it seems like it's not possible to do that (I'm not 100% sure off the top of my head), let me know & I should be able to help with that.

prust commented 6 years ago

@parro-it, quick follow-up, it looks like node-mysql will allow you to specify {charset: 'utf8mb4'} in the connection options & that they're planning to make this the default in v3.0: Changes default encoding from UTF8_GENERAL_CI to UTF8MB4_UNICODE_CI.

However, it sounds like that alone isn't enough -- you also need to convert your database to this encoding: How to support full Unicode in MySQL databases.