SqliteModernCpp / sqlite_modern_cpp

The C++14 wrapper around sqlite library
MIT License
904 stars 156 forks source link

Fix undefined behaviour in utf16_to_utf8 #153

Closed zauguin closed 6 years ago

zauguin commented 6 years ago

Fixes #151.

@mtissington Please test this.

mtissington commented 6 years ago

Tested and working now - thanks!

With the changes you have implemented for converting ut16 to utf8 it would be helpful if the documentation could be updated to say a little more about this - if I open a utf16 database, it seems you process any sql as utf8 - is this safe?

zauguin commented 6 years ago

@mtissington Yes, this is safe. To quote the Sqlite3 source of the UTF-16 SQL Statement handler:

/* This function currently works by first transforming the UTF-16
  ** encoded string to UTF-8, then invoking sqlite3_prepare(). The
  ** tricky bit is figuring out the pointer to return in *pzTail.
  */

So SQLite always uses UTF-8 for SQL statements, but it allows UTF-16 statements by converting them first. This is the same behaviour this library shows, but we do this conversion by ourselves, because this gives us access to the converted string. This should never cause any change in behaviour for the user. For this reason there is no point in using UTF-16 literals for SQL statements. This will just cause an additional conversion. But this isn't anything related to our library, we just move this step around.

The text encoding of the database is only relevant for text strings stored in the database, so for example it decides if bound values of string types are converted. We never touch this potential conversion, so there should never be a problem with the database encoding.

While sqlite_modern_cpp can handle UTF-16 without problems, I personally strongly recommend not to use UTF-16. For more information about the advantages of UTF-8, see http://utf8everywhere.org/.

mtissington commented 6 years ago

Thanks for the clarity - I'm needing to use the icu library - but it makes me think, our code could be using u16string (for icu api) - you manage the input (<<) and output (>>) of parameters and results and the database could be in utf8 - or am I missing something?

zauguin commented 6 years ago

@mtissington Right, if your database uses UTF8 encoding, you can still use u16string. The conversion will be done by SQLite.