Open trueqbit opened 2 years ago
Yeah, what you are saying is right. The only downside related to your work was that we broke the unit tests in dev through my oversight.
No worries, when I touch code, I always find those corner cases, no matter the library or framework :)
@trueqbit you're good!
@trueqbit can we close it?
No! Still needs to be addressed.
Looks like we are facing a universal text encoding issue. There is no universal text encoding in the standard C++ library, so we are unable to fix it now. But when a universal encoding appears in the standard C++ library, we can add it. Also, users can add customizations for wstring using external libraries within their projects right now.
What if we enable wchar_t support only on Windows (which is UTF-16)? This is incredibly useful on Windows and I wouldn't be happy if it went away. SQLite itself supports UTF-16, and we could use the SQLite UTF-16 APIs directly.
UTF-16 on other platforms is a delicate subject and would only be doable with a sane opt-in, and only worth it if someone absolutely requests it.
Otherwise there are universal C++ conversion functions from UTF-32 (std::convert_utf8<char32_t>), and there is even a native UTF-32 character type, char32_t. However, I would assign UTF-32 support to a different ticket.
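A minimal sketch of how such a Windows-only default with an explicit opt-in elsewhere could be gated at compile time. The macros `EXAMPLE_WIDE_STRING_SUPPORT` and `EXAMPLE_FORCE_WIDE_STRING_SUPPORT` and both helper functions are hypothetical names invented for illustration; they are not sqlite_orm flags. The only assumptions are that `_WIN32` is predefined when targeting Windows and that `wchar_t` is 16 bits wide there.

```cpp
#include <cstddef>

// Hypothetical switch (not a real sqlite_orm macro): wide-string support is on
// by default on Windows and elsewhere only when the user explicitly defines
// EXAMPLE_FORCE_WIDE_STRING_SUPPORT.
#if defined(_WIN32) || defined(EXAMPLE_FORCE_WIDE_STRING_SUPPORT)
#define EXAMPLE_WIDE_STRING_SUPPORT 1
#else
#define EXAMPLE_WIDE_STRING_SUPPORT 0
#endif

#if defined(_WIN32)
// Fail the build early if the 16-bit assumption does not hold.
static_assert(sizeof(wchar_t) == 2, "expected a 16-bit wchar_t on Windows");
#endif

// True when wide-string overloads should be compiled in at all.
constexpr bool wide_strings_enabled() {
    return EXAMPLE_WIDE_STRING_SUPPORT != 0;
}

// True when wchar_t has the size of a UTF-16 code unit. When this is false,
// a wide string cannot be handed to SQLite's UTF-16 APIs without conversion.
constexpr bool wchar_is_utf16_sized() {
    return sizeof(wchar_t) == 2;
}
```

On a platform with a 32-bit `wchar_t`, forcing the opt-in would still require a real conversion step before any UTF-16 API is called, which is why the two checks are kept separate.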
1) convert_utf8 is deprecated in C++17. That is why it is used in sqlite_orm under a dedicated compile flag.
2) How do you know that Windows uses wchar_t for UTF-16? Is it just a practical finding, or is there a doc that says so?
codecvt_utf8_utf16 is deprecated.
wchar_t.
As already mentioned in Slack, the support for wide-character strings needs a rather complete overhaul and/or explicit documentation of its features. As far as I understand, it is broken on macOS/Linux.
sqlite_*_text16() vs. wstring_convert<codecvt_utf8_utf16<wchar_t>> is kind of intermixed.
codecvt_utf8_utf16<wchar_t>, but neither conversion from a column value nor calling a function or returning from it. [see test case]
statement_binder<>::result()
sqlite3_result_text16() expects the number of bytes, not characters. [see 3rd parameter]
sqlite3_result_text16() should be instructed to copy the string (using SQLITE_TRANSIENT), otherwise the resulting memory goes out of scope. [see 4th parameter]
sizeof(wchar_t) == 2 (16-bit), and encoding is UTF-16.
sizeof(wchar_t) == 4 (32-bit):
sqlite3_*_16() functions is outright wrong.
codecvt_utf8_utf16<> is bad: [codecvt_utf8_utf16<>](https://en.cppreference.com/w/cpp/locale/codecvt_utf8_utf16) expects UTF-16, no matter the sizeof(wchar_t): "If Elem is a 32-bit type, one UTF-16 code unit will be stored in each 32-bit character of the output sequence." I emphasize again that this isn't the regular expectation on macOS/Linux.
SQLITE_ORM_OMITS_CODECVT, if possible: One might want to be able to pass or return wide strings from Windows API functions, even if not being able to serialize the statement.
One way of fixing the UTF-16 issue on macOS/Linux quickly is by disabling UTF-16 Unicode when not on Windows altogether. This might not even have any impact, given that UTF-8 is prevalent on those systems.
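The two remarks above about sqlite3_result_text16() (byte count as 3rd parameter, SQLITE_TRANSIENT as 4th) can be sketched as follows. The helper `utf16_byte_count` is a hypothetical name, not part of SQLite or sqlite_orm, and the block deliberately does not include sqlite3.h so that it compiles on its own; the actual call is only shown in a comment.

```cpp
#include <cassert>
#include <climits>
#include <cstddef>
#include <string>

// Hypothetical helper: size of a UTF-16 string in bytes. This is the value
// the 3rd parameter of sqlite3_result_text16() expects, as opposed to the
// number of code units returned by size().
inline int utf16_byte_count(const std::u16string& text) {
    const std::size_t bytes = text.size() * sizeof(char16_t);
    // The SQLite parameter is an int, so guard against overflow.
    assert(bytes <= static_cast<std::size_t>(INT_MAX));
    return static_cast<int>(bytes);
}

// With SQLite available, returning a function result would look roughly like:
//
//   sqlite3_result_text16(context, text.data(), utf16_byte_count(text),
//                         SQLITE_TRANSIENT);
//
// SQLITE_TRANSIENT asks SQLite to make its own copy of the buffer, so `text`
// may safely go out of scope right after the call.
```

Note that a character outside the Basic Multilingual Plane occupies two UTF-16 code units, so it counts as four bytes here.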