fnc12 / sqlite_orm

❤️ SQLite ORM light header only library for modern C++
GNU Affero General Public License v3.0
2.26k stars 313 forks

Overhaul wide-string support #942

Open trueqbit opened 2 years ago

trueqbit commented 2 years ago

As already mentioned in Slack, support for wide-character strings needs a fairly complete overhaul and/or explicit documentation of its features. As far as I understand, it is broken on macOS/Linux.

A quick fix for the UTF-16 issue on macOS/Linux would be to disable UTF-16 support altogether on non-Windows platforms. This might not even have any impact, given that UTF-8 is prevalent on those systems.
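A minimal sketch of the "disable it outside Windows" option, assuming a hypothetical opt-in macro named `EXAMPLE_ENABLE_WSTRING` (not an actual sqlite_orm flag) and a hypothetical helper `utf16_byte_length`: the wide-string code path is compiled only on Windows, where a `static_assert` checks that `wchar_t` really is a 16-bit code unit.

```cpp
#include <cassert>
#include <string>

// EXAMPLE_ENABLE_WSTRING is a hypothetical macro for illustration;
// it is not an existing sqlite_orm configuration flag.
#if defined(_WIN32)
#define EXAMPLE_ENABLE_WSTRING 1
#else
#define EXAMPLE_ENABLE_WSTRING 0
#endif

// Compile-time flag that other code can inspect.
constexpr bool wstring_enabled = EXAMPLE_ENABLE_WSTRING == 1;

#if EXAMPLE_ENABLE_WSTRING
// Windows only: wchar_t is a 16-bit UTF-16 code unit, so std::wstring data
// is already in the form SQLite's UTF-16 APIs expect.
static_assert(sizeof(wchar_t) == 2, "expected a 16-bit wchar_t on Windows");

// Hypothetical helper: byte length of the UTF-16 payload (without the NUL
// terminator), as a UTF-16 bind call would need it.
inline int utf16_byte_length(const std::wstring& value) {
    return static_cast<int>(value.size() * sizeof(wchar_t));
}
#endif
```

Tying the switch to `_WIN32` mirrors the idea of making wide strings a Windows-only feature; the `static_assert` guards the one assumption that makes passing `std::wstring` data straight through safe.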

fnc12 commented 2 years ago

Yeah, what you are saying is right. The only downside related to your work was that, through my oversight, we broke the unit tests in dev.

trueqbit commented 2 years ago

No worries: whenever I touch code, I find those corner cases, no matter the library or framework :)

fnc12 commented 2 years ago

@trueqbit you're good!

fnc12 commented 2 years ago

@trueqbit can we close it?

trueqbit commented 2 years ago

No! Still needs to be addressed.

fnc12 commented 2 years ago

Looks like we are facing a universal text-encoding issue. The standard C++ library has no universal text-encoding facility, so we are unable to fix it now. When one appears in the standard library, we can add support for it. In the meantime, users can already add their own wstring customizations in their projects using external libraries.
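Such a user-side customization mostly comes down to converting `std::wstring` into the UTF-8 that SQLite stores by default. A minimal sketch, hand-rolled here instead of pulling in an external library; `wstring_to_utf8` and `append_utf8` are hypothetical names, and wiring them into sqlite_orm's custom-type binding (e.g. `statement_binder`/`row_extractor` specializations) is not shown.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Appends the UTF-8 encoding of one Unicode scalar value to `out`.
inline void append_utf8(std::string& out, char32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: converts std::wstring to UTF-8, treating wchar_t as
// UTF-16 when it is 16 bits wide (Windows) and as UTF-32 otherwise (Linux/macOS).
inline std::string wstring_to_utf8(const std::wstring& in) {
    std::string out;
    for (std::size_t i = 0; i < in.size(); ++i) {
        char32_t cp = static_cast<char32_t>(in[i]);
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: must be followed by a low surrogate.
            if (i + 1 >= in.size()) {
                throw std::invalid_argument("truncated surrogate pair");
            }
            const char32_t low = static_cast<char32_t>(in[i + 1]);
            if (low < 0xDC00 || low > 0xDFFF) {
                throw std::invalid_argument("invalid surrogate pair");
            }
            cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
            ++i;
        } else if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF) {
            // Lone surrogates and out-of-range values are not valid text.
            throw std::invalid_argument("not a Unicode scalar value");
        }
        append_utf8(out, cp);
    }
    return out;
}
```

Branching on `sizeof(wchar_t)` is what lets the same function behave correctly on both Windows (UTF-16) and Linux/macOS (UTF-32); malformed input throws `std::invalid_argument` rather than being silently replaced.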

trueqbit commented 2 years ago

What if we enable wchar_t support only on Windows, where wchar_t is UTF-16? It is incredibly useful there, and I wouldn't be happy if it went away. SQLite itself supports UTF-16, so we could call its UTF-16 APIs directly. UTF-16 on other platforms is a delicate subject: it would only be doable with a sane opt-in, and only worth it if someone explicitly requests it.
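If such an opt-in were ever added for other platforms, a `std::wstring` (UTF-32 there) would first have to be re-encoded into 16-bit code units before reaching SQLite's UTF-16 entry points such as `sqlite3_bind_text16`. A minimal sketch of that step using `char32_t`/`char16_t`; `utf32_to_utf16` is a hypothetical helper name.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical helper: encodes UTF-32 text as UTF-16 code units, e.g. to
// feed SQLite's UTF-16 APIs on platforms where wchar_t is 32 bits wide.
inline std::u16string utf32_to_utf16(const std::u32string& in) {
    std::u16string out;
    out.reserve(in.size());
    for (char32_t cp : in) {
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw std::invalid_argument("not a Unicode scalar value");
        }
        if (cp < 0x10000) {
            // Basic Multilingual Plane: a single code unit.
            out.push_back(static_cast<char16_t>(cp));
        } else {
            // Supplementary planes: split into a high/low surrogate pair.
            const char32_t v = cp - 0x10000;
            out.push_back(static_cast<char16_t>(0xD800 + (v >> 10)));
            out.push_back(static_cast<char16_t>(0xDC00 + (v & 0x3FF)));
        }
    }
    return out;
}
```

On Linux/macOS the input can be built from a `std::wstring` with `std::u32string(ws.begin(), ws.end())`, since `wchar_t` holds UTF-32 code points there; the result's `data()` and `size() * sizeof(char16_t)` would then be the pointer and byte count a UTF-16 bind call takes.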

Otherwise, there are standard C++ conversion facilities between UTF-8 and UTF-32 (std::codecvt_utf8<char32_t>), and there is even a native UTF-32 character type, char32_t. However, I would leave UTF-32 support to a separate ticket.
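Since the `<codecvt>` facets were deprecated in C++17, the `char32_t` route can also be covered without them. A minimal sketch that decodes UTF-8 (the form `sqlite3_column_text` returns for TEXT values) into `std::u32string`; `utf8_to_utf32` is a hypothetical name, and the validation rules (no overlong forms, no surrogates, nothing above U+10FFFF) follow the UTF-8 definition in RFC 3629.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decodes UTF-8 (e.g. the bytes sqlite3_column_text
// returns for a TEXT value) into UTF-32 code points without <codecvt>.
inline std::u32string utf8_to_utf32(const std::string& in) {
    // Smallest code point allowed for each number of continuation bytes;
    // anything below it is an overlong (invalid) encoding.
    static constexpr char32_t kMinForExtra[4] = {0x0, 0x80, 0x800, 0x10000};

    std::u32string out;
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        char32_t cp = 0;
        std::size_t extra = 0;  // continuation bytes following the lead byte
        if (lead < 0x80) {
            cp = lead;
        } else if ((lead & 0xE0) == 0xC0) {
            cp = lead & 0x1F;
            extra = 1;
        } else if ((lead & 0xF0) == 0xE0) {
            cp = lead & 0x0F;
            extra = 2;
        } else if ((lead & 0xF8) == 0xF0) {
            cp = lead & 0x07;
            extra = 3;
        } else {
            throw std::invalid_argument("invalid UTF-8 lead byte");
        }
        if (extra > in.size() - i - 1) {
            throw std::invalid_argument("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            const unsigned char cont = static_cast<unsigned char>(in[i + k]);
            if ((cont & 0xC0) != 0x80) {
                throw std::invalid_argument("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (cont & 0x3F);
        }
        // Reject overlong forms, UTF-16 surrogates, and values above U+10FFFF.
        if (cp < kMinForExtra[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            throw std::invalid_argument("invalid code point");
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}
```

Rejecting malformed input instead of substituting U+FFFD is a design choice for this sketch; a real binding might prefer replacement so that a single bad row does not abort a query.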

fnc12 commented 2 years ago

  1. codecvt_utf8 is deprecated in C++17. That is why sqlite_orm uses it only under a dedicated compile flag.
  2. How do you know that Windows uses wchar_t for UTF-16? Is it just a practical finding, or is there documentation for it?

trueqbit commented 2 years ago
  1. Oh yes, you are right; I thought only the UTF-16 conversion via codecvt_utf8_utf16 was deprecated.
  2. Programming on Windows is not undefined behaviour 🤣 UTF-16 has been used since Windows 2000 as a superset of UCS-2 (see the Wikipedia article), and according to Microsoft's documentation (see its article about Unicode), the whole Win32 API uses UTF-16, based on wchar_t.