Open bold84 opened 1 year ago
Thanks for working on this, but I don't know how do I feel about introducing a separate build variant for Unicode support. IMO it would be really better to just use UTF-8 in all builds, instead of requiring a special build mode to handle Unicode.
And it would be really nice to have some description of this option in the docs, if only to explain what does enabling it change.
Finally, even if this is relatively trivial, it looks like the use of
SQLTCHAR
could avoid some preprocessor checks in the code.
The problem I have is that the data has been written already with another application and I have no influence on that. The database(s) are not UTF-8.
The changes in this PR were the only way I could figure out to prevent reading jibberish:
Perhaps you have a better suggestion on how to solve this problem. I'd be glad to implement a more elegant solution.
maybe connection_parameters can be abused to decide at runtime what type of char / string is stored in the database.
I think it might also be possible to SQLDescribeCol and determine at runtime then whether to convert back and forth. But the performance penalty is unacceptable.
Or an additional type could be introduced:
ODBC Data Type | SOCI Data Type (db_type ) |
row::get<T> specializations |
---|---|---|
SQLWCHAR, SQLWVARCHAR | db_wstring | std::string |
The user facing interface would then still only support UTF-8 (note: no std::wstring).
The problem I have is that the data has been written already with another application and I have no influence on that. The database(s) are not UTF-8.
I see, thanks.
Perhaps you have a better suggestion on how to solve this problem.
We do need to add SQLWCHAR
support to be able to work with the existing database using UTF-16, but I think it should be available in addition to UTF-8 support with plain SQLCHAR
, with the decision about which one to use being performed at run-time rather than compile-time.
Ideally, this should be automatic, i.e. when exchanging data with SQLWCHAR
columns, UTF-16 should be read from/written to them, while the same code should use UTF-8 when working with SQLCHAR
columns, but I don't know if it's possible to implement this easily, so perhaps we need a new db_wstring
type instead. If we could make this work automatically, it would be really great, however.
The main point is that I'd really, really love to avoid different incompatible builds. The conditional compilation directives in the tests are a good example of how we do not want the code using SOCI to look like. And there are other problems, e.g. we'd need to add wide char builds to the CI too if we do it like this and I'd rather avoid it.
The problem I have is that the data has been written already with another application and I have no influence on that. The database(s) are not UTF-8.
I see, thanks.
Perhaps you have a better suggestion on how to solve this problem.
We do need to add
SQLWCHAR
support to be able to work with the existing database using UTF-16, but I think it should be available in addition to UTF-8 support with plainSQLCHAR
, with the decision about which one to use being performed at run-time rather than compile-time.Ideally, this should be automatic, i.e. when exchanging data with
SQLWCHAR
columns, UTF-16 should be read from/written to them, while the same code should use UTF-8 when working withSQLCHAR
columns, but I don't know if it's possible to implement this easily, so perhaps we need a newdb_wstring
type instead. If we could make this work automatically, it would be really great, however.The main point is that I'd really, really love to avoid different incompatible builds. The conditional compilation directives in the tests are a good example of how we do not want the code using SOCI to look like. And there are other problems, e.g. we'd need to add wide char builds to the CI too if we do it like this and I'd rather avoid it.
Okay, I think we're on the same page. Shall there be an implicit conversion to std::string or shall std::wstring be supported in case of SQLWCHAR based columns?
Also note that wstring_convert has been deprecated and scheduled for removal from the standard (without replacement). It's a shame, but either way everyone here should be aware of this 👀
Also note that wstring_convert has been deprecated and scheduled for removal from the standard (without replacement). It's a shame, but either way everyone here should be aware of this 👀
That could temporarily (until there's a replacement) be solved with icu, libiconv or alike. But I wasn't sure if that's acceptable at this point. On the other hand, it could be optional.
Shall there be an implicit conversion to std::string or shall std::wstring be supported in case of SQLWCHAR based columns?
I think it would make sense to support wstring
as people using SQLWCHAR
in ODBC are most likely to use it too. But ideally I'd like to be able to support string
to/from SQLWCHAR
mapping too using UTF-8.
That could temporarily (until there's a replacement) be solved with icu, libiconv or alike.
I'd rather not pull in ICU just for this and libiconv is Unix-only. If necessary, I can contribute my own code, written many years ago, converting between UTF-8 and wchar_t
(i.e. either UTF-16 or UTF-32). It has some lower level functions as well as
std::string ToUTF8(const std::wstring& wstr);
std::wstring FromUTF8(const std::string& uft8);
That sounds good!
Okay, I'll add std::wstring support in a separate PR first.
Afterwards we can add the conversion to std::string.
We might want to move the discussion to here:
This pull request adds support for UTF-16 encoding in the ODBC module. The module previously only supported UTF-8 encoding, which made it difficult to exchange data with non-UTF-8 MSSQL databases properly.
With this new feature, the ODBC module can now properly exchange data with MSSQL databases that use UTF-16 encoding. This is especially important for users who work with databases that use different encodings or require internationalization support.
The implementation uses std::wstring_convert to convert between UTF-8 and UTF-16 encodings. The toUtf16() and toUtf8() functions have been added to handle the conversion between the two encodings.
There seem to have been issues in the past:
https://github.com/SOCI/soci/issues/164 https://github.com/SOCI/soci/issues/179 https://github.com/SOCI/soci/issues/1111