SOCI / soci

Official repository of the SOCI - The C++ Database Access Library
Boost Software License 1.0

How to read UTF-8 encoded Chinese strings from MySQL? Currently I get "??" for each character #1072

Closed asmwarrior closed 10 months ago

asmwarrior commented 10 months ago

Hi, I'm using SOCI with GCC under MSYS2.

The table is encoded as UTF-8; for example, I have a column defined like this:

 `label` varchar(255) CHARACTER SET utf8mb3 COLLATE utf8mb3_general_ci DEFAULT NULL,

I stored some Chinese text in it, and it displays correctly in MySQL GUI clients such as Navicat or HeidiSQL.

Now, when I read the text through SOCI, I get a std::string containing many "?" characters, one "?" for each Chinese character.

When I print the byte values of the "?" characters, I get "3f 3f".

So I believe SOCI automatically converts the stored characters? Any ideas?

There is a similar question in this issue: can std::string of soci hold the right utf8 data? · Issue #525 · SOCI/soci

But I can't find any SOCI documentation that covers encoding.


asmwarrior commented 10 months ago

Further information: if I store text like "abcd中文" in MySQL and read it into a std::string through SOCI, I get "abcd??", which means the English characters come through correctly but the Chinese characters do not.
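
For anyone wanting to repeat the byte check described above, here is a minimal sketch of a hex dump for a std::string. The helper name `to_hex` is an assumption made for illustration and is not part of SOCI. A string that still holds UTF-8 for "中文" should dump as six bytes, while a string whose characters were replaced on the way out of the database dumps as 3f for each "?".

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not a SOCI function): formats every byte of `s`
// as two lowercase hex digits, separated by single spaces.
std::string to_hex(const std::string& s)
{
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        const unsigned char byte = static_cast<unsigned char>(s[i]);
        if (i != 0) {
            out += ' ';
        }
        out += digits[byte >> 4];
        out += digits[byte & 0x0f];
    }
    return out;
}

// Example calls (the literal uses escapes so the source file encoding
// does not matter):
//   to_hex("\xe4\xb8\xad\xe6\x96\x87")  // UTF-8 bytes of "中文"
//   to_hex("??")                        // what a lossy conversion leaves behind
```

If the dump of the value read from the database shows only 3f bytes where Chinese text was expected, the original bytes were already lost before they reached the std::string, so no later conversion in the application can recover them.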

asmwarrior commented 10 months ago

This raises another question: how do I specify the encoding?

Since I'm on Windows, I guess the default encoding is GB2312, and maybe SOCI uses that default to convert the bytes stored in MySQL, which are UTF-8.

asmwarrior commented 10 months ago

OK, I think I have found the solution. It is very simple and was suggested by an AI (ChatGPT): I need to add the "charset=utf8mb4" option to the connection string when I open the soci::session.

For example:

soci::session sql(soci::mysql, "dbname=xxx user=root password=xxx charset=utf8mb4");

After that, I get the correct bytes in the std::string (the UTF-8 encoded bytes stored in MySQL).
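
One way to avoid forgetting the option is to build the connection string in a single place. The following is only a sketch: `mysql_connect_string` is a hypothetical helper, not a SOCI function, and it assumes none of the values contain spaces or quotes. The table name in the usage comment is also made up.

```cpp
#include <string>

// Hypothetical helper (not part of SOCI): assembles a MySQL connection
// string that always carries an explicit charset option.
// Assumes the values contain no spaces or quote characters.
std::string mysql_connect_string(const std::string& dbname,
                                 const std::string& user,
                                 const std::string& password,
                                 const std::string& charset = "utf8mb4")
{
    return "dbname=" + dbname +
           " user=" + user +
           " password=" + password +
           " charset=" + charset;
}

// Usage with SOCI (needs the SOCI headers and a reachable MySQL server):
//   soci::session sql(soci::mysql, mysql_connect_string("xxx", "root", "xxx"));
//   std::string label;
//   sql << "select label from some_table limit 1", soci::into(label);
```

With the default argument, the helper produces the same string as the one-liner above.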

asmwarrior commented 10 months ago

Since I have found the solution, I think this issue can be closed. I hope it helps others.

vadz commented 10 months ago

I don't know what the default charset for MySQL is, but it would make sense to use UTF-8 if none is specified. I don't care enough about it to do it myself, however.