danielaparker / jsoncons

A C++, header-only library for constructing JSON and JSON-like data formats, with JSON Pointer, JSON Patch, JSON Schema, JSONPath, JMESPath, CSV, MessagePack, CBOR, BSON, UBJSON
https://danielaparker.github.io/jsoncons

How to solve the problem of Chinese garbled characters #464

Closed. zhjr2019 closed this issue 10 months ago

zhjr2019 commented 10 months ago

Describe the proposed feature: Chinese text comes out garbled

What other libraries (C++ or other) have this feature?

Include a code fragment with sample data that illustrates the use of this feature

#include <cstdlib>   // EXIT_SUCCESS, system
#include <iostream>
#include <vector>
#include <jsoncons/json.hpp>
#include <jsoncons_ext/bson/bson.hpp>
int main(int argc, char* argv[])
{
    jsoncons::json j;
    j.try_emplace("hello", "你好");
    std::cout << "(1)\n" << jsoncons::pretty_print(j) << "\n\n";

    std::vector<char> buffer;
    jsoncons::bson::encode_bson(j, buffer);
    std::cout << jsoncons::byte_string_view(buffer) << "\n\n";

    jsoncons::ojson oj = jsoncons::bson::decode_bson<jsoncons::ojson>(buffer);
    std::cout << "(2)\n" << jsoncons::pretty_print(oj) << "\n\n";

    system("pause");
    return EXIT_SUCCESS;
}

Visual Studio 2022, Windows 11 (screenshot attached)

danielaparker commented 10 months ago

The string literal "你好" is an ordinary (narrow) literal, so its bytes are in the compiler's execution character set, which with Visual Studio defaults to the system code page rather than UTF-8. To represent the characters as UTF-8, use u8"你好".
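
To make the encoding difference concrete, here is a minimal sketch that checks whether a byte sequence is structurally valid UTF-8. The function name is hypothetical and not part of jsoncons, and the check is deliberately simplified (it does not reject overlong forms or surrogates). The GBK bytes mentioned in the note below are an assumption about the reporter's code page, which the issue does not state.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of jsoncons: returns true when every lead
// byte in `s` is followed by the right number of continuation bytes.
// Simplified: overlong encodings and surrogates are not rejected.
inline bool is_structurally_utf8(const std::string& s)
{
    std::size_t i = 0;
    while (i < s.size())
    {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        std::size_t extra = 0;                       // continuation bytes expected
        if (c < 0x80)                extra = 0;      // 0xxxxxxx (ASCII)
        else if ((c & 0xE0) == 0xC0) extra = 1;      // 110xxxxx
        else if ((c & 0xF0) == 0xE0) extra = 2;      // 1110xxxx
        else if ((c & 0xF8) == 0xF0) extra = 3;      // 11110xxx
        else return false;                           // stray continuation or invalid lead

        if (i + extra >= s.size()) return false;     // sequence truncated
        for (std::size_t k = 1; k <= extra; ++k)
        {
            const unsigned char cc = static_cast<unsigned char>(s[i + k]);
            if ((cc & 0xC0) != 0x80) return false;   // must be 10xxxxxx
        }
        i += extra + 1;
    }
    return true;
}
```

The UTF-8 encoding of "你好" is the six bytes E4 BD A0 E5 A5 BD, which pass this check. If the code page were GBK, the same text would be the four bytes C4 E3 BA C3, which fail it because E3 is not a continuation byte.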

Until C++20, UTF-8 string literals were of type const char[N], and jsoncons accepted them directly:

j.try_emplace("hello", u8"你好");

However, since C++20, UTF-8 string literals are of type const char8_t[N], and jsoncons doesn't currently support char8_t. But you can cast to const char*:

j.try_emplace("hello", (const char *)u8"你好");
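
As a sketch of how the cast could be wrapped so the same call compiles both before and after C++20, the overloads below return a const char* for either literal type. The names as_char_ptr and utf8_hello are hypothetical and not part of jsoncons. The literal is written with universal character names so that the result does not depend on the source file's encoding.

```cpp
#include <string>

// Hypothetical helpers, not part of jsoncons.

#if defined(__cpp_char8_t)
// C++20 and later: u8 literals are const char8_t[N]. Reading them through
// const char* is permitted, since char may alias any object type.
inline const char* as_char_ptr(const char8_t* s)
{
    return reinterpret_cast<const char*>(s);
}
#endif

// Before C++20 (or with char8_t disabled): u8 literals are already const char[N].
inline const char* as_char_ptr(const char* s)
{
    return s;
}

// "\u4F60\u597D" is "你好"; with the u8 prefix the bytes are UTF-8
// regardless of the source or execution character set.
inline std::string utf8_hello()
{
    return as_char_ptr(u8"\u4F60\u597D");
}
```

With a helper like this, the call from the comment above could be written as j.try_emplace("hello", as_char_ptr(u8"你好")), assuming the source file is saved and compiled as UTF-8.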

danielaparker commented 10 months ago

With jsoncons 0.171.0 (preview on master) and C++20, we'll have support for char8_t and std::u8string, and you'll be able to write

jsoncons::json j;

std::u8string s = u8"你好";
j.try_emplace("hello", s);
assert(j["hello"].as<std::u8string>() == s);