danielaparker / jsoncons

A C++, header-only library for constructing JSON and JSON-like data formats, with JSON Pointer, JSON Patch, JSON Schema, JSONPath, JMESPath, CSV, MessagePack, CBOR, BSON, UBJSON
https://danielaparker.github.io/jsoncons

error: Illegal UTF-8 encoding in text string at position 25909 #463

Closed zhjr2019 closed 10 months ago

zhjr2019 commented 10 months ago

bson_file: REPLY.zip

Describe the bug

The BSON file contains Chinese characters. Decoding it fails with: error: Illegal UTF-8 encoding in text string at position 25909

Enumerate the steps to reproduce the bug

jsoncons::ojson zhjr::bson2json(const std::string& bson_file)
{
    jsoncons::ojson empty_ojson; // empty ojson, returned on any failure

    if (!std::filesystem::exists(bson_file))
    {
        return empty_ojson;
    }

    std::ifstream bson_file_ifstream;
    bson_file_ifstream.open(bson_file, std::ios::binary);
    if (!bson_file_ifstream.is_open())
    {
        return empty_ojson;
    }

    bson_file_ifstream.seekg(0, std::ios::end);
    std::streamsize length = bson_file_ifstream.tellg();
    bson_file_ifstream.seekg(0, std::ios::beg);
    if (length <= 0)
    {
        return empty_ojson;
    }

    std::vector<char> bson_file_buffer;
    bson_file_buffer.clear();
    bson_file_buffer.resize(length);
    bson_file_ifstream.read(&bson_file_buffer[0], length);
    bson_file_ifstream.close();

    return jsoncons::bson::decode_bson<jsoncons::ojson>(bson_file_buffer);
}

Running jsoncons::bson::decode_bson<jsoncons::ojson>(bson_file_buffer) fails with: error: Illegal UTF-8 encoding in text string at position 25909

What compiler, architecture, and operating system?

What jsoncons library version?

danielaparker commented 10 months ago

The jsoncons error message is correct: your BSON file contains a string value, for the field named "fault_desc", that is not encoded as UTF-8. In your file, the Chinese characters (which display as "Â©ÓÍ" when the bytes are read as Latin-1) are encoded as (in hexadecimal)

c2 a9 d3 cd (corresponds to the C++ string literal "Â©ÓÍ")

This is not valid UTF-8: d3 starts a two-byte sequence, but cd is not a continuation byte. The UTF-8 encoding of those four characters is

c3 82 c2 a9 c3 93 c3 8d (corresponds to the C++ string literal u8"Â©ÓÍ")
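
To make the relationship between the two byte sequences above concrete, here is a minimal sketch of re-encoding bytes as UTF-8 when each byte is treated as one Latin-1 (ISO-8859-1) code point. The function name latin1_to_utf8 is hypothetical and is not part of jsoncons. Note the assumption: this only reproduces the byte-for-byte reading shown above. If the file was actually written in a legacy Chinese encoding such as GBK (the thread does not say), the text would need to be converted from that encoding instead, using a real converter such as iconv or ICU.

```cpp
#include <string>

// Hypothetical helper (not a jsoncons API): treats every input byte as an
// ISO-8859-1 (Latin-1) code point and re-encodes it as UTF-8.
std::string latin1_to_utf8(const std::string& in)
{
    std::string out;
    out.reserve(in.size() * 2); // each input byte yields at most two bytes
    for (unsigned char c : in)
    {
        if (c < 0x80)
        {
            // ASCII range: encoded as a single identical byte
            out.push_back(static_cast<char>(c));
        }
        else
        {
            // U+0080..U+00FF: two-byte sequence 110xxxxx 10xxxxxx
            out.push_back(static_cast<char>(0xC0 | (c >> 6)));
            out.push_back(static_cast<char>(0x80 | (c & 0x3F)));
        }
    }
    return out;
}
```

Applied to the four bytes c2 a9 d3 cd, this yields the eight bytes c3 82 c2 a9 c3 93 c3 8d.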

A BSON string is required to be a UTF-8 string.
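
Since the requirement applies when the file is written, one option is to check strings before they are encoded to BSON. The following is a minimal sketch of a standalone UTF-8 well-formedness check. The name is_valid_utf8 is hypothetical, and this is not how jsoncons itself performs the check; it is only meant to show why the bytes in this file are rejected.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper (not a jsoncons API): returns true if `s` is
// well-formed UTF-8 (no stray or missing continuation bytes, no overlong
// forms, no surrogates, nothing above U+10FFFF).
bool is_valid_utf8(const std::string& s)
{
    const unsigned char* p = reinterpret_cast<const unsigned char*>(s.data());
    const std::size_t n = s.size();
    std::size_t i = 0;
    while (i < n)
    {
        const unsigned char c = p[i];
        if (c < 0x80) { ++i; continue; } // ASCII

        std::size_t len = 0;      // total bytes in this sequence
        std::uint32_t cp = 0;     // decoded code point
        std::uint32_t min_cp = 0; // smallest code point allowed for this length
        if ((c & 0xE0) == 0xC0)      { len = 2; cp = c & 0x1F; min_cp = 0x80; }
        else if ((c & 0xF0) == 0xE0) { len = 3; cp = c & 0x0F; min_cp = 0x800; }
        else if ((c & 0xF8) == 0xF0) { len = 4; cp = c & 0x07; min_cp = 0x10000; }
        else return false;           // stray continuation or invalid lead byte

        if (n - i < len) return false; // sequence is truncated
        for (std::size_t k = 1; k < len; ++k)
        {
            const unsigned char cc = p[i + k];
            if ((cc & 0xC0) != 0x80) return false; // not a continuation byte
            cp = (cp << 6) | (cc & 0x3F);
        }
        if (cp < min_cp) return false;                  // overlong encoding
        if (cp > 0x10FFFF) return false;                // outside Unicode range
        if (cp >= 0xD800 && cp <= 0xDFFF) return false; // UTF-16 surrogate
        i += len;
    }
    return true;
}
```

With this check, the bytes c2 a9 d3 cd are rejected at d3 cd, because cd is not a continuation byte, while c3 82 c2 a9 c3 93 c3 8d is accepted.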