danielaparker / jsoncons

A C++, header-only library for constructing JSON and JSON-like data formats, with JSON Pointer, JSON Patch, JSON Schema, JSONPath, JMESPath, CSV, MessagePack, CBOR, BSON, UBJSON
https://danielaparker.github.io/jsoncons

Control amount of memory reserved in ctor of json_decoder #531

Closed: betp closed this issue 3 months ago

betp commented 3 months ago

Describe the proposed feature

The proposed feature is a mechanism to control how much memory the constructor of json_decoder reserves. The mechanism would be made available to callers of the json::parse() function.

Reason: In my application, I need to parse many small JSON strings. When profiling the application, I see that a lot of CPU time is spent in the constructor of json_decoder, on these lines:

item_stack_.reserve(1000);
structure_stack_.reserve(100);

I assume this much memory is reserved because jsoncons focuses on parsing large JSON strings. Because the JSON strings I need to parse are very simple, reserving much less memory should be sufficient, which would reduce both the CPU time spent on allocations and memory fragmentation.
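The cost described above can be made visible with a small standard-library sketch: `reserve` requests storage for the full capacity up front, even if only a handful of elements are ever pushed. `CountingAllocator`, `Item`, and `bytes_reserved` are hypothetical names introduced for illustration, and the 32-byte `Item` is an arbitrary stand-in, not the element type jsoncons actually stores.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical helper: an allocator that adds every requested byte count
// to a counter owned by the caller.
template <class T>
struct CountingAllocator {
    using value_type = T;
    std::size_t* bytes;

    explicit CountingAllocator(std::size_t* counter) : bytes(counter) {}
    template <class U>
    CountingAllocator(const CountingAllocator<U>& other) : bytes(other.bytes) {}

    T* allocate(std::size_t n) {
        *bytes += n * sizeof(T);
        return std::allocator<T>{}.allocate(n);
    }
    void deallocate(T* p, std::size_t n) { std::allocator<T>{}.deallocate(p, n); }

    template <class U>
    bool operator==(const CountingAllocator<U>& rhs) const { return bytes == rhs.bytes; }
    template <class U>
    bool operator!=(const CountingAllocator<U>& rhs) const { return bytes != rhs.bytes; }
};

// Arbitrary 32-byte stand-in for a stack element (an assumption, not the jsoncons type).
struct Item { char payload[32]; };

// Returns how many bytes a vector requests from its allocator when it
// reserves `capacity` elements before anything is pushed.
std::size_t bytes_reserved(std::size_t capacity) {
    std::size_t counter = 0;
    CountingAllocator<Item> alloc(&counter);
    std::vector<Item, CountingAllocator<Item>> stack(alloc);
    stack.reserve(capacity);
    return counter;
}
```

Under these assumptions, `bytes_reserved(1000)` asks for at least 32,000 bytes per construction, while `bytes_reserved(8)` needs only a few hundred. When a decoder is constructed for every small message, that difference is paid on every parse.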

What other libraries (C++ or other) have this feature?

I don't know.

Include a code fragment with sample data that illustrates the use of this feature

danielaparker commented 3 months ago

Can you provide an example of how you're currently using jsoncons, along with a representative sample JSON file? Thanks.

betp commented 3 months ago

Sure:

// representative sample JSON message, e.g. received over network
std::string jsonStr = "{\"id\":1,\"version\":\"2.0\",\"result\":{\"value\":\"42\"}}";

const auto json = jsoncons::json::parse(jsonStr);

const auto& result = json.at_or_null("result");
if (!result.is_null())
{
    const auto& value = result.at_or_null("value");
    if (!value.is_null() && value.is_string())
    {
        doSomethingWithValue(value.as_string());
    }
}

danielaparker commented 3 months ago

You could reduce allocations a lot by reusing a json_decoder and a json_parser, and, assuming C++17, accessing "value" as a std::string_view, e.g.

#include <jsoncons/json.hpp>
#include <iostream>
#include <string>
#include <string_view>

void doSomethingWithValue(std::string_view val)
{
    std::cout << val << "\n";
}

int main()
{
    jsoncons::json_decoder<jsoncons::json> decoder;
    jsoncons::json_parser parser;

    for (std::size_t i = 0; i < 5; ++i)
    {
        std::string jsonStr = "{\"id\":1,\"version\":\"2.0\",\"value\":\"" + std::to_string(i) + "\"}";
        parser.update(jsonStr.data(), jsonStr.size());
        parser.parse_some(decoder);
        parser.finish_parse(decoder);
        parser.check_done();
        if (decoder.is_valid())
        {
            jsoncons::json json = decoder.get_result();
            const auto& value = json.at_or_null("value");
            if (value.is_string())
            {
                doSomethingWithValue(value.as<std::string_view>());
            }
        }
        decoder.reset();
        parser.reset();
    }
}

betp commented 3 months ago

Thank you for the suggestion. It is good to know that the json_decoder and json_parser can be reused; I will consider this.

However, implementing this approach would require quite a few changes in my code base. The JSON strings are parsed in many places and functions. I would probably wrap the json_decoder, the json_parser and the code above in a class, but I would still need to manage the lifetime of the objects of that class and pass them into every function that needs to parse a JSON string. That would complicate the logic of the code.

Do you think it is realistic to extend the interface of the parse() function, as suggested above?

danielaparker commented 3 months ago

I'm reluctant to add more overloads to the json::parse function; it already has 18. But to help with the issue, I've reduced the constants for the initial buffer capacity and the initial stack depth to 256 and 66, respectively. I've also added a limit such that the initial stack depth won't exceed the max_nesting_depth (set in options) + 2. So with small JSON objects, you can control the initial stack depth by setting a suitably small max_nesting_depth, e.g.

std::string str = R"(
{
    "foo" : [1,2,3],
    "bar" : [4,5,{"f":6}]
})";

auto options = jsoncons::json_options{}
    .max_nesting_depth(3);
auto j = jsoncons::json::parse(str, options);
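
The sizing rule described in this comment can be written out as a short arithmetic sketch. This is not the library's implementation: `default_initial_stack_depth` and `initial_stack_depth` are hypothetical names, and the value 66 is simply the constant quoted in the comment.

```cpp
#include <cassert>
#include <cstddef>

// Assumed default, taken from the comment above; the name is hypothetical.
constexpr std::size_t default_initial_stack_depth = 66;

// Sketch of the described rule: use the default, but never more than
// max_nesting_depth + 2. Written as a comparison rather than std::min so
// that a very large max_nesting_depth cannot overflow when 2 is added.
constexpr std::size_t initial_stack_depth(std::size_t max_nesting_depth) {
    return max_nesting_depth < default_initial_stack_depth - 2
        ? max_nesting_depth + 2
        : default_initial_stack_depth;
}
```

With `max_nesting_depth(3)` as in the example, this sketch yields an initial depth of 5 instead of 66. The sample document nests three levels deep (object, array, object), so a limit of 3 is the smallest value that still accepts it.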

betp commented 3 months ago

Thank you for the changes!