Tencent / rapidjson

A fast JSON parser/generator for C++ with both SAX/DOM style API
http://rapidjson.org/

Recycling Document structure #1942

Open gvollant opened 2 years ago

gvollant commented 2 years ago

Hello, I have a loop that parses a new JSON document on each iteration, using "Document doc; doc.Parse(c_str_json);" each time.

Is it possible to reuse the same Document structure for each parse? If yes, what is the fastest method?

dinomight commented 2 years ago

This is a bit of a guess as I don't have any performance numbers to back it up, but just from reading the code it would seem that reusing the same Document to parse each JSON string should generally be faster, especially if your incoming JSON strings tend to be of similar length or have fairly similar content. Internally, the document object uses a stack to capture incoming JSON values such as strings. On the initial parse it has to grow that stack and allocate additional space; on subsequent parses with the same Document, I'd assume you avoid that allocation overhead.

Do keep in mind that a successful parse modifies the source Document: the Document effectively becomes the root GenericValue of the parsed JSON content, and any previous content is replaced. As long as that works for you, the reuse should work.

Consider writing some test cases and timing the results to verify my assumptions. I don't quite have the time to try and spin up a benchmark to test them myself :).

dinomight commented 2 years ago

I did actually get around to playing with a benchmark where I'm parsing a fairly large file, approximately 250MB of fairly simple, repetitive JSON content. The numbers suggest that reusing the document saved about 25ms of wall and CPU time per parse, roughly 3.5% of the ~706ms baseline.

Run on (16 X 2496 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 1280 KiB (x8)
  L3 Unified 24576 KiB (x1)
-------------------------------------------------------------------------------
Benchmark                                     Time             CPU   Iterations
-------------------------------------------------------------------------------
rapidjson_parse_mean                  706295690 ns    704687500 ns           10
rapidjson_parse_median                707400700 ns    703125000 ns           10
rapidjson_parse_stddev                  3698335 ns      4941059 ns           10
rapidjson_parse_cv                         0.52 %          0.70 %            10
rapidjson_parse_same_document_mean    681601380 ns    679687500 ns           10
rapidjson_parse_same_document_median  680710500 ns    679687500 ns           10
rapidjson_parse_same_document_stddev    3993162 ns      8235098 ns           10
rapidjson_parse_same_document_cv           0.59 %          1.21 %            10
#include <memory>

#include <benchmark/benchmark.h>
#include <fmt/core.h>
#include <rapidjson/document.h>
#include <rapidjson/filereadstream.h>

// UniqueFilePtr (RAII FILE* wrapper) and benchmark_param (run parameters)
// are helpers defined elsewhere in my benchmark harness.
static void rapidjson_parse(benchmark::State& state)
{
    static constexpr auto buffer_size = 65536;
    const auto buffer = std::make_unique_for_overwrite<char[]>(buffer_size);
    for (auto _ : state)
    {
        UniqueFilePtr file{benchmark_param::input_file, "r"};
        if (!file.IsOpen())
        {
            fmt::print(stderr, "Failed to open [{}]\n", benchmark_param::input_file);
            state.SkipWithError("failed to open input file");
            break;
        }

        rapidjson::FileReadStream stream{file.Get(), buffer.get(), buffer_size};
        rapidjson::Document doc;  // fresh Document (and allocator) every iteration
        doc.ParseStream(stream);
    }
}

BENCHMARK(rapidjson_parse);

static void rapidjson_parse_same_document(benchmark::State& state)
{
    static constexpr auto buffer_size = 65536;
    const auto buffer = std::make_unique_for_overwrite<char[]>(buffer_size);
    rapidjson::Document doc;  // reused across all iterations

    for (auto _ : state)
    {
        UniqueFilePtr file{benchmark_param::input_file, "r"};
        if (!file.IsOpen())
        {
            fmt::print(stderr, "Failed to open [{}]\n", benchmark_param::input_file);
            state.SkipWithError("failed to open input file");
            break;
        }

        rapidjson::FileReadStream stream{file.Get(), buffer.get(), buffer_size};
        doc.ParseStream(stream);  // replaces the previous root in place
    }
}

BENCHMARK(rapidjson_parse_same_document);