Open gvollant opened 2 years ago
This is a bit of a guess as I don't have any performance numbers to back it up, but just from reading the code it would seem that reusing the same document to parse the JSON string may be generally faster. Especially if your incoming JSON strings tend to be of the same length or have fairly similar content. Internally, the document object is using a stack to capture incoming JSON values like strings. In the initial parse, it'll have to grow and allocate additional space. On subsequent parses using the same Document, I'd have to assume you avoid this memory allocation overhead.
Do keep in mind that a successful parse of a string does modify the source Document, the Document effectively becoming the root GenericValue of the parsed JSON content. As long as that works for you, the reuse should work.
Consider writing some test cases and timing the results to verify my assumptions. I don't quite have the time to try and spin up a benchmark to test them myself :).
I did actually get around to playing with a benchmark where I'm parsing a large file (approximately 250 MB) of simple, repetitive JSON content. The numbers suggest that reusing the document shaved about 25 ms of wall and CPU time off each parse.
Run on (16 X 2496 MHz CPU s)
CPU Caches:
  L1 Data 48 KiB (x8)
  L1 Instruction 32 KiB (x8)
  L2 Unified 1280 KiB (x8)
  L3 Unified 24576 KiB (x1)
-------------------------------------------------------------------------------------
Benchmark                                            Time             CPU   Iterations
-------------------------------------------------------------------------------------
rapidjson_parse_mean                         706295690 ns    704687500 ns           10
rapidjson_parse_median                       707400700 ns    703125000 ns           10
rapidjson_parse_stddev                         3698335 ns      4941059 ns           10
rapidjson_parse_cv                              0.52 %          0.70 %              10
rapidjson_parse_same_document_mean           681601380 ns    679687500 ns           10
rapidjson_parse_same_document_median         680710500 ns    679687500 ns           10
rapidjson_parse_same_document_stddev           3993162 ns      8235098 ns           10
rapidjson_parse_same_document_cv                0.59 %          1.21 %              10
static void rapidjson_parse(benchmark::State& state)
{
    static constexpr auto buffer_size = 65536;
    const auto buffer = std::make_unique_for_overwrite<char[]>(buffer_size);
    for (auto _ : state)
    {
        UniqueFilePtr file{benchmark_param::input_file, "r"};
        if (!file.IsOpen())
        {
            // Bail out instead of handing a null FILE* to FileReadStream.
            fmt::print(stderr, "Failed to open [{}]\n", benchmark_param::input_file);
            state.SkipWithError("failed to open input file");
            return;
        }
        rapidjson::FileReadStream stream{file.Get(), buffer.get(), buffer_size};
        rapidjson::Document doc;
        doc.ParseStream(stream);
    }
}
BENCHMARK(rapidjson_parse);
static void rapidjson_parse_same_document(benchmark::State& state)
{
    static constexpr auto buffer_size = 65536;
    const auto buffer = std::make_unique_for_overwrite<char[]>(buffer_size);
    rapidjson::Document doc; // reused across iterations
    for (auto _ : state)
    {
        UniqueFilePtr file{benchmark_param::input_file, "r"};
        if (!file.IsOpen())
        {
            // Bail out instead of handing a null FILE* to FileReadStream.
            fmt::print(stderr, "Failed to open [{}]\n", benchmark_param::input_file);
            state.SkipWithError("failed to open input file");
            return;
        }
        rapidjson::FileReadStream stream{file.Get(), buffer.get(), buffer_size};
        doc.ParseStream(stream);
    }
}
BENCHMARK(rapidjson_parse_same_document);
Hello, I have a loop which parses a new JSON string each time, using "Document doc; doc.Parse(c_str_json);" on every iteration.
Is it possible to re-use the same Document structure for each parse? If so, what is the fastest method?