**Closed** — mxmlnkn closed this issue 1 year ago
Thanks for the feedback.
Are you building a `json` (`ojson`) value for the entire 2.5 GB file? An alternative is to use the pull streaming API, specifically `json_stream_cursor` or `json_string_cursor`.
There are limitations to performance and memory efficiency when reading large JSON texts into trees of individually mutable JSON values (the jsoncons approach). It would be far more performant and memory-efficient to read them into a single parsed document and allow only immutable access to its elements (the simdjson approach). We've thought about supporting that as well. Is mutability of the in-memory representation a requirement for you?
Thanks for the comments, specifically on `std::stable_sort`. I think we need to provide a custom implementation of stable sort to take control over memory usage, either by using an in-place stable sort or by using a shared buffer.
Some of the things that you're observing are related to how we handle the possibility of duplicate keys in the JSON text, and our requirement to keep the last one received in the parsed `json` value. We may need to revisit that.
`json` and `ojson` are implemented as flat maps; `json` is sorted and uses binary search for access, while `ojson` maintains the original order and keeps a separate sorted array of keys to support access through binary search.
We have in progress a `ujson` (`u` for unordered) that uses an efficient flat-memory unordered map for JSON objects, following robin-hood-hashing. That may make it into the next release.
Daniel
> Are you building a `json` (`ojson`) value for the entire 2.5 GB file? An alternative is to use the pull streaming API, specifically `json_stream_cursor` or `json_string_cursor`.
I'm using the streaming API. However, I'm also delegating certain small nested objects to be parsed as JSON.
> There are limitations to performance and memory efficiency when reading large JSON texts into trees of individually mutable JSON values (the jsoncons approach). It would be far more performant and memory-efficient to read them into a single parsed document and allow only immutable access to its elements (the simdjson approach). We've thought about supporting that as well. Is mutability of the in-memory representation a requirement for you?
Mutability is not a requirement. However, as the data can become arbitrarily large, the streaming approach is a requirement, and simdjson does not offer that. There seems to be a SAX-style API, but I think it is not yet public. If the immutable parsing approach could be used for the small nested objects while still streaming over the large array of objects I have, that could be nice. I'm not sure how well that would translate into performance gains, though; the objects inside the large array are only ~256 B each.
> Thanks for the comments, specifically on `std::stable_sort`. I think we need to provide a custom implementation of stable sort to take control over memory usage, either by using an in-place stable sort or by using a shared buffer.
As the notes on cppreference state, you might still use `std::stable_sort` but make its internal buffer allocation always fail. In that case it falls back to a memory-efficient O(N·log²(N)) algorithm instead of the standard O(N·log(N)) one. Unfortunately, there doesn't seem to be a way to specify this via a template argument, but maybe something could be done by overloading `operator new` inside a dedicated namespace. It might indeed be easier to roll out a custom stable-sort implementation.
> `json` and `ojson` are implemented as flat maps; `json` is sorted and uses binary search for access, while `ojson` maintains the original order and keeps a separate sorted array of keys to support access through binary search.
In my case, the index seems unnecessary because the number of keys is small (4 to 6). With so few keys, a linear search should be the most efficient anyway, so a simple `std::find` instead of `std::equal_range` works fine. The sorting might still be necessary to account for duplicates, though. By the way, I don't see why `std::equal_range` is used at all: only the first element in the found range is returned by the call to `at`.
> We have in progress a `ujson` (`u` for unordered) that uses an efficient flat-memory unordered map for JSON objects, following robin-hood-hashing. That may make it into the next release.
Note that `std::unordered_map` performs an allocation for each key-value pair. In other parts of my program, I had to replace it with a `std::vector` because the key-value pairs were quite small.
I also noticed the call to `reserve` inside `order_preserving_json_object::insert` while using heaptrack. I'm not sure whether you are aware, but it seems to be used in a kind of antipattern. Quoting the cppreference page on `reserve`:

> Correctly using reserve() can prevent unnecessary reallocations, but inappropriate uses of reserve() (for instance, calling it before every push_back() call) may actually increase the number of reallocations (by causing the capacity to grow linearly rather than exponentially) and result in increased computational complexity and decreased performance. For example, a function that receives an arbitrary vector by reference and appends elements to it should usually not call reserve() on the vector, since it does not know of the vector's usage characteristics.
>
> When inserting a range, the range version of insert() is generally preferable as it preserves the correct capacity growth behavior, unlike reserve() followed by a series of push_back()s.
In my case it does not matter, though, because `insert` is only ever called once by the parsing algorithm, when the object itself is empty. But in general, I think the call to `reserve` might better be replaced with a combination of `resize` and `std::transform`. Benchmarks would have to be done to confirm that this works as well as the suggested range-insert method.
For the approach of iterating in a streaming way over the large array and using an immutable one-shot parser for the nested objects, another benefit would be that the `string_view`s that jsoncons already returns would have a better-defined lifetime. Currently, I'm iterating over the SAX-style values, and once a nested object has been fully parsed, I can process it. However, my assumption is that the `string_view` lifetime ends as soon as `cursor::next` has been called. This means I have to make some unnecessary string copies, because I have no control over the underlying I/O buffer into which the string views presumably point. Assuming the JSON variant-like object returned by the one-shot parser behaved like a pure view into the input buffer (except perhaps for integer values and such), those string copies could be avoided altogether. Whether that improves runtime is a separate matter, though.
Thanks for the comments. I'll have to think about this. Incidentally, `ujson` wouldn't use `std::unordered_map`.
I also had to implement stable sorting in my code, and `std::stable_sort` was 3x slower than a simple `std::sort`. My solution was to add an `index` member and then sort with that as a secondary sort key:
```cpp
for ( size_t i = 0; i < values.size(); ++i ) {
    values[i].index = i;
}
std::sort(
    values.begin(), values.end(),
    [] ( const auto& a, const auto& b ) {
        return std::make_tuple( a.sortKey, a.index ) < std::make_tuple( b.sortKey, b.index );
    }
);
```
This is ~50% slower than `std::sort` on only one key but still almost 2x faster than `std::stable_sort`.
Following @mxmlnkn's suggestion, `json_decoder` now uses code that performs stable sorting with an index and `std::sort`; the code is on master. That should address the issue of temporary allocations from `std::stable_sort` during deserialization to `json` or `ojson` values.
Thanks for providing this library!
I am trying to use it because of its memory-efficient streaming capabilities for very large JSON files.
However, the parsing performance seems slow to me: it takes ~60 s for 2.5 GiB, so ~60 MB/s. That seems slow compared to the (untested) multiple GB/s reported for simdjson. Even gzip decompression is many times faster (> 200 MB/s).
I tried to look a bit into performance bottlenecks; most of the time is spent, almost equally, in `basic_json_parser::parse_some`, `basic_json_parser::parse_string`, and `jsoncons::unicode_traits::is_legal_utf8`. Nothing I can do much about there.

I also took a look with heaptrack and was surprised to see millions of temporary allocations from inside `std::stable_sort`, called from inside `jsoncons::json`. Ok, so I found that it sorts keys alphabetically. I find this a questionable default because it sacrifices performance for something I don't need, but it's fine as long as there is an alternative.

Next, I tried `jsoncons::ojson`, which is said to preserve the original order, thinking that this should get rid of the problematic `std::stable_sort`. But `jsoncons::ojson` unexpectedly also calls `std::stable_sort`.

I tried to copy-paste the code of `order_preserving_json_object` and `order_preserving_policy` into a `patched_jsoncons` namespace and replaced the `std::stable_sort` inside `order_preserving_json_object::build_index` with `std::sort`, and voilà, the millions of temporary allocations from `std::stable_sort` are gone. I'm still seeing millions of temporary allocations inside `build_index` for the `_index.reserve` call, though; I'm not sure why they are flagged as temporary. The runtime improvements for a 2.5 GiB Chrome trace JSON file are unfortunately not significant, but those allocations still seem wasteful.

Now, I'm not 100% sure why `std::stable_sort` is used; it seems unnecessary to me. The keys have to be sorted somehow because `std::unique` is called at some point. (Sorting inside `build_index` and then calling `std::unique` outside, assuming that `build_index` did sort, seems hard to read to me. I think the sort and the call to `std::unique` should be right next to each other.)

What compiler, architecture, and operating system?
What jsoncons library version?