danielaparker / jsoncons

A C++, header-only library for constructing JSON and JSON-like data formats, with JSON Pointer, JSON Patch, JSON Schema, JSONPath, JMESPath, CSV, MessagePack, CBOR, BSON, UBJSON
https://danielaparker.github.io/jsoncons
Other
720 stars 163 forks source link

Need a std::map or unordered_map implementation. #346

Open nicehero opened 2 years ago

nicehero commented 2 years ago

Describe the proposed feature add type jsoncons::mjson (std::map or unordered_map implementation) jsoncons::mojson (preserve order std::map or unordered_map implementation) like jsoncons::json (sort vector implementation) jsoncons::ojson (preserve order sort vector implementation)

The performance of insert is too low. https://github.com/nicehero/json_benchmark

And I love other featrues from this project such as proxy_type ,bson support etc.

nlohmann insert&erase times:1000 cost:0.000472 qps:2117616 jsoncons insert&erase times:1000 cost:0.011752 qps:85090 rapidjson insert&erase times:1000 cost:0.002199 qps:454700

jpg

danielaparker commented 2 years ago

Thanks for posting this, the results are quite interesting. It's uncommon to see json benchmarks that go beyond parsing and serializing.

In terms of structure, nlohmann stores json objects as a std::map<key,value>, jsoncons as a std::vector of key-value pairs sorted by name, and rapidjson as an unsorted array of key- value pairs. Consequently, I would expect rapidjson to perform better than nlohmann and jsoncons for inserts, and worse for access, which is reflected in your tests.

Of note, by sticking to sequential data structures, jsoncons and rapidjson have more compact data representations and use less memory than nlohmann.

But it seems jsoncons performs significantly poorer than nlohmann for inserts. I downloaded your project and did some experiments. I tried pre-allocating memory for the object storage with jj.reserve(), and using jj.try_emplace() rather than jj[str] =, but neither made much difference. It seems most of the time is spent moving the old key-value pairs to make room for the new, rather than in allocations. For additional confirmation, I tried inserting the keys in sorted order, and for that special case jsoncons was much faster than nlohmann.

One thing though is that your test is inserting 500 key-value pairs into an object, that seems to me to be a lot of object members. On my machine, up until about 50 keys, which seems to me to be a more typical number, jsoncons is faster than nlohmann, than falls behind.

However, let's suppose we want to match nlohmann for inserts of larger numbers (> 50) of key-value pairs. One option as you propose is to support std::map or std::unordered_map as a backing store. I haven't fully thought out how feasible that is.

As another option, note that appending all the new items to a vector and then sorting once at the end (which is what jsoncons::json::parse does) is much faster than keeping the vector in sorted order for each insert (which is what the json insert functions do.) Given that in your benchmarks (and mine) jsoncons parsing is somewhat faster than nlohmann, I think we could support comparable insert performance for larger objects by providing a special interface for anyone that needed that, perhaps something like

json j; // Original value

json_builder  builder(std::move(j));

for (int i = 0; i < 500; ++i)
    builder.insert(key,val); // appends

j = builder.get_result(); // sorts and moves result into j

Any comments welcome.

Daniel

danielaparker commented 2 years ago

After considering this request, I think I want to stick to the flat map memory layout, I don't think I want to introduce std::map or std::unordered_map (which have different iterator requirements) as alternative backing stores. Rather, I'd like to support an efficient unordered map for JSON objects following robin-hood-hashing, using the flat array memory layout. So when that's done we'll have an additional typedef ujson, which should perform well on your benchmarks. I think it will also prove to be a useful intermediate type for mapping JSON objects to C++ data structures.