Status: Closed (dvzubarev closed 4 years ago)
Thanks for the feedback. Will investigate.
It seems the is_typed_array specialization of ser_traits calls ser_traits_default::decode for each element. ser_traits_default::decode ends up instantiating a json_decoder for each element. Perhaps these instantiations are the bottleneck, and the json_decoder needs to be hoisted out of the loop.
For your case, where endianness and type are compatible between the CBOR typed array and the output vector, it should just be a resize and a memcpy. I should have that up on master in the next day or two.
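The resize-and-memcpy idea can be sketched roughly as follows. This is only an illustration, not the code that went into jsoncons: host_is_little_endian and decode_le_float32 are hypothetical helper names, and the sketch assumes the payload is a contiguous buffer of little-endian 32-bit IEEE 754 floats.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper (not a jsoncons function): true on a little-endian host.
inline bool host_is_little_endian()
{
    const std::uint32_t probe = 1;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1);
    return first_byte == 1;
}

// Hypothetical helper (not a jsoncons function): decodes `size` bytes holding
// little-endian 32-bit floats into a vector.
inline std::vector<float> decode_le_float32(const unsigned char* data, std::size_t size)
{
    static_assert(sizeof(float) == 4, "sketch assumes a 32-bit float");
    assert(size % sizeof(float) == 0);

    std::vector<float> out;
    out.resize(size / sizeof(float));
    if (out.empty())
    {
        return out;
    }

    if (host_is_little_endian())
    {
        // Fast path: byte order already matches, so one bulk copy is enough.
        std::memcpy(out.data(), data, size);
    }
    else
    {
        // Slow path: reverse the four bytes of each element before storing it.
        for (std::size_t i = 0; i < out.size(); ++i)
        {
            const unsigned char* src = data + i * sizeof(float);
            const unsigned char swapped[4] = {src[3], src[2], src[1], src[0]};
            std::memcpy(&out[i], swapped, sizeof(float));
        }
    }
    return out;
}
```

On a little-endian machine only the memcpy branch runs, so no per-element conversion takes place.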
More generally, the idea of ser_traits is to allow decoding into a std::map or std::vector without requiring that the entire input be first decoded into one big basic_json value, but you're certainly right that the current implementation is inefficient. That needs work too. But for the typed arrays, we won't need basic_json. In a day or two.
Daniel
Can you try with the code on master?
Thanks, Daniel
Thank you, it is much faster now (from ~2 s to 179 ms). According to Callgrind, there are now two costly functions:
(76.45%) get_byte_string(std::error_code&)::{lambda(unsigned long, std::error_code&)
(17.92%) handle_byte_string
I'm not sure whether it is worth optimizing these functions.
That part of the code can be improved, I'll look into it.
If you're still following this, I think you'll find the code that's currently on master to be about 30 percent faster. I'll leave it at that.
Hi, I was experimenting with typed arrays (decoding an array of 13M floats). Here is my code: https://gist.github.com/dvzubarev/e153a437305066bb1aa1f4865bbbeb1d
I compiled with these flags: g++ -O2 -g -I ~/installed_thirdparty/jsoncons/jsoncons-0.143.1/include bench.cpp
It took 2 s to decode a typed array of float elements. I realize that it's not a fair comparison with my naive Python decoding code, but that is still much faster: 0.025 s (see file bench.py).
My implementation is suitable only for this specific case and does not cover all the cases this library does. But looking into the jsoncons code, I noticed that typed arrays are treated element-wise, like ordinary CBOR arrays, in the decode function in the ser_traits struct (when is_typed_array == true). I've added v.reserve(13100100) to this function, and this change alone speeds up decoding considerably. But I'm not sure that it's possible to get the array length at this stage. My question is: is it possible to process a typed array other than element by element? I think it should greatly speed up decoding.
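To illustrate why the reserve call helps, here is a small sketch, not jsoncons code. count_reallocations is a hypothetical function that stands in for an element-wise decode loop and counts how often the vector's capacity changes while it is being filled. It assumes the element count is known before the loop starts.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an element-wise decode loop (not a jsoncons function).
// Returns how many times the vector's capacity changed while pushing `count` elements.
inline std::size_t count_reallocations(std::size_t count, bool reserve_first)
{
    std::vector<float> v;
    if (reserve_first)
    {
        // One allocation up front; later push_back calls stay within capacity.
        v.reserve(count);
    }

    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();
    for (std::size_t i = 0; i < count; ++i)
    {
        v.push_back(static_cast<float>(i));
        if (v.capacity() != last_capacity)
        {
            // Capacity grew: the buffer was reallocated and the existing elements moved.
            ++reallocations;
            last_capacity = v.capacity();
        }
    }
    return reallocations;
}
```

With reserve_first set to true the function returns 0. Without it, the vector grows repeatedly and each growth step copies the elements stored so far.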
The same question applies to the endianness conversion code in cbor_parser.hpp. AFAIU, an element-wise copy of the whole array always occurs in cbor_parser.hpp when converting to native endianness. Is it possible to skip this conversion altogether when decoding LE floats on a little-endian machine?