chadaustin / sajson

Lightweight, extremely high-performance JSON parser for C++11
MIT License

Optimize string values and keys #13

Closed: iboB closed this issue 7 years ago

iboB commented 8 years ago

I think the idea to copy and modify the input json string is great (and on a humorous note, perhaps the library's name should be dajson, as you make two allocations - one for the payload and one for the input text).

Anyway... copying the input works great for parse_string_slow, but I think you're not taking it far enough. Currently, getting keys and sensible string values requires allocations (by creating std::string-s), which may not always be needed. For example, if the user only wants to compare those to existing values, these allocations are pointless.

I have made a change here which alters the input buffer by adding zeroes at the end of strings (replacing the closing quotation marks). Thus a safe const char* may be returned, and the decision whether an allocation is to be made when using strings is left to the user.
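To illustrate the idea, here is a minimal sketch of terminating a string in place. It is not the actual change: `terminate_in_place` is a hypothetical helper, and it assumes the buffer is mutable and that escape sequences are only skipped, not decoded.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical helper, not part of sajson. Given a mutable buffer and the
// index of an opening quote, overwrite the matching closing quote with '\0'
// and return a pointer to the first character of the string.
// Escapes are not decoded; a backslash only causes the next char to be skipped.
inline const char* terminate_in_place(char* buffer, std::size_t length,
                                      std::size_t open_quote) {
    assert(open_quote < length && buffer[open_quote] == '"');
    std::size_t i = open_quote + 1;
    while (i < length && buffer[i] != '"') {
        if (buffer[i] == '\\') {
            ++i; // skip the escaped character
        }
        ++i;
    }
    if (i >= length) {
        return nullptr; // no closing quote found
    }
    buffer[i] = '\0'; // the closing quote becomes the terminator
    return buffer + open_quote + 1;
}

// Example of a comparison that needs no allocation.
inline bool key_is(const char* key, const char* expected) {
    return key != nullptr && std::strcmp(key, expected) == 0;
}
```

With something like this, the caller decides whether to copy the result into a std::string or just compare it against a known value.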

Not a pull request yet, since it's a non-backwards-compatible change. You may have ideas about that. Keep the old functions and add new ones? Major/minor version increment?

chadaustin commented 8 years ago

You know, I think I actually forgot to do the variant that mutates the input given to the mutable_string_view. So yeah, you're right, it does do two allocations today. :) Oops.

I think your change is on the right track - sajson exposes std::string in too many places. It should return its "string" data type instead. It is important that we continue to support embedded nul characters, however, so const char* isn't enough.
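A rough sketch of why a pointer plus a length is needed here. The `string_ref` type below is hypothetical and is not sajson's actual string type; it only shows that a length-carrying view preserves embedded nul characters where a bare const char* would stop early.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical non-owning string type for illustration only.
// It stores an explicit length, so embedded '\0' bytes are part of the value.
struct string_ref {
    const char* data;
    std::size_t length;

    // Compare against another buffer without allocating.
    bool equals(const char* other, std::size_t other_length) const {
        assert(other != nullptr || other_length == 0);
        return length == other_length &&
               (length == 0 || std::memcmp(data, other, length) == 0);
    }

    // Allocate only when the caller explicitly asks for a std::string.
    std::string to_string() const {
        return length == 0 ? std::string() : std::string(data, length);
    }
};
```

Treating the same bytes as a C string would report a shorter value as soon as a nul appears, which is the case the explicit length avoids.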

I'd say let's rip out all the std::strings and add a way for document parse to take a string type that's mutable so it only has to allocate once. The idea was that you could mmap a file on disk in copy-on-write mode and parse it directly without having to read the file into memory. Clearly I never got that far. :)
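For reference, the copy-on-write mmap idea could look roughly like the sketch below. It assumes a POSIX system; `mapped_buffer`, `map_private`, and `unmap` are hypothetical names, not sajson functions. The resulting buffer is writable, but writes go to private pages and never reach the file on disk.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical holder for a privately mapped file.
struct mapped_buffer {
    char* data;
    std::size_t length;
};

// Map a file in copy-on-write mode (MAP_PRIVATE). Returns {nullptr, 0} on
// failure or for an empty file.
inline mapped_buffer map_private(const char* path) {
    assert(path != nullptr);
    mapped_buffer result{nullptr, 0};
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return result;
    }
    struct stat st;
    if (fstat(fd, &st) == 0 && st.st_size > 0) {
        std::size_t size = static_cast<std::size_t>(st.st_size);
        void* p = mmap(nullptr, size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
        if (p != MAP_FAILED) {
            result.data = static_cast<char*>(p);
            result.length = size;
        }
    }
    close(fd); // the mapping stays valid after the descriptor is closed
    return result;
}

// Release the mapping and reset the holder.
inline void unmap(mapped_buffer& buffer) {
    if (buffer.data != nullptr) {
        munmap(buffer.data, buffer.length);
    }
    buffer.data = nullptr;
    buffer.length = 0;
}
```

A parser that mutates its input could then be handed `data` and `length` directly, so the only allocation left would be the one for the parsed structure.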

Cheers, and thanks for your attention!