chadaustin / sajson

Lightweight, extremely high-performance JSON parser for C++11
MIT License
568 stars 43 forks source link

Allocator problems #28

Open iboB opened 7 years ago

iboB commented 7 years ago

I don't think the current allocator approach is the way to go.

As far as I understand the point of the dynamic_allocator is to keep the structure buffer's size as small as possible (which is more suitable for a library called majson - Multiple Allocations JSON 😄 ). Of course I wouldn't have a problem with that if it wasn't at the cost of having multiple if-checks for allocation errors at every place it could allocate memory, needlessly slowing the execution.

I (and I think the majority of the user base) would much rather lose the possibility to keep the memory minimal, than pay the price for those checks.

I urge you to consider dropping the dynamic_allocation strategy and focusing on minimizing the allocations.

I had developed allocators internally before which only had: void* allocate(size_t size)/void deallocate(void* buf, size_t size), and they were used for the both the structure and the input buffers, thus allowing the library to always work with zero allocations with custom allocators.

I'll migrate my changes to the new version of sajson some time next week and if you're interested I'll post a link here. Here's the gist of features:

iboB commented 7 years ago

Here's the proposed change: https://github.com/Chobolabs/sajson/commit/6e58b624776bdbb0faa7a977731b1eb6157d25bb

chadaustin commented 7 years ago

Hi Boris,

Just FYI, I've got a lot going on in my personal life at the moment, which is why I haven't responded yet. I'll try to answer this week!

Cheers, Chad

iboB commented 7 years ago

No problem. There's nothing urgent :)

chadaustin commented 7 years ago

Hi Boris! I'm back from vacation and had some time to look into this.

I largely agree with your sentiments. The reason dynamic_allocation exists is because there are several use cases (specifically mobile and Emscripten) where high peak memory usage is to be reduced at all costs. In fact, I got feedback that high peak memory usage was a reason why people didn't want to use sajson. At Dropbox, we used the dynamic_allocation mode because it was only a tiny bit slower on huge documents (maybe a millisecond?) but dramatically reduced peak memory usage.

So I think we need to continue to support both. That said, I think achieving what you want is possible given new APIs in sajson:

If you want no out-of-memory checks (optimized away by the compiler), use single_allocation. If you want to use an existing large-enough buffer, use the single_allocation constructor that takes an existing buffer.

If you want zero allocations but cannot afford allocating worst-case-sized buffers, you can use the new bounded_allocation mode. It uses an existing fixed-size buffer and tries to fit the AST in it.

The refcount allocation was a silly mistake on my part - it's gone. :)

With those changes and choices, it's possible to reduce the number of allocations in sajson to zero.

I agree the allocator template complexity is annoying to deal with, but it's the only way I know of to have the compiler optimize away the allocation checks in the single_allocation case while still allowing the other usage modes.

So I think this issue can be closed... do you agree?

p.s. part of me wants to split sajson into two parts: a C++03 "raw" API that only works with buffers and a helpful C++11 wrapper API. The former would be good for FFI and situations where explicitness is desired.