chadaustin / sajson

Lightweight, extremely high-performance JSON parser for C++11
MIT License

sajson

sajson is an extremely high-performance, in-place, DOM-style JSON parser written in C++.

Originally, sajson meant Single Allocation JSON, but it now supports dynamic allocation too.

Features

sajson parses an input document into a contiguous AST structure. Unlike some other high-performance JSON parsers, the AST is efficiently queryable. Object lookups by key are O(lg N) and array indexing is O(1).
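The complexity figures above can be illustrated with a small, self-contained sketch. It assumes object members are kept sorted by key so that a binary search applies; the `Member` type and the functions `sort_members`, `find_key`, and `element_at` are hypothetical names for this illustration and are not part of sajson's API or its internal layout.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a parsed object: key/value pairs kept sorted by key.
// (Illustration only; this is not sajson's internal representation.)
using Member = std::pair<std::string, int>;

// Sort the members once so that later lookups can binary-search.
void sort_members(std::vector<Member>& members) {
    std::sort(members.begin(), members.end(),
              [](const Member& a, const Member& b) { return a.first < b.first; });
}

// O(lg N) lookup: returns the index of `key`, or members.size() if it is absent.
std::size_t find_key(const std::vector<Member>& members, const std::string& key) {
    auto it = std::lower_bound(
        members.begin(), members.end(), key,
        [](const Member& m, const std::string& k) { return m.first < k; });
    if (it != members.end() && it->first == key) {
        return static_cast<std::size_t>(it - members.begin());
    }
    return members.size();
}

// O(1) array indexing is plain offset arithmetic into contiguous storage.
int element_at(const std::vector<int>& elements, std::size_t index) {
    assert(index < elements.size());
    return elements[index];
}
```

The one-time sort is what buys the logarithmic lookup: each query then halves the remaining range of keys instead of scanning all of them.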

sajson does not require that the input buffer is null-terminated. You can use it to parse straight out of a disk mmap or network buffer, for example.

sajson is in-situ: it modifies the input string. While parsing, string values are converted to UTF-8.

(Note: sajson pays a slight performance penalty for not requiring null termination of the input string. Because sajson is in-situ, many use cases require copying the input data anyway. Therefore, I could be convinced to add an option for requiring null termination.)
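To make the in-situ idea concrete, here is a minimal sketch of decoding a JSON string body in place, overwriting the escaped bytes with their UTF-8 form. The function names `hex_value` and `unescape_in_place` are hypothetical and this is not sajson's actual code; surrogate pairs are deliberately rejected to keep the example short.

```cpp
#include <cassert>
#include <cstddef>

// Returns the value of a hex digit, or -1 if `c` is not one.
int hex_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

// Decodes the escaped body of a JSON string (the bytes between the quotes)
// in place. Returns the decoded length, or size_t(-1) on a malformed or
// unsupported escape. The write cursor never overtakes the read cursor
// because every escape sequence is at least as long as what it decodes to.
std::size_t unescape_in_place(char* buf, std::size_t len) {
    assert(buf != nullptr || len == 0);
    const std::size_t failure = static_cast<std::size_t>(-1);
    std::size_t read = 0;
    std::size_t write = 0;
    while (read < len) {
        char c = buf[read++];
        if (c != '\\') {
            buf[write++] = c;
            continue;
        }
        if (read == len) return failure;  // dangling backslash
        char e = buf[read++];
        switch (e) {
        case '"': case '\\': case '/': buf[write++] = e; break;
        case 'b': buf[write++] = '\b'; break;
        case 'f': buf[write++] = '\f'; break;
        case 'n': buf[write++] = '\n'; break;
        case 'r': buf[write++] = '\r'; break;
        case 't': buf[write++] = '\t'; break;
        case 'u': {
            if (len - read < 4) return failure;
            unsigned cp = 0;
            for (int i = 0; i < 4; ++i) {
                int h = hex_value(buf[read++]);
                if (h < 0) return failure;
                cp = cp * 16 + static_cast<unsigned>(h);
            }
            // Surrogate pairs are left out of this sketch.
            if (cp >= 0xD800 && cp <= 0xDFFF) return failure;
            if (cp < 0x80) {
                buf[write++] = static_cast<char>(cp);
            } else if (cp < 0x800) {
                buf[write++] = static_cast<char>(0xC0 | (cp >> 6));
                buf[write++] = static_cast<char>(0x80 | (cp & 0x3F));
            } else {
                buf[write++] = static_cast<char>(0xE0 | (cp >> 12));
                buf[write++] = static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
                buf[write++] = static_cast<char>(0x80 | (cp & 0x3F));
            }
            break;
        }
        default: return failure;
        }
    }
    return write;
}
```

Because the decoded text is written over the original bytes, the caller's buffer no longer holds valid JSON afterwards, which is why in-situ parsing often requires a private copy of the input.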

Other Features

AST Structure

The parsed AST's size is computed as such:

The values null, true, and false are encoded in tag bits and have no cost otherwise.

Allocation Modes

Single

The original sajson allocation mode allocates one word per byte of the input document. This is the fastest mode: because the AST and parse stack are guaranteed to fit, no allocation checks are required at runtime.

That is, on 32-bit platforms, sajson allocates 4 bytes per input character. On 64-bit platforms, sajson allocates 8 bytes per input character. Only use this parse mode if you can handle allocating the worst-case buffer size for your input documents.
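The sizing rule above is easy to check before committing to this mode. The following sketch assumes a "word" is `sizeof(std::size_t)` bytes; `single_allocation_bytes` and `fits_single_allocation` are hypothetical helpers written for this illustration, not functions provided by sajson.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>

// Worst-case bytes for single allocation mode: one machine word per input byte.
// Assumption: a word is sizeof(std::size_t) bytes (4 on 32-bit, 8 on 64-bit).
std::size_t single_allocation_bytes(std::size_t input_length) {
    const std::size_t word = sizeof(std::size_t);
    // The multiplication must not overflow for the result to be meaningful.
    assert(input_length <= std::numeric_limits<std::size_t>::max() / word);
    return input_length * word;
}

// Hypothetical policy check: does the worst-case buffer fit a caller-chosen budget?
bool fits_single_allocation(std::size_t input_length, std::size_t budget_bytes) {
    return single_allocation_bytes(input_length) <= budget_bytes;
}
```

For example, a 1 MiB document needs an 8 MiB buffer on a 64-bit platform under this rule, so a caller with a tighter memory budget would pick one of the other modes.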

Dynamic

The dynamic allocation mode grows the parse stack and AST buffer as needed. It's about 10-40% slower than single allocation because it needs to check for out-of-memory every time data is appended, and occasionally the buffers need to be reallocated and copied.
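The two costs named above, a capacity check on every append and an occasional reallocation with copy, can be seen in a minimal growable buffer. `GrowableWords`, `append_word`, and `release_words` are hypothetical names for this sketch, and the doubling policy is an assumption; sajson's own growth strategy may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical growable buffer of machine words.
struct GrowableWords {
    std::size_t* data;
    std::size_t size;
    std::size_t capacity;
};

GrowableWords make_growable_words() {
    return GrowableWords{nullptr, 0, 0};
}

// Appends one word. Returns false on out-of-memory, leaving the buffer intact.
bool append_word(GrowableWords& b, std::size_t value) {
    assert(b.size <= b.capacity);
    if (b.size == b.capacity) {  // the per-append check the fixed-size mode avoids
        std::size_t new_capacity = b.capacity ? b.capacity * 2 : 16;
        // realloc may move the block, copying the existing contents.
        void* p = std::realloc(b.data, new_capacity * sizeof(std::size_t));
        if (p == nullptr) return false;
        b.data = static_cast<std::size_t*>(p);
        b.capacity = new_capacity;
    }
    b.data[b.size++] = value;
    return true;
}

// Frees the storage and resets the buffer to empty.
void release_words(GrowableWords& b) {
    std::free(b.data);
    b = GrowableWords{nullptr, 0, 0};
}
```

Doubling keeps the number of reallocations logarithmic in the final size, but the branch in `append_word` still runs on every append.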

Bounded

The bounded allocation mode takes a fixed-size memory buffer and uses it for both the parse stack and the resulting AST. If the parse stack and AST fit in the given buffer, the parse succeeds. This allocation mode allows using sajson without the library making any allocations.
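One plausible way to share a single fixed buffer between a parse stack and an AST is to grow them from opposite ends and fail when they would meet. The sketch below shows that arrangement; `BoundedArena` and its members are hypothetical, and this layout is an assumption for illustration and not a description of sajson's internals.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical two-ended arena over caller-owned memory; it never allocates.
class BoundedArena {
public:
    BoundedArena(std::size_t* words, std::size_t count)
        : words_(words), stack_top_(0), ast_begin_(count) {
        assert(words != nullptr || count == 0);
    }

    // Push one word onto the parse stack (grows upward from the front).
    bool push_stack(std::size_t value) {
        if (stack_top_ == ast_begin_) return false;  // regions would collide
        words_[stack_top_++] = value;
        return true;
    }

    // Pop the most recent stack word; the caller must not pop an empty stack.
    std::size_t pop_stack() {
        assert(stack_top_ > 0);
        return words_[--stack_top_];
    }

    // Reserve `n` words for AST data (grows downward from the back).
    // Returns nullptr if the remaining space is too small.
    std::size_t* reserve_ast(std::size_t n) {
        if (n > ast_begin_ - stack_top_) return nullptr;
        ast_begin_ -= n;
        return words_ + ast_begin_;
    }

    // Words still available to either region.
    std::size_t free_words() const { return ast_begin_ - stack_top_; }

private:
    std::size_t* words_;
    std::size_t stack_top_;  // index one past the last stack word
    std::size_t ast_begin_;  // index of the first AST word
};
```

With this layout neither region needs its own size limit: the parse fails only when the combined demand exceeds the buffer, which matches the success condition described above.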

Performance

sajson's performance is excellent - it frequently benchmarks faster than RapidJSON, for example.

Implementation details are available at http://chadaustin.me/tag/sajson/.

Documentation

API documentation is available at http://chadaustin.github.io/sajson/doxygen/

Downsides / Missing Features