FasterXML / jackson-core

Core part of Jackson that defines Streaming API as well as basic shared abstractions
Apache License 2.0
2.27k stars 795 forks source link

Add experimental trie-based symbol table for UTF8 Json parser #109

Closed cowtowncoder closed 9 years ago

cowtowncoder commented 11 years ago

While existing symbol table implementations to reasonable job, it might be possible to get an even faster (and possibly slightly more multi-threading friendly!) implementation by using a trie that is not byte-based, but uses quadlets (int32 containing 4 bytes) that are already being used internally.

Benefits in this case are:

  1. Could incrementally match symbols without buffering quadlets (not necessarily a huge deal as most names are 1 or 2 quadlets long but...) -- the main benefit being there is no separate hash/lookup phase
  2. Would eliminate hash calculation; once again, not a huge deal in itself, but may add up.
  3. If implemented well, could ALSO be used for field-specific mapping; sort of "nextFieldName()" that takes trie for matching and not just a single name -- this could speed up data-binding considerably, as it would eliminate second lookup altogether.

The main idea is just to see if such structure is feasible; at parser level it would only have to be as fast as symbol table, if it would open up improvements for data-binding.

cowtowncoder commented 9 years ago

Given that there is the new hash-based symbol table impl in 2.6, I think I'll skip this one. Earlier attempts at using tries have suggested they can't match the performance; and profiling does not suggest that symbol table access was a major performance issue at this point either.