Open Hixie opened 1 year ago
I think it is an interesting thing to consider, though we should be aware that it is most likely not going to be an easy win if we just try to shove simdjson into the internals of the current `JsonUtf8Decoder`.
The current `jsonDecode` does eager parsing into Dart objects, producing a tree of `String`, `int`, `double`, `List` and `Map`. In fact, `jsonDecode` spends ~70% of its time doing that (materialising Dart objects from their JSON counterparts) and only ~30% doing actual parsing.
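To make the "tree of Dart objects" concrete, here is a small sketch that walks the result of `jsonDecode` and reports the runtime kind of every node. The helper name `describe` and the sample input are made up for illustration; only `jsonDecode` comes from `dart:convert`.

```dart
import 'dart:convert';

/// Hypothetical helper: renders the shape of a decoded JSON tree, replacing
/// every leaf with the name of the Dart type that jsonDecode allocated for it.
String describe(Object? node) {
  if (node is Map) {
    final fields =
        node.entries.map((e) => '${e.key}: ${describe(e.value)}').join(', ');
    return '{$fields}';
  }
  if (node is List) return '[${node.map(describe).join(', ')}]';
  if (node is String) return 'String';
  if (node is int) return 'int';
  if (node is double) return 'double';
  if (node is bool) return 'bool';
  return 'null';
}

void main() {
  // Every value below is fully materialised before jsonDecode returns,
  // whether or not the caller ever looks at it.
  final tree = jsonDecode('{"a": 1, "b": 2.5, "c": ["x", null]}');
  print(describe(tree));
}
```

The point is that all of these objects exist by the time `jsonDecode` returns, which is where the materialisation cost mentioned above comes from.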
`simdjson` itself is a lazy parser, which does the actual parsing as you iterate the document and access its components.
A straightforward way to use `simdjson` would be to make some variation of the `jsonDecode` API which simply returns wrapper objects around `simdjson::dom` objects.
My main concern here would be the cost of crossing the interop boundary to call `simdjson` APIs. The compounding costs of crossing would be especially bad for anything that takes or returns strings and for any small helper methods.
For the sake of completeness, here is a very naive benchmark I have done:
$ dart compile exe bin/json_benchmark.dart
$ bin/json_benchmark.exe
jsonDecode(from string): 119.26 (10.740) 110..130 p90: 125
jsonDecode(from bytes): 117.64 (9.360) 108..127 p90: 123
json reader -> empty processor: 69.38 (3.620) 68..73 p90: 70
json reader with manual parse: 21.92 (0.080) 21..22 p90: 22
$ clang++ -O3 -std=c++20 -o parse parse.cpp simdjson.cpp
$ ./parse
simdjson: 4.41 (0.859) 4..13 p90: 5
So `simdjson` can churn through JSON around 4-5x faster than a naive character-by-character parser (4.41 vs 21.92 in the runs above). So there might be something here, though we need to be careful that interop costs don't eat the whole gain.
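For anyone who wants to reproduce the first two rows on their own data, here is a minimal timing sketch. It is not the benchmark used above: the synthetic input, the iteration count and the helper `meanMicros` are assumptions, and it has no warm-up or variance reporting, so treat its numbers as rough.

```dart
import 'dart:convert';

/// Hypothetical helper: runs [body] [iterations] times and returns the mean
/// wall-clock time per run in microseconds.
double meanMicros(void Function() body, {int iterations = 100}) {
  final sw = Stopwatch()..start();
  for (var i = 0; i < iterations; i++) {
    body();
  }
  sw.stop();
  return sw.elapsedMicroseconds / iterations;
}

void main() {
  // Synthetic input, assumed for illustration only.
  final text = jsonEncode(List.generate(
      1000, (i) => {'id': i, 'name': 'item $i', 'score': i / 3}));
  final bytes = utf8.encode(text);

  // Fusing the UTF-8 and JSON decoders lets the decoder work on bytes
  // directly instead of first building an intermediate String.
  final fromBytes = utf8.decoder.fuse(json.decoder);

  print('jsonDecode(from string): ${meanMicros(() => jsonDecode(text))} us');
  print('jsonDecode(from bytes): '
      '${meanMicros(() => fromBytes.convert(bytes))} us');
}
```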
FWIW, I think a more interesting thing to consider is fusing JSON deserialization with the actual creation of the data model from JSON, e.g. instead of giving a `Map<String, dynamic>` back to the caller, the caller gives the JSON parser a schema of the objects it is trying to unpack. I think this can be a sweet spot for performance optimization.
Something like a plugin for `json_serializable` that uses FFI and simdjson.
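A toy sketch of the difference, in pure Dart and without simdjson or FFI. The `User` class, `User.fromMap` and `readUserFused` are hypothetical names invented for this example. The fused reader only handles a flat object whose values are non-negative integers or strings without escape sequences, so it shows the shape of the idea rather than a usable parser.

```dart
import 'dart:convert';

/// Hypothetical data model used only for this illustration.
class User {
  final int id;
  final String name;
  User(this.id, this.name);

  /// Two-step path: jsonDecode builds a Map first, then values are copied out.
  factory User.fromMap(Map<String, dynamic> m) =>
      User(m['id'] as int, m['name'] as String);
}

/// Toy fused reader: walks the UTF-8 bytes once and fills the fields of
/// [User] directly, without allocating an intermediate Map.
User readUserFused(List<int> bytes) {
  var i = 0;
  int? id;
  String? name;

  void skipWs() {
    while (i < bytes.length && const [0x20, 0x09, 0x0A, 0x0D].contains(bytes[i])) {
      i++;
    }
  }

  void expect(int ch) {
    skipWs();
    if (i >= bytes.length || bytes[i] != ch) {
      throw FormatException('expected "${String.fromCharCode(ch)}" at $i');
    }
    i++;
  }

  String readString() {
    expect(0x22); // opening quote
    final start = i;
    while (i < bytes.length && bytes[i] != 0x22) {
      if (bytes[i] == 0x5C) throw FormatException('escapes not supported');
      i++;
    }
    if (i >= bytes.length) throw FormatException('unterminated string');
    final s = utf8.decode(bytes.sublist(start, i));
    i++; // closing quote
    return s;
  }

  int readInt() {
    skipWs();
    final start = i;
    var value = 0;
    while (i < bytes.length && bytes[i] >= 0x30 && bytes[i] <= 0x39) {
      value = value * 10 + (bytes[i] - 0x30);
      i++;
    }
    if (i == start) throw FormatException('expected digit at $i');
    return value;
  }

  expect(0x7B); // {
  while (true) {
    final key = readString();
    expect(0x3A); // :
    // The "schema" is this switch: each known key is parsed straight into
    // the typed field it belongs to.
    switch (key) {
      case 'id':
        id = readInt();
        break;
      case 'name':
        name = readString();
        break;
      default:
        throw FormatException('unexpected key "$key"');
    }
    skipWs();
    if (i < bytes.length && bytes[i] == 0x2C) {
      i++; // comma: another field follows
      continue;
    }
    expect(0x7D); // }
    break;
  }
  if (id == null || name == null) throw FormatException('missing field');
  return User(id, name);
}

void main() {
  const text = '{"id": 7, "name": "Ada"}';
  final viaMap = User.fromMap(jsonDecode(text) as Map<String, dynamic>);
  final fused = readUserFused(utf8.encode(text));
  print('${viaMap.id} ${viaMap.name} / ${fused.id} ${fused.name}');
}
```

In a real implementation the switch over keys would presumably be generated (e.g. by something in the `json_serializable` ecosystem) rather than written by hand.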
(I am leaving aside the fact that `simdjson` does not have a chunked parsing API, which is something `JsonUtf8Decoder` supports.)
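For reference, chunked parsing on the Dart side looks roughly like the sketch below: the fused UTF-8 + JSON converter from `dart:convert` is fed byte chunks as they arrive and emits the decoded value once the input is closed. The helper name `decodeChunked` and the sample input are made up for the example.

```dart
import 'dart:convert';

/// Hypothetical helper: decodes one JSON document that arrives as a sequence
/// of UTF-8 byte chunks, without concatenating the chunks first.
Object? decodeChunked(Iterable<List<int>> chunks) {
  Object? result;
  // The callback receives everything the decoder emitted once the sink is
  // closed; for a single JSON document that is exactly one value.
  final output = ChunkedConversionSink<Object?>.withCallback((values) {
    result = values.single;
  });
  final input = utf8.decoder.fuse(json.decoder).startChunkedConversion(output);
  for (final chunk in chunks) {
    input.add(chunk);
  }
  input.close();
  return result;
}

void main() {
  final bytes = utf8.encode('{"a": [1, 2, 3], "b": "x"}');
  // Split at an arbitrary offset: the document may be cut anywhere.
  final decoded = decodeChunked([bytes.sublist(0, 10), bytes.sublist(10)]);
  print(decoded);
}
```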
Would it make sense for us to replace our JSON parser with simdjson.org's parser?