dart-lang / sdk

The Dart SDK, including the VM, JS and Wasm compilers, analysis, core libraries, and more.
https://dart.dev

Speed up JSON parsing with simdjson? #51596

Open · Hixie opened 1 year ago

Hixie commented 1 year ago

Would it make sense for us to replace our JSON parser with simdjson.org's parser?

mraleph commented 1 year ago

I think it is an interesting thing to consider, though we should be aware that it is most likely not going to be a simple win if we just try to shove simdjson into the internals of the current JsonUtf8Decoder.

The current jsonDecode does eager parsing into Dart objects, producing a tree of String, int, double, List and Map values. In fact, jsonDecode spends ~70% of its time doing that (materialising Dart objects from their JSON counterparts) and only ~30% doing the actual parsing.
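To make the eager behaviour concrete, a minimal example (illustrative only):

```dart
import 'dart:convert';

void main() {
  // jsonDecode builds the entire tree of Dart objects up front: the Map,
  // the List and every String/int element are allocated before it returns,
  // even if the caller only ever reads one field.
  final doc = jsonDecode('{"name": "simdjson", "versions": [1, 2, 3]}')
      as Map<String, dynamic>;
  print(doc['name']); // simdjson
}
```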

simdjson itself is a lazy parser which does actual parsing as you iterate the document and access its components.

A straightforward way to use simdjson would be to make some variation of the jsonDecode API which simply returns wrapper objects around simdjson::dom objects.
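As a rough sketch of what such wrappers could look like on the Dart side (everything below is hypothetical: `JsonDocument` and `LazyJsonMap` do not exist, and the FFI plumbing is elided):

```dart
// Hypothetical API sketch, not an existing library. A LazyJsonMap would
// hold a handle to a native simdjson value and materialise individual
// members only when they are accessed, via FFI calls into simdjson.
abstract class JsonDocument {
  // Parses (indexes) the input without building any Dart objects.
  static JsonDocument parse(List<int> utf8Bytes) => throw UnimplementedError();

  LazyJsonMap get root;
}

abstract class LazyJsonMap {
  // Each lookup crosses the Dart-to-native boundary and converts only
  // the requested value into a Dart object.
  Object? operator [](String key);
}
```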

My main concern here would be the interop overhead: if values live behind wrapper objects, every access has to cross the Dart-to-native boundary, and those costs can easily eat the parsing gains.

For the sake of completeness, here is a very naive benchmark I have done:

```
$ dart compile exe bin/json_benchmark.dart
$ bin/json_benchmark.exe
jsonDecode(from string): 119.26 (10.740) 110..130 p90: 125
jsonDecode(from bytes): 117.64 (9.360) 108..127 p90: 123
json reader -> empty processor: 69.38 (3.620) 68..73 p90: 70
json reader with manual parse: 21.92 (0.080) 21..22 p90: 22
$ clang++ -O3 -std=c++20 -o parse parse.cpp simdjson.cpp
$ ./parse
simdjson: 4.41 (0.859) 4..13 p90: 5
```

So simdjson can churn through JSON around 4x faster than a naive character-by-character parser. There might be something here - though we need to be careful that interop costs don't eat the whole gain.

```dart
import 'dart:convert';
import 'dart:io';
import 'dart:math' as math;
import 'dart:typed_data';

import 'package:jsontool/jsontool.dart';

/// Runs [cb] 100 times and prints mean, stddev, min..max and p90 in ms.
void measure(String name, void Function() cb) {
  const nRuns = 100;
  final timings = Int64List(nRuns);
  for (var i = 0; i < nRuns; i++) {
    final sw = Stopwatch()..start();
    cb();
    timings[i] = sw.elapsedMilliseconds;
  }
  timings.sort();
  final sum = timings.fold(0, (sum, v) => sum + v);
  final mean = sum / nRuns;
  final min = timings.first;
  final max = timings.last;
  final stdDev = math.sqrt(timings.fold(0.0, (sum, v) {
        final x = v - mean;
        return sum + x * x;
      }) /
      nRuns);
  print(
      '$name: $mean (${stdDev.toStringAsFixed(3)}) $min..$max p90: ${timings[90]}');
}

/// A processor that does nothing: measures pure tokenization cost.
class MyProcessor extends JsonProcessor {}

void main(List<String> arguments) {
  // curl -o data.json https://conda.anaconda.org/conda-forge/noarch/current_repodata.json
  final fileAsString = File('data.json').readAsStringSync();
  final fileAsBytes = File('data.json').readAsBytesSync();
  measure('jsonDecode(from string)', () => jsonDecode(fileAsString));
  measure('jsonDecode(from bytes)',
      () => utf8.decoder.fuse(json.decoder).convert(fileAsBytes));
  measure('json reader -> empty processor',
      () => MyProcessor().processValue(JsonReader.fromUtf8(fileAsBytes)));
  measure('json reader with manual parse', () {
    // Counts the entries of the top-level "packages" object without
    // materialising any of them.
    int cnt = 0;
    final reader = JsonReader.fromUtf8(fileAsBytes);
    reader.expectObject();
    while (reader.hasNextKey()) {
      final prop = reader.nextKey()!;
      if (prop == 'packages') {
        reader.expectObject();
        while (reader.skipObjectEntry() && reader.hasNext()) {
          cnt++;
        }
        break;
      } else {
        reader.skipAnyValue();
      }
    }
    if (cnt != 23004) {
      throw 'counted $cnt packages, but expected 23004';
    }
  });
}
```

```cpp
// $ curl -o data.json https://conda.anaconda.org/conda-forge/noarch/current_repodata.json
// $ clang++ -O3 -std=c++20 -o parse parse.cpp simdjson.cpp
// $ ./parse
#include <algorithm>
#include <array>
#include <chrono>
#include <cmath>
#include <cstdlib>
#include <iostream>
#include <numeric>
#include <string_view>

#include "simdjson.h"

using namespace simdjson;

// Runs f 100 times and prints mean, stddev, min..max and p90 in ms.
template <typename F>
void measure(std::string_view name, F &&f) {
  static constexpr int NRuns = 100;
  std::array<int64_t, NRuns> timings;
  for (intptr_t run = 0; run < NRuns; run++) {
    const auto start = std::chrono::steady_clock::now();
    f();
    const auto end = std::chrono::steady_clock::now();
    timings[run] =
        std::chrono::duration_cast<std::chrono::milliseconds>(end - start)
            .count();
  }
  std::sort(timings.begin(), timings.end());
  const int64_t sum =
      std::accumulate(timings.begin(), timings.end(), int64_t{0});
  const double mean = static_cast<double>(sum) / NRuns;
  const int64_t min = timings[0];
  const int64_t max = timings[NRuns - 1];
  const double stddev =
      std::sqrt(std::accumulate(timings.begin(), timings.end(), 0.0,
                                [&](double sum, int64_t v) -> double {
                                  double x = static_cast<double>(v) - mean;
                                  return sum + x * x;
                                }) /
                NRuns);
  std::cout << name << ": " << mean << " (" << stddev << ")"
            << " " << min << ".." << max << " p90: " << timings[90]
            << std::endl;
}

int main(void) {
  padded_string json = padded_string::load("data.json");
  measure("simdjson", [&]() {
    ondemand::parser parser;
    ondemand::document data = parser.iterate(json);
    // Count the entries of the top-level "packages" object lazily.
    int count = 0;
    for (auto el : data["packages"].get_object()) {
      count++;
    }
    if (count != 23004) {
      std::cerr << "got " << count << " packages, but expected 23004"
                << std::endl;
      exit(1);
    }
  });
  return 0;
}
```

FWIW I think a more interesting thing to consider is fusing JSON deserialization with the actual creation of the data model from JSON: e.g. instead of handing a Map<String, dynamic> back to the caller, the caller gives the JSON parser a schema of the objects it is trying to unpack. I think this can be a sweet spot for performance optimization.

Something like a plugin for json_serializable that uses FFI and simdjson.
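To illustrate the fusing part in pure Dart (setting simdjson and FFI aside), here is a sketch using package:jsontool as in the benchmark above; `Package` and `readPackage` are made up for the example. The token stream drives the object constructor directly, so no intermediate Map<String, dynamic> is ever materialised:

```dart
import 'package:jsontool/jsontool.dart';

class Package {
  final String name;
  final int size;
  Package(this.name, this.size);
}

// Builds a Package straight from the token stream.
Package readPackage(JsonReader reader) {
  String? name;
  int? size;
  reader.expectObject();
  while (reader.hasNextKey()) {
    switch (reader.nextKey()) {
      case 'name':
        name = reader.expectString();
        break;
      case 'size':
        size = reader.expectInt();
        break;
      default:
        reader.skipAnyValue();
    }
  }
  return Package(name!, size!);
}

void main() {
  final pkg =
      readPackage(JsonReader.fromString('{"name": "dart", "size": 42}'));
  print('${pkg.name}: ${pkg.size}'); // dart: 42
}
```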

(I am leaving aside the fact that simdjson does not have a chunked parsing API, which is something JsonUtf8Decoder supports).
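For reference, this is the chunked conversion API in dart:convert that an integration would need to keep working:

```dart
import 'dart:convert';

void main() {
  // JsonUtf8Decoder can consume the input as UTF-8 byte chunks arrive,
  // e.g. from a socket, without buffering the whole document first.
  final sink = JsonUtf8Decoder().startChunkedConversion(
      ChunkedConversionSink<Object?>.withCallback(
          (values) => print(values.single))); // {a: [1, 2, 3]}
  sink.add(utf8.encode('{"a": [1, 2,'));
  sink.add(utf8.encode(' 3]}'));
  sink.close();
}
```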