SunDoge / simdjson-rust

Rust bindings for the simdjson project.
https://crates.io/crates/simdjson-rust
Apache License 2.0

Misleading benchmarks #26

Open Licenser opened 9 months ago

Licenser commented 9 months ago

Hi,

I wanted to suggest changing the benchmark output slightly. As currently presented, the results are somewhat misleading.

The way serde_json and simd-json treat the Dom is very different from how simdjson treats the Dom. Both are valid tradeoffs to make, but comparing them is not very meaningful.

Both serde_json and simd-json, when presenting a Dom, create a nested data structure that is modifiable and has indexed maps: a data structure in its own right. That comes at the cost of allocating and filling those structures, but it's a valid tradeoff when either the maps are accessed frequently or the data needs to be modified.

simdjson presents a pointer to the tape as a Dom, which means it does not perform extra allocations but does not allow mutations, and lookups are always in linear time.

Again, both are valid tradeoffs for different use cases. However, comparing them is problematic as what we compare isn't the same result.
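To make the difference concrete, here is a rough sketch in plain Rust (std only). The names `Value`, `Node`, `dom_get`, `tape_get`, `example_dom` and `example_tape` are made up for illustration, and the tape layout is a simplification I am assuming for the example; none of this is the actual API or memory layout of serde_json, simd-json or simdjson.

```rust
use std::collections::HashMap;

/// Nested, owned DOM: every object is its own hash map, so building it
/// allocates per container, but lookups are indexed and values can be mutated.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    Number(i64),
    Str(String),
    Object(HashMap<String, Value>),
}

/// Flat tape (hypothetical layout): the whole document sits in one slice,
/// in document order, borrowing its strings from the input.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Node<'a> {
    /// Start of an object; `len` is the number of tape slots its members use.
    Object { len: usize },
    Key(&'a str),
    Number(i64),
    Str(&'a str),
}

/// DOM lookup: a single hash map access.
fn dom_get<'v>(value: &'v Value, key: &str) -> Option<&'v Value> {
    match value {
        Value::Object(map) => map.get(key),
        _ => None,
    }
}

/// Number of tape slots occupied by the value starting at `idx`.
fn span(tape: &[Node<'_>], idx: usize) -> Option<usize> {
    match *tape.get(idx)? {
        Node::Object { len } => Some(1 + len),
        _ => Some(1),
    }
}

/// Tape lookup: walk the members of the object at index `obj` one by one,
/// skipping over nested values, until the key matches. Returns the tape
/// index of the matching value. This is linear in the number of members.
fn tape_get(tape: &[Node<'_>], obj: usize, key: &str) -> Option<usize> {
    let Node::Object { len } = *tape.get(obj)? else { return None };
    let end = obj + 1 + len;
    let mut i = obj + 1;
    while i < end {
        // Members are laid out as Key, value, Key, value, ...
        let Node::Key(k) = *tape.get(i)? else { return None };
        let value = i + 1;
        if k == key {
            return Some(value);
        }
        i = value + span(tape, value)?;
    }
    None
}

/// {"a": 1, "b": {"c": 2}, "d": "x"} as a nested DOM.
fn example_dom() -> Value {
    let mut inner = HashMap::new();
    inner.insert("c".to_string(), Value::Number(2));
    let mut root = HashMap::new();
    root.insert("a".to_string(), Value::Number(1));
    root.insert("b".to_string(), Value::Object(inner));
    root.insert("d".to_string(), Value::Str("x".to_string()));
    Value::Object(root)
}

/// The same document as a flat tape.
fn example_tape() -> Vec<Node<'static>> {
    vec![
        Node::Object { len: 8 },
        Node::Key("a"),
        Node::Number(1),
        Node::Key("b"),
        Node::Object { len: 2 },
        Node::Key("c"),
        Node::Number(2),
        Node::Key("d"),
        Node::Str("x"),
    ]
}
```

In this sketch, `dom_get(&example_dom(), "d")` is one hash lookup, while `tape_get(&example_tape(), 0, "d")` has to step past `a` and the nested object under `b` first. A benchmark that only times "parse to Dom" charges the first representation for building the maps and charges the second for nothing comparable.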

I think the best way would be to create a third category alongside Dom and Struct, called Tape: the fully validated JSON, but not put into a nested data structure. serde_json does not provide an interface like that. simd-json does provide to_tape, which yields a data structure equivalent to simdjson's, but without the nicer access functions (those should be easy to implement oneself or add).

SunDoge commented 9 months ago

Thanks for your reminder. I'll convert the dom to serde_json's Value and rebenchmark it.

aminya commented 9 months ago

@SunDoge I think both should be included in the benchmarks. Converting might not be needed for all the applications.

Licenser commented 9 months ago

Yes, including both is definitely best, and it isn't a big issue if not all libraries support all target formats.

Licenser commented 8 months ago

FWIW, simd-json now has DOM-like read-only access to the tape, so it would be possible to include the DOM versions in the benchmark as well.