wanweiqiangintel opened this issue 1 year ago.
If simdjson is faster than RapidJSON for our use case too, I'm OK with this.
Could you try this and share the results of our benchmark? https://github.com/apache/arrow/blob/main/cpp/src/arrow/json/parser_benchmark.cc
We are available to help.
Note that simdjson is used by Apache Doris and ClickHouse.
Great!
It seems the writer could keep its original logic while the parser makes full use of simdjson?
Is there any merit in using both RapidJSON and simdjson?
I think using only one of RapidJSON or simdjson, not both, will reduce our maintenance cost.
Agreed with @kou, we probably want to avoid depending on two different JSON libraries.
Interested people should try working on a PR.
I'm skeptical that switching to simdjson would improve performance much, btw. Parsing is only a small part of the work necessary to convert JSON to Arrow.
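One way to quantify this point is Amdahl's law. Assume (for illustration only, these are not measured numbers) that parsing takes a fraction p of the total JSON-to-Arrow conversion time and that a faster parser speeds up only that part by a factor s. The overall speedup is then bounded as follows:

```latex
S_{\text{overall}} = \frac{1}{(1 - p) + \dfrac{p}{s}},
\qquad
\text{e.g. } p = 0.2,\ s = 4:\quad
S_{\text{overall}} = \frac{1}{0.8 + 0.05} = \frac{1}{0.85} \approx 1.18
```

Even with an infinitely fast parser (s → ∞), the speedup is capped at 1/(1 − p), which is 1.25 for p = 0.2. Measuring p for Arrow's converter with the parser benchmark linked above would show how much room there actually is.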
Describe the enhancement requested
According to performance results published by the simdjson community, the simdjson library uses three-quarters fewer instructions than the state-of-the-art parser RapidJSON, and its throughput is much higher than RapidJSON's.
So, could we replace RapidJSON with simdjson to implement the JSON parser?
Component(s)
C++