murphyatwork closed this issue 2 years ago.
Well, the VPack parser has also used SIMD instructions for string parsing since 2015. Have you actually compared the performance of simdjson and the velocypack parser? I haven't, so I can't say whether there would be a benefit from using simdjson or how large it would be. @mofeiatwork: did you perform any benchmarks?
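To illustrate what SIMD string parsing typically involves, here is a minimal sketch (not velocypack's actual code) of a scanner that checks 16 bytes at a time with SSE2. It looks for the bytes that end a plain run inside a JSON string: a quote, a backslash, or a control character. The function name `find_string_special` is hypothetical; on targets without SSE2 it falls back to the scalar loop.

```cpp
#include <cstddef>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

// Returns the offset of the first byte in p[0, len) that ends a run of plain
// string characters: a quote, a backslash, or a control character (< 0x20).
// Returns len if there is no such byte.
std::size_t find_string_special(const char* p, std::size_t len) {
    std::size_t i = 0;
#if defined(__SSE2__)
    const __m128i quote = _mm_set1_epi8('"');
    const __m128i backslash = _mm_set1_epi8('\\');
    const __m128i max_ctrl = _mm_set1_epi8(0x1F);
    // Classify 16 bytes per iteration with a handful of vector instructions.
    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128(reinterpret_cast<const __m128i*>(p + i));
        __m128i hit = _mm_or_si128(_mm_cmpeq_epi8(chunk, quote),
                                   _mm_cmpeq_epi8(chunk, backslash));
        // Unsigned "c <= 0x1F": max(c, 0x1F) == 0x1F exactly for control bytes.
        hit = _mm_or_si128(hit, _mm_cmpeq_epi8(_mm_max_epu8(chunk, max_ctrl), max_ctrl));
        if (_mm_movemask_epi8(hit) != 0) {
            break;  // a special byte is in this chunk; the scalar loop pinpoints it
        }
    }
#endif
    // Scalar tail, and the whole scan on targets without SSE2.
    for (; i < len; ++i) {
        unsigned char c = static_cast<unsigned char>(p[i]);
        if (c == '"' || c == '\\' || c < 0x20) {
            return i;
        }
    }
    return len;
}
```

The `_mm_max_epu8` comparison is used because SSE2 only offers signed byte comparisons, which would misclassify UTF-8 bytes (0x80 and above) as control characters.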
Thank you for the reply.
I haven't performed a benchmark to compare them yet, but I will do so in a few days. If I find anything interesting, I can submit the proposal again.
As far as I know, simdjson not only uses SIMD to parse strings and numbers, but also applies a two-pass algorithm to make parsing of the JSON structure more efficient. In the first stage, it identifies the structural tokens such as `{`, `[`, `]`, `}`, `,` and `"`, using SIMD to process many characters at once. In the second stage, a state machine driven by those structural tokens parses the document structure. As a result, the first stage can process data in parallel at the instruction level, and the second stage is quite lightweight. According to their paper, it can be several times faster than RapidJSON.
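The first stage described above can be sketched roughly as follows. This is a simplified, portable illustration, not simdjson's code: the raw per-block masks are filled by a scalar loop where simdjson would use SIMD compares, escapes are tracked byte by byte, and the function names are hypothetical. What it does show is the bitmask flow: a prefix XOR over the quote mask yields an "inside string" mask that removes structural characters appearing inside strings, with a carry between 64-byte blocks.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Prefix XOR: bit i of the result is the XOR of bits 0..i of x.
// Applied to the quote mask, it sets every bit from an opening quote up to
// (but not including) the matching closing quote.
std::uint64_t prefix_xor(std::uint64_t x) {
    x ^= x << 1;
    x ^= x << 2;
    x ^= x << 4;
    x ^= x << 8;
    x ^= x << 16;
    x ^= x << 32;
    return x;
}

// Stage 1 sketch: returns offsets of { } [ ] : , outside strings, plus the
// opening quote of every string. Masks hold one bit per byte of a 64-byte block.
std::vector<std::uint32_t> structural_indices(const std::string& json) {
    std::vector<std::uint32_t> indices;
    std::uint64_t in_string_carry = 0;  // all ones if the previous block ended inside a string
    bool escaped = false;               // previous byte was an unescaped backslash
    for (std::size_t base = 0; base < json.size(); base += 64) {
        const std::size_t n = std::min<std::size_t>(64, json.size() - base);
        std::uint64_t quotes = 0, ops = 0;
        // Scalar stand-in for the SIMD classification step.
        for (std::size_t i = 0; i < n; ++i) {
            const char c = json[base + i];
            if (escaped) { escaped = false; continue; }  // byte after a backslash
            if (c == '\\') { escaped = true; continue; }
            const std::uint64_t bit = std::uint64_t{1} << i;
            if (c == '"') {
                quotes |= bit;
            } else if (c == '{' || c == '}' || c == '[' || c == ']' || c == ':' || c == ',') {
                ops |= bit;
            }
        }
        // Bits set for the opening quote and the string contents.
        const std::uint64_t in_string = prefix_xor(quotes) ^ in_string_carry;
        in_string_carry = (in_string >> 63) ? ~std::uint64_t{0} : 0;
        const std::uint64_t structural = (ops & ~in_string) | (quotes & in_string);
        for (std::size_t i = 0; i < n; ++i) {
            if ((structural >> i) & 1) {
                indices.push_back(static_cast<std::uint32_t>(base + i));
            }
        }
    }
    return indices;
}
```

For `{"a":[1,2],"b":"x,y"}` this yields the offsets of every brace, bracket, colon and comma except the comma inside `"x,y"`, plus the three opening quotes. simdjson additionally marks the first byte of each scalar (numbers, `true`, `false`, `null`) as a pseudo-structural position and computes the prefix XOR with a carry-less multiplication; this sketch omits both.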
@jsteemann
Looks good. Would be happy to try this, but may not be able to do so soon due to lack of time. But definitely looks interesting.
Well, I could give it a try. Just give me a few days.
That would be super awesome! Thanks! :+1:
Hi, Jan. I have performed the benchmark of the JSON parsers. This PR explains the parameters, environment, and detailed results.
According to the benchmark results, the vpack JSON parser is quite competitive with rapidjson, but much slower than simdjson on most datasets. It may be worth porting the simdjson parser into this project.
Dataset | Parser | Bytes/second | Documents/second
---|---|---|---
small.json | vpack | 188899782.86 | 2303655.89 |
small.json | rapidjson | 127351861.73 | 1553071.48 |
small.json | simdjson | 260920607.46 | 3181958.63 |
sample.json | vpack | 904174249.95 | 1315.18 |
sample.json | rapidjson | 1413211628.26 | 2055.61 |
sample.json | simdjson | 4950599132.35 | 7200.97 |
sampleNoWhite.json | vpack | 244545692.90 | 1419.63 |
sampleNoWhite.json | rapidjson | 459349740.48 | 2666.61 |
sampleNoWhite.json | simdjson | 4149985092.36 | 24091.40 |
commits.json | vpack | 163986219.85 | 6503.26 |
commits.json | rapidjson | 252490281.96 | 10013.10 |
commits.json | simdjson | 4078092802.08 | 161726.40 |
api-docs.json | vpack | 849059621.72 | 704.05 |
api-docs.json | rapidjson | 653520761.57 | 541.91 |
api-docs.json | simdjson | 6662158225.83 | 5524.34 |
countries.json | vpack | 244453766.46 | 215.56 |
countries.json | rapidjson | 271919886.47 | 239.78 |
countries.json | simdjson | 2603110140.11 | 2295.45 |
directory-tree.json | vpack | 184676956.61 | 620.36 |
directory-tree.json | rapidjson | 223655990.89 | 751.29 |
directory-tree.json | simdjson | 2903582799.94 | 9753.55 |
doubles-small.json | vpack | 83976076.75 | 529.13 |
doubles-small.json | rapidjson | 505519075.98 | 3185.25 |
doubles-small.json | simdjson | 4748838987.22 | 29922.24 |
doubles.json | vpack | 58083903.63 | 48.93 |
doubles.json | rapidjson | 333404631.13 | 280.87 |
doubles.json | simdjson | 4322472494.62 | 3641.32 |
file-list.json | vpack | 329627271.95 | 2178.39 |
file-list.json | rapidjson | 266579913.89 | 1761.73 |
file-list.json | simdjson | 5316260708.28 | 35133.27 |
object.json | vpack | 59581590.89 | 377.62 |
object.json | rapidjson | 385906478.91 | 2445.84 |
object.json | simdjson | 4601729866.12 | 29165.30 |
pass1.json | vpack | 250051310.75 | 173526.24 |
pass1.json | rapidjson | 485213399.57 | 336719.92 |
pass1.json | simdjson | 2061878098.89 | 1430866.13 |
pass2.json | vpack | 73247764.40 | 1408610.85 |
pass2.json | rapidjson | 67899297.43 | 1305755.72 |
pass2.json | simdjson | 138015174.90 | 2654137.98 |
pass3.json | vpack | 629831861.52 | 4255620.69 |
pass3.json | rapidjson | 227191647.91 | 1535078.70 |
pass3.json | simdjson | 390059939.88 | 2635540.13 |
random1.json | vpack | 459378717.52 | 47495.73 |
random1.json | rapidjson | 716353526.80 | 74064.67 |
random1.json | simdjson | 4812340388.09 | 497553.80 |
random2.json | vpack | 454839652.06 | 55205.69 |
random2.json | rapidjson | 735467504.66 | 89266.60 |
random2.json | simdjson | 4596730574.84 | 557923.36 |
random3.json | vpack | 437717347.45 | 5999.99 |
random3.json | rapidjson | 423421851.34 | 5804.04 |
random3.json | simdjson | 5465165253.42 | 74913.51 |
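The two metrics in the table are related: bytes/second equals documents/second multiplied by the document size. A minimal harness in that spirit might look like this. It is a hypothetical sketch, not the harness from the PR, and `measure_throughput` and `benchmark_sink` are illustrative names. It times batches of parses so that clock reads do not dominate for tiny documents such as small.json.

```cpp
#include <chrono>
#include <cstddef>
#include <string>

struct Throughput {
    double bytes_per_second;
    double documents_per_second;
};

// Keeps parse results observable so the compiler cannot drop the work.
volatile std::size_t benchmark_sink = 0;

// Parses `json` repeatedly, in batches, until at least `min_seconds` have
// elapsed. `parse` is any callable that takes the JSON text and returns a
// size_t (for example, a node count from the parser under test).
template <typename ParseFn>
Throughput measure_throughput(const std::string& json, ParseFn parse,
                              double min_seconds, std::size_t batch = 100) {
    using clock = std::chrono::steady_clock;
    std::size_t documents = 0;
    std::size_t checksum = 0;
    const auto start = clock::now();
    double elapsed = 0.0;
    do {
        for (std::size_t i = 0; i < batch; ++i) {
            checksum += parse(json);
        }
        documents += batch;
        elapsed = std::chrono::duration<double>(clock::now() - start).count();
    } while (elapsed < min_seconds);
    benchmark_sink = checksum;
    const double docs_per_second = static_cast<double>(documents) / elapsed;
    // Bytes/second is documents/second times the document size.
    return {docs_per_second * static_cast<double>(json.size()), docs_per_second};
}
```

As a sanity check on the table, dividing bytes/second by documents/second for small.json gives roughly 82 bytes per document for all three parsers, which is consistent with each parser being timed on the same input.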
I have run a benchmark of the parser, which shows that the performance of velocypack::Parser is good enough for realistic workloads, so this issue should be closed.
VPack is a great library for serializing and storing binary JSON, with handy iterator and builder interfaces.
However, VPack's parser looks like a conventional recursive-descent parser, which cannot take advantage of the SIMD instructions of modern CPUs. As far as I know, the simdjson parser is several times faster than a conventional parser.
So, would you consider combining simdjson with the VPack Builder? That could make VPack even better.
I'm considering doing this work myself, since JSON parsing speed is critical in our system. I could submit a PR if you think it would be useful for this project.