Marwes / combine

A parser combinator library for Rust
https://docs.rs/combine/*/combine/
MIT License
1.3k stars 95 forks

benchmarks verification #152

Open Geal opened 6 years ago

Geal commented 6 years ago

Hello!

I spent some time cleaning up the old nom_benchmarks repository into a more general parser_benchmarks one. Since the various parser libraries are reaching the same level of performance, it is getting interesting to have a good way to compare them on the same tests and to transfer techniques from one to the next.

I have added your JSON example and put it to work on the other example files: https://github.com/Geal/parser_benchmarks/tree/master/json

Apparently the parser fails on one of the example files (the "basic" one); could you find the issue? Also, to match the behaviour of the other parsers, do you think it could be modified to return a &str instead of a String?

I also modified a few things in the HTTP example: https://github.com/Geal/parser_benchmarks/tree/master/http

Could you check the benchmark to see if I have not missed anything in how to use combine?

Marwes commented 6 years ago

❤️

Apparently the parser fails on one of the example files (the "basic" one); could you find the issue?

The parser never checked for whitespace before the first value :) whoops.
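The fix amounts to skipping whitespace once before the first value, not just between tokens. A minimal stand-alone sketch of the idea in plain Rust (not the actual combine code, which would express this with combinators such as spaces()):

```rust
// Minimal sketch of the bug and fix, in plain Rust rather than combine's
// combinators: a JSON parser that only skips whitespace *between* tokens
// fails on input with leading whitespace. The fix is to consume whitespace
// once before the first value as well.
fn parse_first_value(input: &str) -> Result<(&str, &str), &'static str> {
    // The missing step: consume leading whitespace before the first value.
    let input = input.trim_start();
    // Stand-in for a real value parser: take the next non-whitespace token.
    let end = input
        .find(|c: char| c.is_whitespace())
        .unwrap_or(input.len());
    if end == 0 {
        Err("expected a value")
    } else {
        Ok((&input[..end], &input[end..]))
    }
}

fn main() {
    // With the fix, leading whitespace no longer makes the parse fail.
    assert_eq!(parse_first_value("  true"), Ok(("true", "")));
    assert_eq!(parse_first_value("null "), Ok(("null", " ")));
    println!("ok");
}
```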

Also, to match the behaviour of the other parsers, do you think it could be modified to return a &str instead of a String ?

Arguably it is more realistic to use String (or Cow<str>, though that may be a bit tricky) since users probably don't want to deal with escapes themselves. It is easier to change one parser though, and I think I can actually make it work for both scenarios.
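One way a Cow-based design could serve both scenarios, sketched here in plain Rust (an illustration, not combine's actual API): borrow from the input when a string contains no escapes, and allocate only when unescaping is required.

```rust
use std::borrow::Cow;

// Hypothetical helper (not from combine): returning Cow<str> lets the
// string parser borrow from the input when the JSON string contains no
// escape sequences, and only allocate when it must unescape.
fn json_string_contents(raw: &str) -> Cow<'_, str> {
    if !raw.contains('\\') {
        // Zero-copy path: effectively the &str case the benchmark wants.
        return Cow::Borrowed(raw);
    }
    // Allocation path: resolve escapes into an owned String.
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            match chars.next() {
                Some('n') => out.push('\n'),
                Some('t') => out.push('\t'),
                Some('"') => out.push('"'),
                Some('\\') => out.push('\\'),
                Some(other) => out.push(other), // simplified: pass through
                None => {}
            }
        } else {
            out.push(c);
        }
    }
    Cow::Owned(out)
}

fn main() {
    // Unescaped input borrows; escaped input allocates.
    assert!(matches!(json_string_contents("hello"), Cow::Borrowed(_)));
    assert_eq!(json_string_contents("a\\nb").as_ref(), "a\nb");
    println!("ok");
}
```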

Could you check the benchmark to see if I have not missed anything in how to use combine?

Looks fine to me! I need to improve the JSON parser a bit though, since the only change it has seen since combine 1.0 is a refactor to use impl Trait as a way to show that off. I will submit a PR in a bit (also with some Cargo.toml profile changes that combine unfortunately needs due to thin-LTO's immaturity; I added the flags to all the crates though, to make it fair).
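For context, profile changes of this kind typically look like the following (hypothetical values for illustration; the actual flags in the PR may differ):

```toml
# Hypothetical Cargo.toml profile tweaks; the exact settings used in the
# benchmarks PR may differ.
[profile.release]
lto = true          # request full LTO instead of the default thin-LTO
codegen-units = 1   # better cross-crate inlining, at the cost of build time

[profile.bench]
lto = true
codegen-units = 1
```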

polarathene commented 6 years ago

The nom 4.0 release article has some HTTP parsing benchmark results graphed at the end. combine has the lowest performance; although it is close to the naive nom results, the optimized nom parser comes out on top (that might change once a pending httparse PR gets merged). Is combine able to get similar results to the optimized nom?

Marwes commented 6 years ago

@polarathene combine could certainly do the same kinds of optimizations, since the only thing the take_while_{simd,unrolled} macros require is a way to operate on a byte slice. I might take a stab at that; with some small tweaks the macros could even be made into functions and shared between the implementations.
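For reference, the unrolled variant can be sketched as a plain function over a byte slice (an illustrative re-creation of the idea, not the actual macro from the nom benchmarks): the loop body tests several bytes per iteration instead of one, cutting per-byte loop overhead.

```rust
// Sketch of the "unrolled take_while1" technique: manually unroll the
// predicate loop by 4 so the compiler emits fewer branches per byte.
// Names and the unroll factor are illustrative.
fn take_while1_unrolled(
    input: &[u8],
    predicate: impl Fn(u8) -> bool,
) -> Option<(&[u8], &[u8])> {
    let len = input.len();
    let mut i = 0;
    // Unrolled by 4: handle whole chunks without per-byte loop overhead.
    while i + 4 <= len {
        if !predicate(input[i]) { break; }
        if !predicate(input[i + 1]) { i += 1; break; }
        if !predicate(input[i + 2]) { i += 2; break; }
        if !predicate(input[i + 3]) { i += 3; break; }
        i += 4;
    }
    // Tail loop for the remaining bytes (and to stop at the failing byte).
    while i < len && predicate(input[i]) {
        i += 1;
    }
    if i == 0 {
        None // take_while1 requires at least one matching byte
    } else {
        Some((&input[..i], &input[i..]))
    }
}

fn main() {
    let alpha = |b: u8| b.is_ascii_alphabetic();
    assert_eq!(
        take_while1_unrolled(b"abcde123", alpha),
        Some((&b"abcde"[..], &b"123"[..]))
    );
    assert!(take_while1_unrolled(b"123", alpha).is_none());
    println!("ok");
}
```

A SIMD version would follow the same shape, testing 16 or 32 bytes at a time with vector compares before falling back to the tail loop.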

@Geal I am curious which compiler version you used to produce the benchmarks? I am seeing a lot of variance, both from running the benchmarks on different computers and from using different compilers.

For instance, on Rust 1.24.1 nom beats combine at 318 MB/s vs 275 MB/s, whereas with Rust 1.26.0 combine beats nom at 317 MB/s vs 294 MB/s (MacBook).

EDIT: On another computer (Windows with a Ryzen 7) nom beats combine by ~10% on both 1.24.1 and 1.26.0.

Marwes commented 6 years ago

Current results below; there is probably a bit more room for improvement though.

Both nom and combine now use an unrolled take_while1 in all places where the plain take_while1 was used (SIMD is still used in the two places where it was used before), as that gives a significant speedup on my machine.

Not unrolled vs unrolled (nom)

$ cargo benchcmp non_unrolled nom.txt
 name                   non_unrolled ns/iter  nom.txt ns/iter     diff ns/iter   diff %  speedup
 bigger_test            59,056 (1809 MB/s)    44,077 (2425 MB/s)       -14,979  -25.36%   x 1.34
 httparse_example_test  311 (2260 MB/s)       216 (3254 MB/s)              -95  -30.55%   x 1.44
 one_test               178 (1634 MB/s)       119 (2445 MB/s)              -59  -33.15%   x 1.50
 small_test             11,724 (1823 MB/s)    8,762 (2439 MB/s)         -2,962  -25.26%   x 1.34

Combine

running 4 tests
test bigger_test           ... bench:      48,913 ns/iter (+/- 1,322) = 2185 MB/s
test httparse_example_test ... bench:         258 ns/iter (+/- 10) = 2724 MB/s
test one_test              ... bench:         152 ns/iter (+/- 39) = 1914 MB/s
test small_test            ... bench:       9,722 ns/iter (+/- 331) = 2198 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured

Nom

running 4 tests
test bigger_test           ... bench:      44,077 ns/iter (+/- 555) = 2425 MB/s
test httparse_example_test ... bench:         216 ns/iter (+/- 5) = 3254 MB/s
test one_test              ... bench:         119 ns/iter (+/- 3) = 2445 MB/s
test small_test            ... bench:       8,762 ns/iter (+/- 183) = 2439 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured

$ cargo benchcmp combine.txt ../nom-optimized/nom.txt
 name                   combine.txt ns/iter  ../nom-optimized/nom.txt ns/iter  diff ns/iter   diff %  speedup
 bigger_test            48,913 (2185 MB/s)   44,077 (2425 MB/s)                      -4,836   -9.89%   x 1.11
 httparse_example_test  258 (2724 MB/s)      216 (3254 MB/s)                            -42  -16.28%   x 1.19
 one_test               152 (1914 MB/s)      119 (2445 MB/s)                            -33  -21.71%   x 1.28
 small_test             9,722 (2198 MB/s)    8,762 (2439 MB/s)                         -960   -9.87%   x 1.11

Marwes commented 6 years ago

$ cargo benchcmp combine.txt ../nom-optimized/nom.txt
 name                   combine.txt ns/iter  ../nom-optimized/nom.txt ns/iter  diff ns/iter   diff %  speedup
 bigger_test            44,488 (2402 MB/s)   44,077 (2425 MB/s)                        -411   -0.92%   x 1.01
 httparse_example_test  238 (2953 MB/s)      216 (3254 MB/s)                            -22   -9.24%   x 1.10
 one_test               145 (2006 MB/s)      119 (2445 MB/s)                            -26  -17.93%   x 1.22
 small_test             8,916 (2397 MB/s)    8,762 (2439 MB/s)                         -154   -1.73%   x 1.02

polarathene commented 6 years ago

Nice work! :D You got it pretty close!

Will it be PR'd to the benchmarks repo as combine-optimized?

Marwes commented 6 years ago

:) It is going to take 1-2 more hours to try and close the gap on httparse_example_test and one_test. There should not be any setup overhead (which would otherwise be visible in smaller tests like those, but not in large ones), so I suspect the input data for those tests somehow triggers some bad behaviour.

Will make a PR for the benchmarks repo with a combine-optimized version after that (plus some fixes for combine itself, and maybe take_while1_simd behind a nightly feature).

Marwes commented 6 years ago

Published combine 3.3 and made https://github.com/Geal/parser_benchmarks/pull/18 . Depending on the compiler and OS used, combine is now as fast as the nom parser or up to 10% slower (only on the smaller tests, it seems).