[Open] Geal opened this issue 6 years ago
> Apparently, the parser fails on one of the examples (the "basic" one); could you find the issue?
The parser never checked for whitespace before the first value :) Whoops.
> Also, to match the behaviour of the other parsers, do you think it could be modified to return a `&str` instead of a `String`?
Arguably it is more realistic to use `String` (or `Cow<str>`, though that may be a bit tricky) since users probably don't want to care about escapes. Easier to change just the one parser though, and I think I can actually make it work for both scenarios.
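To illustrate the trade-off (this is only a sketch, not the code from the PR): with `Cow<str>` the parser can borrow the input slice when a string contains no escapes, and only allocate when it actually has to unescape. The helper name and the reduced escape handling below are made up for the example.

```rust
use std::borrow::Cow;

// Hypothetical helper: turn the raw contents of a JSON string literal into
// its unescaped form, borrowing from the input when no escapes are present.
fn unescape(raw: &str) -> Cow<'_, str> {
    if !raw.contains('\\') {
        // No escapes: hand back the original slice, zero allocation.
        return Cow::Borrowed(raw);
    }
    let mut out = String::with_capacity(raw.len());
    let mut chars = raw.chars();
    while let Some(c) = chars.next() {
        if c == '\\' {
            match chars.next() {
                Some('n') => out.push('\n'),
                Some('t') => out.push('\t'),
                Some('"') => out.push('"'),
                Some('\\') => out.push('\\'),
                // Real JSON has more cases (\u, \r, \/, ...); elided here.
                Some(other) => out.push(other),
                None => break,
            }
        } else {
            out.push(c);
        }
    }
    Cow::Owned(out)
}
```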
> Could you check the benchmark to see if I have not missed anything in how to use combine?
Looks fine to me! I need to improve the json parser a bit though, since the only change it has seen since combine 1.0 is a refactor to use `impl Trait` as a way to show that off. I will submit a PR in a bit (also with some Cargo.toml profile changes that combine unfortunately needs due to thin-LTO's immaturity; I added the flags to all the crates though, to make it fair).
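For context, profile tweaks like that live in `Cargo.toml`. The exact flags from the PR aren't shown here, but a typical workaround for thin-LTO issues of that era looked something like this (illustrative only):

```toml
# Hypothetical example of the kind of profile change mentioned above;
# the actual flags used in the PR may differ. `cargo bench` builds with
# the bench profile.
[profile.bench]
lto = true        # full ("fat") LTO instead of the problematic thin-LTO
codegen-units = 1 # one codegen unit for better cross-function optimization
```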
The nom 4.0 release has this article with some HTTP parsing benchmark results graphed at the end. combine has the lowest performance, although it's close to nom's naive results; the optimized nom result is at the top (that might change if a pending PR to httparse gets merged). Is combine able to get similar results to the optimized nom?
@polarathene Combine could certainly do the same kinds of optimizations, as the only thing required for the `take_while_{simd,unrolled}` macros is a way to operate on a byte slice. Might take a stab at that; the macros could even be made into functions and shared between the implementations with some small tweaks.
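For anyone unfamiliar with the technique, here is a rough sketch of what an unrolled `take_while` over a byte slice looks like. This illustrates the idea only; it is not nom's or combine's actual macro, and the function name and signature are made up.

```rust
// Sketch of an unrolled take_while over bytes: test four bytes per loop
// iteration to cut down on branch/loop overhead, then finish byte by byte.
// Returns (matched_prefix, rest).
fn take_while_unrolled<F>(input: &[u8], pred: F) -> (&[u8], &[u8])
where
    F: Fn(u8) -> bool,
{
    let len = input.len();
    let mut i = 0;
    while i + 4 <= len {
        if !pred(input[i]) {
            break;
        }
        if !pred(input[i + 1]) {
            i += 1;
            break;
        }
        if !pred(input[i + 2]) {
            i += 2;
            break;
        }
        if !pred(input[i + 3]) {
            i += 3;
            break;
        }
        i += 4;
    }
    // Tail loop: handles the last few bytes, and exits immediately if the
    // unrolled loop already stopped at a non-matching byte.
    while i < len && pred(input[i]) {
        i += 1;
    }
    input.split_at(i)
}
```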
@Geal I am curious which compiler you used to produce the benchmarks? I am getting a lot of variance, both from running the benchmarks on different computers and from using different compilers.
For instance, on Rust 1.24.1 nom beats combine at 318 MB/s vs 275 MB/s, whereas with Rust 1.26.0 combine beats nom at 317 MB/s vs 294 MB/s (MacBook).
EDIT: On another computer (Windows with a Ryzen 7) nom beats combine by ~10% on both 1.24.1 and 1.26.0.
Current results below; there is probably a bit more room for improvement though.
Both nom and combine use an unrolled `take_while1` everywhere a plain `take_while1` was used (SIMD is still used in the two places that already had it), as that gives a significant speedup on my machine.
```
$ cargo benchcmp non_unrolled nom.txt
 name                   non_unrolled ns/iter  nom.txt ns/iter     diff ns/iter  diff %   speedup
 bigger_test            59,056 (1809 MB/s)    44,077 (2425 MB/s)       -14,979  -25.36%   x 1.34
 httparse_example_test  311 (2260 MB/s)       216 (3254 MB/s)              -95  -30.55%   x 1.44
 one_test               178 (1634 MB/s)       119 (2445 MB/s)              -59  -33.15%   x 1.50
 small_test             11,724 (1823 MB/s)    8,762 (2439 MB/s)         -2,962  -25.26%   x 1.34
```
```
running 4 tests
test bigger_test            ... bench:      48,913 ns/iter (+/- 1,322) = 2185 MB/s
test httparse_example_test  ... bench:         258 ns/iter (+/- 10) = 2724 MB/s
test one_test               ... bench:         152 ns/iter (+/- 39) = 1914 MB/s
test small_test             ... bench:       9,722 ns/iter (+/- 331) = 2198 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured
```
```
running 4 tests
test bigger_test            ... bench:      44,077 ns/iter (+/- 555) = 2425 MB/s
test httparse_example_test  ... bench:         216 ns/iter (+/- 5) = 3254 MB/s
test one_test               ... bench:         119 ns/iter (+/- 3) = 2445 MB/s
test small_test             ... bench:       8,762 ns/iter (+/- 183) = 2439 MB/s

test result: ok. 0 passed; 0 failed; 0 ignored; 4 measured
```
```
$ cargo benchcmp combine.txt ../nom-optimized/nom.txt
 name                   combine.txt ns/iter  ../nom-optimized/nom.txt ns/iter  diff ns/iter  diff %   speedup
 bigger_test            48,913 (2185 MB/s)   44,077 (2425 MB/s)                      -4,836   -9.89%   x 1.11
 httparse_example_test  258 (2724 MB/s)      216 (3254 MB/s)                            -42  -16.28%   x 1.19
 one_test               152 (1914 MB/s)      119 (2445 MB/s)                            -33  -21.71%   x 1.28
 small_test             9,722 (2198 MB/s)    8,762 (2439 MB/s)                         -960   -9.87%   x 1.11
```
```
$ cargo benchcmp combine.txt ../nom-optimized/nom.txt
 name                   combine.txt ns/iter  ../nom-optimized/nom.txt ns/iter  diff ns/iter  diff %   speedup
 bigger_test            44,488 (2402 MB/s)   44,077 (2425 MB/s)                        -411   -0.92%   x 1.01
 httparse_example_test  238 (2953 MB/s)      216 (3254 MB/s)                            -22   -9.24%   x 1.10
 one_test               145 (2006 MB/s)      119 (2445 MB/s)                            -26  -17.93%   x 1.22
 small_test             8,916 (2397 MB/s)    8,762 (2439 MB/s)                         -154   -1.73%   x 1.02
```
Nice work! :D You got it pretty close!
Will it be PR'd to the benchmarks repo as `combine-optimized`?
:) Gonna take 1-2 more hours to try and close out the `httparse_example_test` and `one_test` difference. There shouldn't be any setup overhead (which would otherwise be visible in smaller tests like those, but not in large ones), so I suspect it is the input data for those tests that somehow triggers some bad behaviour.
Will make a PR to the benchmarks repo for a `combine-optimized` afterwards (+ some fixes for combine itself, and maybe take in `take_while1_simd` under a nightly feature).
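As a rough illustration of what a SIMD `take_while` buys (this is not combine's actual `take_while1_simd`; the function below is a made-up example restricted to scanning for a single delimiter byte with SSE2):

```rust
// Illustrative only: length of the prefix of `input` not containing `delim`,
// scanning 16 bytes per iteration with SSE2. A real take_while1_simd would
// handle arbitrary byte classes, not just a single delimiter.
#[cfg(target_arch = "x86_64")]
fn prefix_until_byte(input: &[u8], delim: u8) -> usize {
    use std::arch::x86_64::*;

    let mut i = 0;
    unsafe {
        let needle = _mm_set1_epi8(delim as i8);
        while i + 16 <= input.len() {
            // Unaligned 16-byte load; bounds were checked just above.
            let chunk = _mm_loadu_si128(input.as_ptr().add(i) as *const __m128i);
            let eq = _mm_cmpeq_epi8(chunk, needle);
            let mask = _mm_movemask_epi8(eq);
            if mask != 0 {
                // Lowest set bit = first byte equal to `delim` in this chunk.
                return i + mask.trailing_zeros() as usize;
            }
            i += 16;
        }
    }
    // Scalar tail for the final < 16 bytes.
    while i < input.len() && input[i] != delim {
        i += 1;
    }
    i
}
```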
Published combine 3.3 and made https://github.com/Geal/parser_benchmarks/pull/18. Depending on the compiler/OS used, combine is now as fast as the nom parser, or up to 10% slower (on smaller tests only, it seems).
Hello!
I spent some time cleaning up the old nom_benchmarks repository into a more general parser_benchmarks. Since we're reaching the point where the various parser libraries achieve the same performance, it's getting interesting to have a good way to compare them on the same tests, and to transfer techniques from one to the next.
I have added your JSON example and put it to work on the other example files: https://github.com/Geal/parser_benchmarks/tree/master/json
Apparently, the parser fails on one of the examples (the "basic" one); could you find the issue? Also, to match the behaviour of the other parsers, do you think it could be modified to return a `&str` instead of a `String`?

I also modified a few things in the HTTP example: https://github.com/Geal/parser_benchmarks/tree/master/http

Could you check the benchmark to see if I have not missed anything in how to use combine?