BurntSushi closed this 6 months ago.
Confirming that this PR improves performance on my AMD system (Ryzen 9 5900X), which seems to be more affected by the 2.6 refactor. Benchmarking the scenario from #139:
```
❯ poop "./memchr-2.5.0" "./memchr-2.6.4" "./memchr-pr" -d 20000
Benchmark 1 (27 runs): ./memchr-2.5.0
  measurement        mean ± σ            min … max       outliers   delta
  wall_time         766ms ± 11.9ms     746ms …  792ms    0 ( 0%)    0%
  peak_rss         2.25MB ±  114KB    1.97MB … 2.36MB    0 ( 0%)    0%
  cpu_cycles        3.53G ±  49.6M     3.43G …  3.66G    0 ( 0%)    0%
  instructions      12.9G ±    315     12.9G …  12.9G    0 ( 0%)    0%
  cache_references  1.06G ±  43.5M      985M …  1.14G    0 ( 0%)    0%
  cache_misses      20.7M ±  3.06M     14.8M …  27.2M    0 ( 0%)    0%
  branch_misses     14.0M ±  88.3K     13.8M …  14.1M    0 ( 0%)    0%
Benchmark 2 (23 runs): ./memchr-2.6.4
  measurement        mean ± σ            min … max       outliers   delta
  wall_time         906ms ± 14.6ms     873ms …  931ms    0 ( 0%)    💩+ 18.3% ± 1.0%
  peak_rss         2.28MB ± 98.1KB    2.16MB … 2.36MB    0 ( 0%)      + 1.4% ± 2.7%
  cpu_cycles        4.22G ±  53.3M     4.14G …  4.35G    0 ( 0%)    💩+ 19.5% ± 0.8%
  instructions      15.7G ±    247     15.7G …  15.7G    1 ( 4%)    💩+ 21.7% ± 0.0%
  cache_references  1.00G ±  67.0M      871M …  1.16G    0 ( 0%)    ⚡- 5.6% ± 3.0%
  cache_misses      19.8M ±  3.94M     12.7M …  26.6M    0 ( 0%)      - 4.3% ± 9.7%
  branch_misses     13.8M ±  76.3K     13.7M …  13.9M    2 ( 9%)      - 1.3% ± 0.3%
Benchmark 3 (25 runs): ./memchr-pr
  measurement        mean ± σ            min … max       outliers   delta
  wall_time         807ms ± 15.2ms     779ms …  834ms    0 ( 0%)    💩+ 5.4% ± 1.0%
  peak_rss         2.29MB ± 92.7KB    2.16MB … 2.36MB    0 ( 0%)      + 1.9% ± 2.6%
  cpu_cycles        3.72G ±  67.1M     3.61G …  3.82G    0 ( 0%)    💩+ 5.2% ± 0.9%
  instructions      14.3G ±    299     14.3G …  14.3G    1 ( 4%)    💩+ 10.6% ± 0.0%
  cache_references  1.05G ±  47.6M      929M …  1.16G    1 ( 4%)      - 1.1% ± 2.4%
  cache_misses      20.7M ±  2.71M     16.5M …  25.9M    0 ( 0%)      + 0.1% ± 7.8%
  branch_misses     14.2M ±  93.9K     14.0M …  14.4M    0 ( 0%)      + 1.1% ± 0.4%
```
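The delta column in the poop output is relative to Benchmark 1 (memchr 2.5.0). As a rough sanity check, the wall-time deltas can be reproduced from the printed means alone. This is a minimal Rust sketch under stated assumptions: the variable names are ours, it uses only the rounded means (not poop's ± error estimate), and the "recovered" figure is our own derived metric, not something poop reports.

```rust
/// Percentage change of `candidate` relative to `baseline`.
fn pct_delta(baseline: f64, candidate: f64) -> f64 {
    (candidate - baseline) / baseline * 100.0
}

fn main() {
    // Mean wall times in milliseconds, copied from the poop output.
    let v2_5_0 = 766.0;
    let v2_6_4 = 906.0;
    let pr = 807.0;

    // Slowdown of 2.6.4 and of the PR build relative to 2.5.0.
    let regression = pct_delta(v2_5_0, v2_6_4);
    let remaining = pct_delta(v2_5_0, pr);

    // Derived metric (ours): share of the 2.6.4 slowdown, in ms, won back by the PR.
    let recovered = (v2_6_4 - pr) / (v2_6_4 - v2_5_0) * 100.0;

    println!("2.6.4 vs 2.5.0: {:+.1}%", regression);
    println!("PR    vs 2.5.0: {:+.1}%", remaining);
    println!("PR recovers {:.0}% of the slowdown", recovered);
}
```

With these inputs the sketch gives about +18.3% and +5.4%, matching the `wall_time` rows, and suggests the PR wins back roughly 71% of the lost wall time. The other rows drift by a few tenths of a percent because their printed means are more heavily rounded.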
This PR is on crates.io in memchr 2.7.0.
This came from @jhorstmann in #139. It simplifies `is_equal_raw` and, in turn, its codegen. Since this routine gets inlined into others, this winds up being a fair improvement. See my commentary in #139 for more discussion.
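To illustrate the general idea behind a small equality routine like `is_equal_raw` (comparing bytes in word-sized chunks instead of one at a time), here is a safe Rust sketch. It is hypothetical and is not the code from this PR: `is_equal_chunked` is our own name, it works on slices rather than the raw pointers the internal memchr routine takes, and the 4/2/1-byte chunking is an assumption chosen for illustration.

```rust
/// Hypothetical sketch (not memchr's code): byte-slice equality that
/// compares 4-byte words first, then a 2-byte word, then a final byte.
fn is_equal_chunked(x: &[u8], y: &[u8]) -> bool {
    if x.len() != y.len() {
        return false;
    }
    let (mut x, mut y) = (x, y);

    // Compare 4 bytes at a time as native-endian u32 values.
    while x.len() >= 4 {
        let vx = u32::from_ne_bytes([x[0], x[1], x[2], x[3]]);
        let vy = u32::from_ne_bytes([y[0], y[1], y[2], y[3]]);
        if vx != vy {
            return false;
        }
        x = &x[4..];
        y = &y[4..];
    }

    // Handle a 2-byte remainder with a single u16 comparison.
    if x.len() >= 2 {
        if u16::from_ne_bytes([x[0], x[1]]) != u16::from_ne_bytes([y[0], y[1]]) {
            return false;
        }
        x = &x[2..];
        y = &y[2..];
    }

    // At most one byte is left; two empty tails compare equal (None == None).
    x.first() == y.first()
}

fn main() {
    assert!(is_equal_chunked(b"memchr", b"memchr"));
    assert!(!is_equal_chunked(b"memchr", b"memchx"));
    assert!(is_equal_chunked(b"", b""));
    println!("ok");
}
```

Replacing a byte-at-a-time loop with a few fixed-width comparisons keeps the branch count low for short inputs, which matters more when the routine is inlined into hot callers.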
Fixes #139