cespare / xxhash

A Go implementation of the 64-bit xxHash algorithm (XXH64)
MIT License

Remove bounds checks from purego version #68

Closed: greatroar closed this 1 year ago

greatroar commented 1 year ago

Benchmark results on amd64 with -tags=purego (old = before this change, new = after):

name                                  old speed      new speed       delta
pkg:github.com/cespare/xxhash/v2 goos:linux goarch:amd64
Sum64/4B-8                             676MB/s ± 0%    820MB/s ± 0%  +21.16%  (p=0.000 n=8+9)
Sum64/16B-8                           1.83GB/s ± 1%   2.07GB/s ± 0%  +13.17%  (p=0.000 n=10+10)
Sum64/100B-8                          5.05GB/s ± 1%   5.39GB/s ± 0%   +6.77%  (p=0.000 n=10+9)
Sum64/4KB-8                           9.86GB/s ± 0%  10.03GB/s ± 0%   +1.72%  (p=0.000 n=10+9)
Sum64/10MB-8                          9.64GB/s ± 1%   9.64GB/s ± 0%     ~     (p=0.863 n=9+9)
Sum64String/4B-8                       649MB/s ± 0%    776MB/s ± 1%  +19.69%  (p=0.000 n=9+9)
Sum64String/16B-8                     1.73GB/s ± 0%   1.94GB/s ± 1%  +12.47%  (p=0.000 n=9+9)
Sum64String/100B-8                    4.93GB/s ± 1%   5.26GB/s ± 0%   +6.68%  (p=0.000 n=10+10)
Sum64String/4KB-8                     9.84GB/s ± 1%  10.04GB/s ± 0%   +2.03%  (p=0.000 n=10+9)
Sum64String/10MB-8                    9.64GB/s ± 1%   9.65GB/s ± 1%     ~     (p=0.436 n=10+10)
pkg:github.com/cespare/xxhash/xxhashbench goos:linux goarch:amd64
Hashes/xxhash,direct,bytes,n=5B-8      694MB/s ± 1%    778MB/s ± 0%  +12.11%  (p=0.000 n=9+10)
Hashes/xxhash,direct,string,n=5B-8     588MB/s ± 0%    646MB/s ± 0%   +9.97%  (p=0.000 n=10+8)
Hashes/xxhash,direct,bytes,n=100B-8   4.93GB/s ± 0%   5.18GB/s ± 0%   +5.16%  (p=0.000 n=10+9)
Hashes/xxhash,direct,string,n=100B-8  4.68GB/s ± 0%   4.93GB/s ± 1%   +5.41%  (p=0.000 n=10+10)
Hashes/xxhash,direct,bytes,n=4KB-8    9.81GB/s ± 0%  10.03GB/s ± 0%   +2.20%  (p=0.000 n=9+8)
Hashes/xxhash,direct,string,n=4KB-8   9.79GB/s ± 1%  10.01GB/s ± 0%   +2.22%  (p=0.000 n=10+10)
Hashes/xxhash,direct,bytes,n=10MB-8   9.67GB/s ± 1%   9.68GB/s ± 0%     ~     (p=0.604 n=10+9)
Hashes/xxhash,direct,string,n=10MB-8  9.63GB/s ± 1%   9.67GB/s ± 0%     ~     (p=0.050 n=9+9)
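
For context on the technique named in the title: Go emits a bounds check on a slice index or slice expression unless the compiler can prove the access is in range. The diff is not shown in this thread, so the following is only a generic sketch of one common pattern for helping the compiler drop such checks, not the change made in this PR. `sumWords` and `sumWordsHinted` are hypothetical names, not functions from xxhash.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// sumWords adds up the little-endian 64-bit words in b, ignoring any
// trailing bytes. Every load goes through b[i:], so the compiler has to
// reason about i and len(b) together; whether it can prove each access
// is in range depends on the Go version.
func sumWords(b []byte) uint64 {
	var sum uint64
	for i := 0; i+8 <= len(b); i += 8 {
		sum += binary.LittleEndian.Uint64(b[i:])
	}
	return sum
}

// sumWordsHinted computes the same value, but reslices b on every
// iteration. The loop condition states len(b) >= 8 directly, so b[:8]
// and b[8:] are trivially in range.
func sumWordsHinted(b []byte) uint64 {
	var sum uint64
	for len(b) >= 8 {
		sum += binary.LittleEndian.Uint64(b[:8])
		b = b[8:]
	}
	return sum
}

func main() {
	// Two full 8-byte words plus three trailing bytes.
	data := []byte("0123456789abcdefXYZ")
	fmt.Println(sumWords(data) == sumWordsHinted(data)) // prints "true"
}
```

Building with `go build -gcflags=-d=ssa/check_bce/debug=1` makes the compiler report the bounds checks it could not remove, which is one way to compare the two loops on a given Go version.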

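For very short inputs (4-5 bytes), the purego version is now competitive with the asm version, matching or slightly beating it. From 16 bytes up it still trails, by a few percent at 100 bytes and by roughly 26-31% at 4KB and above (old=asm, new=purego):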

name                                  old speed      new speed      delta
pkg:github.com/cespare/xxhash/xxhashbench goos:linux goarch:amd64
Hashes/xxhash,direct,bytes,n=5B-8      715MB/s ± 1%   778MB/s ± 0%   +8.85%  (p=0.000 n=19+10)
Hashes/xxhash,direct,string,n=5B-8     643MB/s ± 1%   646MB/s ± 0%   +0.57%  (p=0.000 n=19+8)
Hashes/xxhash,direct,bytes,n=100B-8   5.29GB/s ± 1%  5.18GB/s ± 0%   -1.99%  (p=0.000 n=20+9)
Hashes/xxhash,direct,string,n=100B-8  5.08GB/s ± 1%  4.93GB/s ± 1%   -2.89%  (p=0.000 n=18+10)
Hashes/xxhash,direct,bytes,n=4KB-8    14.5GB/s ± 1%  10.0GB/s ± 0%  -31.03%  (p=0.000 n=19+8)
Hashes/xxhash,direct,string,n=4KB-8   14.4GB/s ± 0%  10.0GB/s ± 0%  -30.31%  (p=0.000 n=18+10)
Hashes/xxhash,direct,bytes,n=10MB-8   13.4GB/s ± 1%   9.7GB/s ± 0%  -27.55%  (p=0.000 n=20+9)
Hashes/xxhash,direct,string,n=10MB-8  13.3GB/s ± 1%   9.7GB/s ± 0%  -27.31%  (p=0.000 n=19+9)
pkg:github.com/cespare/xxhash/v2 goos:linux goarch:amd64
Sum64/4B-8                             819MB/s ± 0%   820MB/s ± 0%     ~     (p=0.452 n=16+9)
Sum64/16B-8                           2.55GB/s ± 1%  2.07GB/s ± 0%  -18.83%  (p=0.000 n=20+10)
Sum64/100B-8                          5.81GB/s ± 0%  5.39GB/s ± 0%   -7.14%  (p=0.000 n=18+9)
Sum64/4KB-8                           14.6GB/s ± 0%  10.0GB/s ± 0%  -31.35%  (p=0.000 n=17+9)
Sum64/10MB-8                          13.3GB/s ± 1%   9.6GB/s ± 0%  -27.63%  (p=0.000 n=16+9)
Sum64String/4B-8                       731MB/s ± 4%   776MB/s ± 1%   +6.25%  (p=0.000 n=20+9)
Sum64String/16B-8                     2.17GB/s ± 2%  1.94GB/s ± 1%  -10.30%  (p=0.000 n=18+9)
Sum64String/100B-8                    5.27GB/s ± 2%  5.26GB/s ± 0%   -0.27%  (p=0.035 n=18+10)
Sum64String/4KB-8                     14.3GB/s ± 2%  10.0GB/s ± 0%  -29.74%  (p=0.000 n=19+9)
Sum64String/10MB-8                    13.0GB/s ± 4%   9.6GB/s ± 1%  -25.97%  (p=0.000 n=20+10)

greatroar commented 1 year ago

I did Digest too, for completeness' sake:

name                 old speed      new speed      delta
DigestBytes/4B-8      324MB/s ± 1%   354MB/s ± 0%  +9.05%  (p=0.000 n=9+10)
DigestBytes/16B-8    1.00GB/s ± 1%  1.07GB/s ± 0%  +6.76%  (p=0.000 n=9+9)
DigestBytes/100B-8   3.54GB/s ± 1%  3.67GB/s ± 1%  +3.74%  (p=0.000 n=10+10)
DigestBytes/4KB-8    9.54GB/s ± 1%  9.65GB/s ± 0%  +1.11%  (p=0.000 n=10+8)
DigestBytes/10MB-8   9.63GB/s ± 1%  9.63GB/s ± 1%    ~     (p=0.968 n=9+10)
DigestString/4B-8     311MB/s ± 0%   334MB/s ± 0%  +7.50%  (p=0.000 n=8+7)
DigestString/16B-8    973MB/s ± 1%  1036MB/s ± 0%  +6.43%  (p=0.000 n=9+10)
DigestString/100B-8  3.50GB/s ± 1%  3.60GB/s ± 0%  +2.95%  (p=0.000 n=10+9)
DigestString/4KB-8   9.51GB/s ± 4%  9.64GB/s ± 0%  +1.38%  (p=0.028 n=10+9)
DigestString/10MB-8  9.64GB/s ± 2%  9.67GB/s ± 0%    ~     (p=0.222 n=9+9)

cespare commented 1 year ago

Nice improvement! Thanks for sending it along.