cespare / xxhash

A Go implementation of the 64-bit xxHash algorithm (XXH64)
MIT License
1.79k stars 123 forks source link

Assembler version of Sum64String #42

Closed greatroar closed 3 years ago

greatroar commented 3 years ago

This moves the forwarding of Sum64String to Sum64 into assembler. The Go version cannot be inlined by Go 1.15.3:

./xxhash_unsafe.go:28:6: cannot inline Sum64String: function too complex: cost 91 exceeds budget 80

The assembler version makes Sum64String somewhere between a bit and a lot faster for short strings:

name                                  old time/op    new time/op    delta
Hashes/xxhash,direct,bytes,n=5B-8       5.96ns ± 1%    5.92ns ± 1%   -0.56%  (p=0.000 n=17+18)
Hashes/xxhash,direct,string,n=5B-8      8.71ns ± 1%    5.91ns ± 0%  -32.15%  (p=0.000 n=19+19)
Hashes/xxhash,direct,bytes,n=100B-8     19.2ns ± 2%    18.9ns ± 1%   -1.25%  (p=0.000 n=18+18)
Hashes/xxhash,direct,string,n=100B-8    20.3ns ± 1%    17.7ns ± 1%  -12.74%  (p=0.000 n=17+20)
Hashes/xxhash,direct,bytes,n=4KB-8       278ns ± 1%     276ns ± 1%   -0.54%  (p=0.000 n=19+16)
Hashes/xxhash,direct,string,n=4KB-8      280ns ± 2%     276ns ± 1%   -1.35%  (p=0.000 n=18+20)
Hashes/xxhash,direct,bytes,n=10MB-8      767µs ± 2%     766µs ± 1%     ~     (p=0.775 n=18+19)
Hashes/xxhash,direct,string,n=10MB-8     773µs ± 1%     775µs ± 2%     ~     (p=0.496 n=20+19)

name                                  old speed      new speed      delta
Hashes/xxhash,direct,bytes,n=5B-8      839MB/s ± 1%   844MB/s ± 1%   +0.55%  (p=0.000 n=17+18)
Hashes/xxhash,direct,string,n=5B-8     574MB/s ± 1%   846MB/s ± 0%  +47.38%  (p=0.000 n=19+19)
Hashes/xxhash,direct,bytes,n=100B-8   5.21GB/s ± 2%  5.28GB/s ± 2%   +1.21%  (p=0.000 n=18+19)
Hashes/xxhash,direct,string,n=100B-8  4.92GB/s ± 1%  5.64GB/s ± 2%  +14.57%  (p=0.000 n=17+20)
Hashes/xxhash,direct,bytes,n=4KB-8    14.4GB/s ± 1%  14.5GB/s ± 1%   +0.44%  (p=0.000 n=19+17)
Hashes/xxhash,direct,string,n=4KB-8   14.3GB/s ± 2%  14.5GB/s ± 0%   +1.58%  (p=0.000 n=18+17)
Hashes/xxhash,direct,bytes,n=10MB-8   13.0GB/s ± 2%  13.1GB/s ± 1%     ~     (p=0.775 n=18+19)
Hashes/xxhash,direct,string,n=10MB-8  12.9GB/s ± 1%  12.9GB/s ± 2%     ~     (p=0.496 n=20+19)

The implementation is a bit messy because now there need to be two _unsafe.go files. I didn't see an easy way of avoiding that.