dgryski / go-farm

go-farm: a pure-Go farmhash implementation
MIT License
238 stars 22 forks source link

Update Hash64 for compatibility with reference impementation #13

Closed dgryski closed 4 years ago

dgryski commented 5 years ago

The pure-C++ version of Hash64 appears to now be based on the farmhashxo set of functions rather than the farmhashna set. This change probably happened with 1.1 and we didn't notice because we were only checking for compatibility with the fingerprint code.

dgryski commented 5 years ago

Worth it? Not sure.

name          old time/op  new time/op  delta
Hash64/8-8    9.44ns ± 1%  7.60ns ± 2%  -19.44%  (p=0.000 n=9+9)
Hash64/16-8   9.42ns ± 4%  7.56ns ± 2%  -19.69%  (p=0.000 n=9+10)
Hash64/32-8   11.3ns ± 2%   9.1ns ± 1%  -19.86%  (p=0.000 n=8+8)
Hash64/40-8   15.3ns ± 1%  18.1ns ± 2%  +18.82%  (p=0.000 n=8+10)
Hash64/60-8   15.2ns ± 1%  18.5ns ± 2%  +21.66%  (p=0.000 n=8+9)
Hash64/64-8   16.2ns ±10%  18.0ns ± 1%  +11.06%  (p=0.000 n=10+8)
Hash64/72-8   31.0ns ± 2%  24.9ns ± 1%  -19.65%  (p=0.000 n=10+8)
Hash64/80-8   30.8ns ± 2%  24.9ns ± 1%  -19.02%  (p=0.000 n=10+8)
Hash64/100-8  31.6ns ± 1%  36.7ns ± 1%  +16.10%  (p=0.000 n=10+9)
Hash64/150-8  39.5ns ±14%  47.5ns ± 1%  +20.23%  (p=0.000 n=10+9)
Hash64/200-8  43.6ns ± 1%  59.1ns ± 1%  +35.57%  (p=0.000 n=10+9)
Hash64/250-8  45.4ns ± 1%  60.4ns ± 1%  +33.05%  (p=0.000 n=10+10)
xzyfer commented 4 years ago

Should this issue be closed now?

dgryski commented 4 years ago

I haven't merged this branch. Compatibility with the upstream C++ code is nice, even if we don't get a performance win. I'll rerun this and get the geomean of these numbers and try to convince myself one way or the other. Do you have an opinion?

dgryski commented 4 years ago

Geomean results:

~/go/src/github.com/dgryski/go-farm $ benchstat -geomean old.out xo.out
name          old time/op  new time/op  delta
Hash64/8-8    9.27ns ± 3%  7.27ns ± 2%  -21.50%  (p=0.000 n=20+19)
Hash64/16-8   9.19ns ± 2%  7.29ns ± 3%  -20.69%  (p=0.000 n=19+18)
Hash64/32-8   11.0ns ± 2%   8.8ns ± 2%  -20.05%  (p=0.000 n=19+19)
Hash64/40-8   15.0ns ± 1%  18.0ns ± 2%  +19.89%  (p=0.000 n=17+20)
Hash64/60-8   15.1ns ± 2%  18.5ns ± 6%  +22.67%  (p=0.000 n=20+20)
Hash64/64-8   15.3ns ± 5%  18.1ns ± 6%  +18.34%  (p=0.000 n=20+17)
Hash64/72-8   31.5ns ± 2%  24.6ns ± 2%  -21.77%  (p=0.000 n=20+18)
Hash64/80-8   31.5ns ± 5%  24.1ns ± 3%  -23.42%  (p=0.000 n=19+20)
Hash64/100-8  31.5ns ± 2%  36.4ns ± 3%  +15.75%  (p=0.000 n=20+19)
Hash64/150-8  39.1ns ± 2%  47.3ns ± 4%  +20.83%  (p=0.000 n=19+20)
Hash64/200-8  44.7ns ± 2%  60.6ns ± 2%  +35.52%  (p=0.000 n=19+19)
Hash64/250-8  44.9ns ± 4%  63.9ns ± 1%  +42.38%  (p=0.000 n=19+19)
[Geo mean]    32.7ns       21.8ns       -33.42%

I should investigate if the regressions are an implementation oddity or a fundamental slowdown.

dgryski commented 4 years ago

I think the regressions are due to Go not inlining the new lower level hash pieces. A quick hash to inline for 33-64 speeds up the computation from ~18ns to ~12ns. The longer hashes (96-bytes to 256 bytes) are passed on to the na series of hashes, while the 256+ length hashes are passed to the uo family.

I'm happier that I can speed up all hashes shorter than 100 bytes (which is probably most of them in real world use.)

Just need to decide if I'm going to break compatibility to do it though.

dgryski commented 4 years ago

Full benchmarks:


~/go/src/github.com/dgryski/go-farm $ benchstat old.out xo.out
name           old time/op  new time/op  delta
Hash64/8-8     9.12ns ± 2%  7.46ns ± 6%  -18.16%  (p=0.000 n=20+20)
Hash64/16-8    9.14ns ± 2%  7.28ns ± 3%  -20.32%  (p=0.000 n=20+19)
Hash64/32-8    11.1ns ± 5%   8.7ns ± 1%  -21.03%  (p=0.000 n=18+20)
Hash64/40-8    15.1ns ± 1%  12.5ns ± 2%  -17.30%  (p=0.000 n=18+18)
Hash64/60-8    15.1ns ± 2%  12.4ns ± 1%  -18.18%  (p=0.000 n=20+19)
Hash64/64-8    15.1ns ± 2%  12.4ns ± 1%  -18.01%  (p=0.000 n=18+19)
Hash64/72-8    31.4ns ± 2%  23.9ns ± 1%  -23.83%  (p=0.000 n=20+17)
Hash64/80-8    31.3ns ± 1%  24.2ns ± 1%  -22.50%  (p=0.000 n=19+19)
Hash64/100-8   31.4ns ± 1%  35.5ns ± 2%  +12.78%  (p=0.000 n=20+19)
Hash64/150-8   39.1ns ± 3%  47.5ns ± 2%  +21.71%  (p=0.000 n=20+19)
Hash64/200-8   44.4ns ± 2%  58.2ns ± 1%  +31.32%  (p=0.000 n=19+18)
Hash64/250-8   44.9ns ± 1%  58.4ns ± 2%  +30.15%  (p=0.000 n=20+19)
Hash64/512-8   68.8ns ± 2%  71.1ns ± 2%   +3.42%  (p=0.000 n=20+18)
Hash64/1024-8   118ns ± 2%   118ns ± 1%     ~     (p=0.386 n=20+20)
Hash64/8192-8   801ns ± 1%   803ns ± 1%     ~     (p=0.651 n=19+20)