golang / snappy

The Snappy compression format in the Go programming language.
BSD 3-Clause "New" or "Revised" License

Faster overlapping copies #49

Closed: klauspost closed this 5 years ago

klauspost commented 5 years ago

Eliminate bounds check on every byte copied.
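For readers unfamiliar with the technique, here is a minimal sketch of how a forward-running overlapping copy can avoid a per-byte bounds check. It is not the patch itself: the function name `forwardCopy` and its signature are hypothetical, chosen only for illustration. The idea is to slice the destination and source once, so that both have the same known length, which lets the compiler prove that the indexing inside the loop is in range.

```go
package main

import "fmt"

// forwardCopy is a hypothetical helper (not part of the snappy API).
// It copies length bytes inside dst from position d-offset to position d
// and returns the new write position. The caller must guarantee that
// 0 < offset <= d and d+length <= len(dst).
func forwardCopy(dst []byte, d, offset, length int) int {
	if offset >= length {
		// Source and destination do not overlap: the built-in copy is safe.
		copy(dst[d:d+length], dst[d-offset:])
		return d + length
	}

	// The regions overlap, so the copy must run forwards byte by byte:
	// bytes written early in the loop are read again later, which repeats
	// the pattern. Slicing both operands to the same length up front means
	// the bounds are checked once here instead of on every iteration.
	a := dst[d : d+length]
	b := dst[d-offset:]
	b = b[:len(a)]
	for i := range a {
		a[i] = b[i]
	}
	return d + length
}

func main() {
	dst := make([]byte, 8)
	copy(dst, "ab")
	// Copy 6 bytes starting 2 bytes back: the 2-byte pattern repeats.
	forwardCopy(dst, 2, 2, 6)
	fmt.Println(string(dst)) // prints "abababab"
}
```

The built-in `copy` cannot be used for the overlapping case because it behaves like memmove and would not repeat the pattern.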

Benchmarks were measured on AMD64, but built with -tags=noasm so that the pure Go code path is used:

>benchstat old.txt new.txt
name        old time/op    new time/op    delta
_UFlat0-8      194µs ± 3%     150µs ± 2%  -22.59%  (p=0.000 n=10+10)
_UFlat1-8     1.62ms ± 1%    1.41ms ± 2%  -12.70%   (p=0.000 n=9+10)
_UFlat2-8     8.91µs ± 4%    8.76µs ± 2%     ~     (p=0.343 n=10+10)
_UFlat3-8      222ns ± 2%     224ns ± 1%   +1.00%   (p=0.028 n=10+9)
_UFlat4-8     28.4µs ± 2%    20.3µs ± 3%  -28.45%  (p=0.000 n=10+10)
_UFlat5-8      797µs ± 5%     603µs ± 2%  -24.34%   (p=0.000 n=10+9)
_UFlat6-8      565µs ± 1%     531µs ± 2%   -6.16%    (p=0.000 n=8+9)
_UFlat7-8      494µs ± 4%     457µs ± 2%   -7.61%  (p=0.000 n=10+10)
_UFlat8-8     1.55ms ± 4%    1.40ms ± 2%   -9.48%   (p=0.000 n=10+9)
_UFlat9-8     1.93ms ± 1%    1.83ms ± 2%   -5.44%   (p=0.000 n=10+9)
_UFlat10-8     186µs ± 2%     138µs ± 5%  -26.04%  (p=0.000 n=10+10)
_UFlat11-8     524µs ± 2%     478µs ± 3%   -8.68%  (p=0.000 n=10+10)

name        old speed      new speed      delta
_UFlat0-8    528MB/s ± 3%   682MB/s ± 2%  +29.18%  (p=0.000 n=10+10)
_UFlat1-8    434MB/s ± 1%   497MB/s ± 2%  +14.56%   (p=0.000 n=9+10)
_UFlat2-8   13.8GB/s ± 4%  14.1GB/s ± 2%     ~     (p=0.353 n=10+10)
_UFlat3-8    901MB/s ± 1%   890MB/s ± 1%   -1.18%    (p=0.008 n=9+9)
_UFlat4-8   3.60GB/s ± 2%  5.03GB/s ± 3%  +39.76%  (p=0.000 n=10+10)
_UFlat5-8    514MB/s ± 5%   679MB/s ± 2%  +32.04%   (p=0.000 n=10+9)
_UFlat6-8    269MB/s ± 1%   287MB/s ± 2%   +6.57%    (p=0.000 n=8+9)
_UFlat7-8    253MB/s ± 4%   274MB/s ± 2%   +8.23%  (p=0.000 n=10+10)
_UFlat8-8    276MB/s ± 4%   305MB/s ± 2%  +10.43%   (p=0.000 n=10+9)
_UFlat9-8    249MB/s ± 1%   263MB/s ± 2%   +5.76%   (p=0.000 n=10+9)
_UFlat10-8   637MB/s ± 2%   862MB/s ± 5%  +35.25%  (p=0.000 n=10+10)
_UFlat11-8   352MB/s ± 2%   385MB/s ± 3%   +9.51%  (p=0.000 n=10+10)

nigeltao commented 5 years ago

Code looks great. Again, though, I'd like the commit message to state which GOARCH its numbers were measured on.

klauspost commented 5 years ago

@nigeltao updated