Use faster copy when not overlapping

klauspost commented 5 years ago

Use the built-in copy function when the source doesn't overlap the destination.

Again benchmarks are a bit polarized based on how often this is the case, but should be a solid improvement for all non-amd64 users.

Benchmark measured on AMD64 but with -tags=noasm:

>benchstat old.txt new.txt
name        old time/op    new time/op    delta
_UFlat0-8      194µs ± 3%     130µs ± 2%   -33.14%  (p=0.000 n=10+10)
_UFlat1-8     1.62ms ± 1%    1.42ms ± 1%   -11.98%    (p=0.000 n=9+9)
_UFlat2-8     8.91µs ± 4%    8.73µs ± 1%      ~      (p=0.182 n=10+9)
_UFlat3-8      222ns ± 2%     219ns ± 6%    -1.36%   (p=0.022 n=10+9)
_UFlat4-8     28.4µs ± 2%    11.5µs ± 1%   -59.57%  (p=0.000 n=10+10)
_UFlat5-8      797µs ± 5%     536µs ± 1%   -32.77%  (p=0.000 n=10+10)
_UFlat6-8      565µs ± 1%     571µs ± 1%    +1.04%   (p=0.007 n=8+10)
_UFlat7-8      494µs ± 4%     496µs ± 3%      ~     (p=0.986 n=10+10)
_UFlat8-8     1.55ms ± 4%    1.53ms ± 3%      ~     (p=0.280 n=10+10)
_UFlat9-8     1.93ms ± 1%    1.98ms ± 3%    +2.57%  (p=0.000 n=10+10)
_UFlat10-8     186µs ± 2%     102µs ± 2%   -45.14%  (p=0.000 n=10+10)
_UFlat11-8     524µs ± 2%     510µs ± 1%    -2.56%   (p=0.000 n=10+8)

name        old speed      new speed      delta
_UFlat0-8    528MB/s ± 3%   790MB/s ± 1%   +49.54%  (p=0.000 n=10+10)
_UFlat1-8    434MB/s ± 1%   493MB/s ± 1%   +13.61%    (p=0.000 n=9+9)
_UFlat2-8   13.8GB/s ± 4%  14.1GB/s ± 2%      ~      (p=0.182 n=10+9)
_UFlat3-8    901MB/s ± 1%   912MB/s ± 6%    +1.18%    (p=0.026 n=9+9)
_UFlat4-8   3.60GB/s ± 2%  8.91GB/s ± 1%  +147.32%  (p=0.000 n=10+10)
_UFlat5-8    514MB/s ± 5%   764MB/s ± 2%   +48.59%  (p=0.000 n=10+10)
_UFlat6-8    269MB/s ± 1%   266MB/s ± 1%    -1.03%   (p=0.009 n=8+10)
_UFlat7-8    253MB/s ± 4%   252MB/s ± 3%      ~     (p=0.985 n=10+10)
_UFlat8-8    276MB/s ± 4%   279MB/s ± 3%      ~     (p=0.288 n=10+10)
_UFlat9-8    249MB/s ± 1%   243MB/s ± 3%    -2.51%  (p=0.000 n=10+10)
_UFlat10-8   637MB/s ± 2%  1162MB/s ± 2%   +82.29%  (p=0.000 n=10+10)
_UFlat11-8   352MB/s ± 2%   361MB/s ± 1%    +2.62%   (p=0.000 n=10+8)

klauspost commented 5 years ago

This is measured on AMD64 but with -tags=noasm. I will add it to the description.

klauspost commented 5 years ago

@nigeltao updated

golang / snappy

Use faster copy when not overlapping #48