Closed klauspost closed 5 years ago
Use the built-in copy function when the source doesn't overlap the destination.
Again, the benchmarks are a bit polarized depending on how often this is the case, but it should be a solid improvement for all non-amd64 users.
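For readers skimming the PR, here is a minimal sketch of the idea, not the actual diff. The helper name `copyBackref` and its parameters are hypothetical, chosen only for illustration. It assumes a decoder that copies `length` bytes from `offset` bytes back in the output buffer. When `offset >= length` the source range ends at or before the write position, so the built-in `copy` can be used. Otherwise the ranges overlap and a forward byte-by-byte loop is needed, because `copy` has memmove semantics and would not repeat the pattern.

```go
package main

import "fmt"

// copyBackref is a hypothetical helper (not taken from the PR). It writes
// length bytes at dst[d:], reading from offset bytes before d, and returns
// the new write position. The caller must ensure 0 < offset <= d and
// d+length <= len(dst).
func copyBackref(dst []byte, d, offset, length int) int {
	if offset >= length {
		// The source dst[d-offset : d-offset+length] ends at or before d,
		// so it does not overlap the destination: use the built-in copy.
		copy(dst[d:d+length], dst[d-offset:])
		return d + length
	}
	// Overlapping case: copy forward one byte at a time so that bytes
	// written earlier in this loop are read again, repeating the pattern.
	for end := d + length; d != end; d++ {
		dst[d] = dst[d-offset]
	}
	return d
}

func main() {
	dst := make([]byte, 10)
	d := copy(dst, "abcd")
	d = copyBackref(dst, d, 4, 4) // no overlap: takes the copy() path
	d = copyBackref(dst, d, 1, 2) // overlap: repeats the last byte twice
	fmt.Printf("%s\n", dst[:d])   // prints "abcdabcddd"
}
```

How often the fast path is taken depends on the input: data with many short-offset repeats stays on the byte loop, which would be consistent with the polarized benchmark results.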
Benchmark measured on AMD64 but with -tags=noasm:
    >benchstat old.txt new.txt
    name         old time/op    new time/op    delta
    _UFlat0-8     194µs ± 3%     130µs ± 2%   -33.14%  (p=0.000 n=10+10)
    _UFlat1-8    1.62ms ± 1%    1.42ms ± 1%   -11.98%  (p=0.000 n=9+9)
    _UFlat2-8    8.91µs ± 4%    8.73µs ± 1%      ~     (p=0.182 n=10+9)
    _UFlat3-8     222ns ± 2%     219ns ± 6%    -1.36%  (p=0.022 n=10+9)
    _UFlat4-8    28.4µs ± 2%    11.5µs ± 1%   -59.57%  (p=0.000 n=10+10)
    _UFlat5-8     797µs ± 5%     536µs ± 1%   -32.77%  (p=0.000 n=10+10)
    _UFlat6-8     565µs ± 1%     571µs ± 1%    +1.04%  (p=0.007 n=8+10)
    _UFlat7-8     494µs ± 4%     496µs ± 3%      ~     (p=0.986 n=10+10)
    _UFlat8-8    1.55ms ± 4%    1.53ms ± 3%      ~     (p=0.280 n=10+10)
    _UFlat9-8    1.93ms ± 1%    1.98ms ± 3%    +2.57%  (p=0.000 n=10+10)
    _UFlat10-8    186µs ± 2%     102µs ± 2%   -45.14%  (p=0.000 n=10+10)
    _UFlat11-8    524µs ± 2%     510µs ± 1%    -2.56%  (p=0.000 n=10+8)

    name         old speed      new speed      delta
    _UFlat0-8   528MB/s ± 3%   790MB/s ± 1%   +49.54%  (p=0.000 n=10+10)
    _UFlat1-8   434MB/s ± 1%   493MB/s ± 1%   +13.61%  (p=0.000 n=9+9)
    _UFlat2-8  13.8GB/s ± 4%  14.1GB/s ± 2%      ~     (p=0.182 n=10+9)
    _UFlat3-8   901MB/s ± 1%   912MB/s ± 6%    +1.18%  (p=0.026 n=9+9)
    _UFlat4-8  3.60GB/s ± 2%  8.91GB/s ± 1%  +147.32%  (p=0.000 n=10+10)
    _UFlat5-8   514MB/s ± 5%   764MB/s ± 2%   +48.59%  (p=0.000 n=10+10)
    _UFlat6-8   269MB/s ± 1%   266MB/s ± 1%    -1.03%  (p=0.009 n=8+10)
    _UFlat7-8   253MB/s ± 4%   252MB/s ± 3%      ~     (p=0.985 n=10+10)
    _UFlat8-8   276MB/s ± 4%   279MB/s ± 3%      ~     (p=0.288 n=10+10)
    _UFlat9-8   249MB/s ± 1%   243MB/s ± 3%    -2.51%  (p=0.000 n=10+10)
    _UFlat10-8  637MB/s ± 2%  1162MB/s ± 2%   +82.29%  (p=0.000 n=10+10)
    _UFlat11-8  352MB/s ± 2%   361MB/s ± 1%    +2.62%  (p=0.000 n=10+8)
This is measured on AMD64 but with -tags=noasm. I will add it to the description.
@nigeltao updated