klauspost closed this 5 years ago
Eliminate bounds check on every byte copied.
Benchmark measured on AMD64 but with -tags=noasm:
>benchstat old.txt new.txt
name        old time/op    new time/op    delta
_UFlat0-8      194µs ± 3%     150µs ± 2%  -22.59%  (p=0.000 n=10+10)
_UFlat1-8     1.62ms ± 1%    1.41ms ± 2%  -12.70%  (p=0.000 n=9+10)
_UFlat2-8     8.91µs ± 4%    8.76µs ± 2%     ~     (p=0.343 n=10+10)
_UFlat3-8      222ns ± 2%     224ns ± 1%   +1.00%  (p=0.028 n=10+9)
_UFlat4-8     28.4µs ± 2%    20.3µs ± 3%  -28.45%  (p=0.000 n=10+10)
_UFlat5-8      797µs ± 5%     603µs ± 2%  -24.34%  (p=0.000 n=10+9)
_UFlat6-8      565µs ± 1%     531µs ± 2%   -6.16%  (p=0.000 n=8+9)
_UFlat7-8      494µs ± 4%     457µs ± 2%   -7.61%  (p=0.000 n=10+10)
_UFlat8-8     1.55ms ± 4%    1.40ms ± 2%   -9.48%  (p=0.000 n=10+9)
_UFlat9-8     1.93ms ± 1%    1.83ms ± 2%   -5.44%  (p=0.000 n=10+9)
_UFlat10-8     186µs ± 2%     138µs ± 5%  -26.04%  (p=0.000 n=10+10)
_UFlat11-8     524µs ± 2%     478µs ± 3%   -8.68%  (p=0.000 n=10+10)

name        old speed      new speed      delta
_UFlat0-8    528MB/s ± 3%   682MB/s ± 2%  +29.18%  (p=0.000 n=10+10)
_UFlat1-8    434MB/s ± 1%   497MB/s ± 2%  +14.56%  (p=0.000 n=9+10)
_UFlat2-8   13.8GB/s ± 4%  14.1GB/s ± 2%     ~     (p=0.353 n=10+10)
_UFlat3-8    901MB/s ± 1%   890MB/s ± 1%   -1.18%  (p=0.008 n=9+9)
_UFlat4-8   3.60GB/s ± 2%  5.03GB/s ± 3%  +39.76%  (p=0.000 n=10+10)
_UFlat5-8    514MB/s ± 5%   679MB/s ± 2%  +32.04%  (p=0.000 n=10+9)
_UFlat6-8    269MB/s ± 1%   287MB/s ± 2%   +6.57%  (p=0.000 n=8+9)
_UFlat7-8    253MB/s ± 4%   274MB/s ± 2%   +8.23%  (p=0.000 n=10+10)
_UFlat8-8    276MB/s ± 4%   305MB/s ± 2%  +10.43%  (p=0.000 n=10+9)
_UFlat9-8    249MB/s ± 1%   263MB/s ± 2%   +5.76%  (p=0.000 n=10+9)
_UFlat10-8   637MB/s ± 2%   862MB/s ± 5%  +35.25%  (p=0.000 n=10+10)
_UFlat11-8   352MB/s ± 2%   385MB/s ± 3%   +9.51%  (p=0.000 n=10+10)
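For readers unfamiliar with the technique, here is a minimal sketch of how a per-byte bounds check in a forward-overlapping copy loop can be avoided in pure Go. It is an illustration only and not necessarily the code in this change: the function names `copyIndexed` and `copySliced` and their parameters are hypothetical. The idea is to take both sub-slices once, before the loop, so the compiler can usually prove that every index inside the loop is in range.

```go
package main

import "fmt"

// copyIndexed copies length bytes from dst[d-offset:] to dst[d:], one byte
// at a time, indexing dst directly. Each iteration typically carries bounds
// checks on dst[d] and dst[d-offset]. (Hypothetical name, for illustration.)
func copyIndexed(dst []byte, d, offset, length int) {
	for end := d + length; d != end; d++ {
		dst[d] = dst[d-offset]
	}
}

// copySliced performs the same forward copy, but slices once up front.
// The slice expressions are checked a single time; inside the loop both
// a[i] and b[i] are known to be in range because len(b) == len(a).
// (Hypothetical name, for illustration.)
func copySliced(dst []byte, d, offset, length int) {
	a := dst[d : d+length]
	b := dst[d-offset:]
	b = b[:len(a)]
	for i := range a {
		a[i] = b[i]
	}
}

func main() {
	// An overlapping copy with offset 2 repeats the two-byte pattern "ab".
	x := append([]byte("ab"), make([]byte, 6)...)
	y := append([]byte("ab"), make([]byte, 6)...)
	copyIndexed(x, 2, 2, 6)
	copySliced(y, 2, 2, 6)
	fmt.Println(string(x), string(y)) // prints "abababab abababab"
}
```

The copy has to stay byte-by-byte and in forward order because source and destination may overlap; the built-in `copy` behaves like `memmove` and would not repeat the pattern. Whether the checks are actually removed depends on the compiler version; building with `-gcflags=-d=ssa/check_bce/debug=1` should report any bounds checks that remain.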
Code looks great. Again, though, I'd like to know which GOARCH the numbers in the commit message were measured on.
@nigeltao updated