gonum / internal

Internal routines for the gonum project [DEPRECATED]

asm/f64: Updated axpy assembly to wide, pipelined loops. #50

Closed: Kunde21 closed this 7 years ago

Kunde21 commented 7 years ago

Testing code was cleaned up and updated to use sentinel values in the guards instead of NaN. Helper functions were consolidated, and the benchmarks have started moving to the cleaner sub-benchmark construct (b.Run) introduced in Go 1.7.
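A minimal sketch of the guard technique in plain Go follows. It assumes a layout with g guard elements on each side of the window handed to the kernel; guardedSlice, guardsIntact, the pure-Go axpyUnitary stand-in and the sentinel value -0.25 are hypothetical and are not the helpers used in this PR. It also shows one plausible advantage of a finite sentinel over NaN: the guards can be checked with plain ==, whereas NaN never compares equal to itself.

```go
package main

import "fmt"

// guardedSlice allocates n+2*g elements, fills them all with sentinel, and
// returns the full backing slice plus the length-n window in the middle that
// a kernel under test may write. The first and last g elements are guards.
// Hypothetical helper for illustration, not the gonum test helper.
func guardedSlice(n, g int, sentinel float64) (backing, inner []float64) {
	backing = make([]float64, n+2*g)
	for i := range backing {
		backing[i] = sentinel
	}
	// Full slice expression: cap(inner) == n, so the window cannot be
	// resliced or appended into the trailing guard.
	inner = backing[g : g+n : g+n]
	return backing, inner
}

// guardsIntact reports whether every guard element still equals sentinel.
// Plain == works because sentinel is an ordinary finite value; with NaN
// guards this comparison would always report a change, since NaN != NaN.
func guardsIntact(backing []float64, g int, sentinel float64) bool {
	for i := 0; i < g; i++ {
		if backing[i] != sentinel || backing[len(backing)-1-i] != sentinel {
			return false
		}
	}
	return true
}

// axpyUnitary is a plain Go stand-in for the kernel under test:
// y[i] += alpha * x[i].
func axpyUnitary(alpha float64, x, y []float64) {
	for i, v := range x {
		y[i] += alpha * v
	}
}

func main() {
	const sentinel = -0.25 // arbitrary finite marker value
	const g = 4            // guard elements on each side
	x := []float64{1, 2, 3}
	backing, y := guardedSlice(len(x), g, sentinel)
	copy(y, []float64{10, 10, 10})
	axpyUnitary(2, x, y)
	fmt.Println(y, guardsIntact(backing, g, sentinel))
}
```

Running it prints the updated window followed by true; an out-of-bounds write into either guard region would make guardsIntact return false.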

The larger loop does slow down the small cases (len(vec) <= 10, mainly in the unitary kernels) compared with the old assembly, while speeding up everything else until memory bandwidth becomes the bottleneck.
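The trade-off described above can be illustrated with a pure-Go loop of the same general shape. This is a sketch under the assumption that a "wide" loop means several elements per iteration followed by a scalar tail; it is not a translation of the assembly in this PR, and axpyNaive and axpyWide are hypothetical names.

```go
package main

import "fmt"

// axpyNaive is the simple loop: y[i] += alpha * x[i].
func axpyNaive(alpha float64, x, y []float64) {
	for i, v := range x {
		y[i] += alpha * v
	}
}

// axpyWide handles four elements per iteration, so four independent
// multiply-adds are available to the CPU at once, then finishes the
// remaining len(x)%4 elements in a scalar tail loop.
// Pure-Go illustration of the loop shape only; not the gonum assembly.
func axpyWide(alpha float64, x, y []float64) {
	if len(y) < len(x) {
		panic("axpyWide: len(y) < len(x)")
	}
	i := 0
	for ; i+4 <= len(x); i += 4 {
		y[i] += alpha * x[i]
		y[i+1] += alpha * x[i+1]
		y[i+2] += alpha * x[i+2]
		y[i+3] += alpha * x[i+3]
	}
	// Tail loop. For len(x) < 4 only this loop does work, so short vectors
	// pay for the extra length check without using the wide body.
	for ; i < len(x); i++ {
		y[i] += alpha * x[i]
	}
}

func main() {
	x := []float64{1, 2, 3, 4, 5, 6, 7}
	a := make([]float64, len(x))
	b := make([]float64, len(x))
	axpyNaive(2, x, a)
	axpyWide(2, x, b)
	fmt.Println(a)
	fmt.Println(b)
}
```

Both calls print [2 4 6 8 10 12 14]. In this sketch, vectors shorter than four elements never enter the wide body and only pay for its extra check, which is one intuitive way to read the small-vector slowdown reported above.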

Benchmarks against old code
name                                    old asm time/op  new asm time/op  delta
AxpyUnitary/AxpyUnitary-1-12            3.64ns ±13%      4.10ns ±11%      +12.73%  (p=0.003 n=10+10)
AxpyUnitary/AxpyUnitary-2-12            3.32ns ± 6%      4.09ns ± 5%      +23.06%  (p=0.000 n=9+10)
AxpyUnitary/AxpyUnitary-3-12            4.01ns ±11%      4.83ns ± 4%      +20.39%  (p=0.000 n=9+9)
AxpyUnitary/AxpyUnitary-4-12            4.28ns ± 1%      5.53ns ± 6%      +29.24%  (p=0.000 n=9+9)
AxpyUnitary/AxpyUnitary-5-12            5.22ns ±11%      5.50ns ±10%      +5.46%   (p=0.048 n=10+10)
AxpyUnitary/AxpyUnitary-10-12           6.46ns ± 3%      6.68ns ± 1%      +3.37%   (p=0.000 n=8+8)
AxpyUnitary/AxpyUnitary-100-12          32.5ns ±12%      24.4ns ± 8%      −24.85%  (p=0.000 n=10+10)
AxpyUnitary/AxpyUnitary-500-12          122ns ±10%       109ns ±16%       −10.73%  (p=0.002 n=10+10)
AxpyUnitary/AxpyUnitary-1000-12         243ns ±11%       197ns ± 1%       −18.80%  (p=0.000 n=10+8)
AxpyUnitary/AxpyUnitary-5000-12         1.53µs ± 1%      1.44µs ± 3%      −5.78%   (p=0.000 n=9+9)
AxpyUnitary/AxpyUnitary-10000-12        3.08µs ± 3%      3.19µs ±17%      ~        (p=1.000 n=9+10)
AxpyUnitary/AxpyUnitary-50000-12        25.3µs ± 9%      26.2µs ±11%      ~        (p=0.796 n=10+10)
AxpyUnitaryTo/AxpyUnitaryTo-1-12        3.85ns ± 1%      4.83ns ±14%      +25.59%  (p=0.000 n=9+10)
AxpyUnitaryTo/AxpyUnitaryTo-2-12        4.10ns ±10%      4.85ns ± 6%      +18.11%  (p=0.000 n=10+10)
AxpyUnitaryTo/AxpyUnitaryTo-3-12        4.65ns ±12%      5.27ns ± 5%      +13.33%  (p=0.000 n=10+9)
AxpyUnitaryTo/AxpyUnitaryTo-4-12        4.45ns ± 0%      5.29ns ± 3%      +18.89%  (p=0.000 n=8+9)
AxpyUnitaryTo/AxpyUnitaryTo-5-12        5.34ns ± 0%      5.90ns ± 2%      +10.36%  (p=0.000 n=9+10)
AxpyUnitaryTo/AxpyUnitaryTo-10-12       7.31ns ±10%      7.25ns ± 0%      ~        (p=0.501 n=10+8)
AxpyUnitaryTo/AxpyUnitaryTo-100-12      31.5ns ± 1%      23.9ns ± 0%      −24.15%  (p=0.000 n=9+9)
AxpyUnitaryTo/AxpyUnitaryTo-500-12      125ns ±17%       109ns ±21%       −12.50%  (p=0.001 n=10+10)
AxpyUnitaryTo/AxpyUnitaryTo-1000-12     237ns ± 8%       208ns ± 8%       −12.59%  (p=0.000 n=10+10)
AxpyUnitaryTo/AxpyUnitaryTo-5000-12     1.64µs ± 1%      1.59µs ± 1%      −2.61%   (p=0.000 n=8+8)
AxpyUnitaryTo/AxpyUnitaryTo-10000-12    3.98µs ±22%      3.72µs ±11%      ~        (p=0.182 n=10+9)
AxpyUnitaryTo/AxpyUnitaryTo-50000-12    31.1µs ± 0%      31.3µs ± 1%      ~        (p=0.645 n=8+8)
AxpyInc/AxpyInc-1-inc(1)-12             5.18ns ± 7%      5.40ns ±12%      +4.39%   (p=0.007 n=9+10)
AxpyInc/AxpyInc-2-inc(1)-12             5.89ns ± 0%      5.77ns ± 3%      −2.14%   (p=0.043 n=8+9)
AxpyInc/AxpyInc-2-inc(2)-12             5.89ns ± 1%      5.81ns ± 6%      ~        (p=0.262 n=8+9)
AxpyInc/AxpyInc-2-inc(4)-12             5.97ns ± 5%      5.77ns ± 7%      −3.47%   (p=0.012 n=10+9)
AxpyInc/AxpyInc-2-inc(10)-12            6.10ns ±13%      5.71ns ± 2%      −6.34%   (p=0.000 n=10+8)
AxpyInc/AxpyInc-3-inc(1)-12             6.62ns ± 0%      6.76ns ±13%      ~        (p=0.164 n=8+10)
AxpyInc/AxpyInc-3-inc(2)-12             6.62ns ± 1%      6.39ns ± 0%      −3.44%   (p=0.000 n=9+8)
AxpyInc/AxpyInc-3-inc(4)-12             6.66ns ± 2%      6.78ns ±13%      ~        (p=0.496 n=9+10)
AxpyInc/AxpyInc-3-inc(10)-12            6.61ns ± 1%      6.43ns ± 3%      −2.69%   (p=0.004 n=8+9)
AxpyInc/AxpyInc-4-inc(1)-12             8.22ns ± 5%      6.98ns ± 2%      −15.11%  (p=0.000 n=8+9)
AxpyInc/AxpyInc-4-inc(2)-12             8.18ns ± 1%      6.95ns ± 1%      −15.06%  (p=0.000 n=8+8)
AxpyInc/AxpyInc-4-inc(4)-12             8.22ns ± 1%      6.95ns ± 1%      −15.54%  (p=0.000 n=8+9)
AxpyInc/AxpyInc-4-inc(10)-12            8.27ns ± 0%      6.97ns ± 1%      −15.76%  (p=0.000 n=8+9)
AxpyInc/AxpyInc-5-inc(1)-12             9.26ns ± 1%      8.38ns ± 9%      −9.54%   (p=0.000 n=8+9)
AxpyInc/AxpyInc-5-inc(2)-12             9.55ns ± 6%      8.15ns ± 1%      −14.68%  (p=0.000 n=10+9)
AxpyInc/AxpyInc-5-inc(4)-12             9.78ns ±14%      8.72ns ±15%      −10.83%  (p=0.019 n=10+10)
AxpyInc/AxpyInc-5-inc(10)-12            9.34ns ± 5%      8.66ns ±19%      −7.30%   (p=0.025 n=8+10)
AxpyInc/AxpyInc-10-inc(1)-12            12.8ns ± 2%      11.6ns ±11%      −9.04%   (p=0.008 n=8+10)
AxpyInc/AxpyInc-10-inc(2)-12            12.8ns ± 2%      11.3ns ±12%      −11.46%  (p=0.000 n=8+9)
AxpyInc/AxpyInc-10-inc(4)-12            12.8ns ± 3%      11.6ns ±11%      −9.37%   (p=0.000 n=9+10)
AxpyInc/AxpyInc-10-inc(10)-12           12.7ns ± 1%      10.9ns ± 1%      −14.08%  (p=0.000 n=10+8)
AxpyInc/AxpyInc-500-inc(1)-12           445ns ± 0%       196ns ± 8%       −55.86%  (p=0.000 n=6+9)
AxpyInc/AxpyInc-500-inc(2)-12           445ns ± 0%       193ns ± 1%       −56.60%  (p=0.000 n=9+8)
AxpyInc/AxpyInc-500-inc(4)-12           449ns ± 1%       209ns ±14%       −53.42%  (p=0.000 n=9+10)
AxpyInc/AxpyInc-500-inc(10)-12          1.03µs ± 0%      1.02µs ± 0%      ~        (p=0.164 n=10+8)
AxpyInc/AxpyInc-1000-inc(1)-12          883ns ± 0%       398ns ± 8%       −54.96%  (p=0.000 n=9+10)
AxpyInc/AxpyInc-1000-inc(2)-12          915ns ±12%       398ns ± 8%       −56.54%  (p=0.000 n=9+9)
AxpyInc/AxpyInc-1000-inc(4)-12          1.12µs ±17%      1.03µs ± 0%      −8.00%   (p=0.000 n=10+8)
AxpyInc/AxpyInc-1000-inc(10)-12         2.15µs ± 8%      2.12µs ± 5%      ~        (p=0.644 n=10+10)
AxpyInc/AxpyInc-10000-inc(1)-12         8.82µs ± 1%      4.94µs ±14%      −43.94%  (p=0.000 n=8+10)
AxpyInc/AxpyInc-10000-inc(2)-12         10.1µs ± 2%      8.9µs ±29%       ~        (p=0.053 n=9+10)
AxpyInc/AxpyInc-10000-inc(4)-12         19.2µs ± 8%      19.1µs ±18%      ~        (p=0.086 n=10+10)
AxpyInc/AxpyInc-10000-inc(10)-12        39.1µs ± 6%      37.7µs ± 4%      −3.60%   (p=0.000 n=9+9)
AxpyInc/AxpyInc-10000-inc(-1)-12        9.23µs ±13%      5.17µs ±21%      −44.04%  (p=0.000 n=10+10)
AxpyInc/AxpyInc-10000-inc(-2)-12        10.5µs ±11%      8.7µs ±18%       −17.15%  (p=0.006 n=9+10)
AxpyInc/AxpyInc-10000-inc(-4)-12        18.6µs ± 1%      19.2µs ± 9%      ~        (p=1.000 n=8+10)
AxpyInc/AxpyInc-10000-inc(-10)-12       39.0µs ± 8%      38.5µs ±11%      ~        (p=0.079 n=10+9)
AxpyIncTo/AxpyIncTo-1-inc(1)-12         6.62ns ± 5%      6.44ns ± 0%      −2.72%   (p=0.003 n=9+9)
AxpyIncTo/AxpyIncTo-2-inc(1)-12         7.58ns ± 1%      6.75ns ± 2%      −10.98%  (p=0.000 n=7+10)
AxpyIncTo/AxpyIncTo-2-inc(2)-12         7.86ns ±15%      6.73ns ± 1%      −14.40%  (p=0.000 n=10+10)
AxpyIncTo/AxpyIncTo-2-inc(4)-12         7.63ns ± 8%      6.70ns ± 0%      −12.22%  (p=0.000 n=9+9)
AxpyIncTo/AxpyIncTo-2-inc(10)-12        7.59ns ± 1%      6.69ns ± 0%      −11.82%  (p=0.000 n=6+9)
AxpyIncTo/AxpyIncTo-3-inc(1)-12         8.23ns ± 8%      7.17ns ± 1%      −12.91%  (p=0.000 n=8+8)
AxpyIncTo/AxpyIncTo-3-inc(2)-12         8.31ns ± 1%      7.16ns ± 1%      −13.87%  (p=0.000 n=6+8)
AxpyIncTo/AxpyIncTo-3-inc(4)-12         8.29ns ± 0%      7.82ns ±29%      ~        (p=0.210 n=6+10)
AxpyIncTo/AxpyIncTo-3-inc(10)-12        8.38ns ± 4%      7.34ns ±13%      −12.46%  (p=0.001 n=7+9)
AxpyIncTo/AxpyIncTo-4-inc(1)-12         10.1ns ± 0%      8.1ns ± 2%       −20.16%  (p=0.001 n=6+8)
AxpyIncTo/AxpyIncTo-4-inc(2)-12         10.3ns ±16%      8.0ns ± 1%       −21.92%  (p=0.000 n=9+9)
AxpyIncTo/AxpyIncTo-4-inc(4)-12         10.1ns ± 8%      8.0ns ± 1%       −20.21%  (p=0.000 n=9+8)
AxpyIncTo/AxpyIncTo-4-inc(10)-12        10.1ns ± 0%      8.2ns ± 9%       −18.84%  (p=0.000 n=6+9)
AxpyIncTo/AxpyIncTo-5-inc(1)-12         10.5ns ± 3%      9.5ns ± 9%       −10.07%  (p=0.000 n=8+10)
AxpyIncTo/AxpyIncTo-5-inc(2)-12         10.8ns ± 5%      9.2ns ± 1%       −15.17%  (p=0.000 n=9+9)
AxpyIncTo/AxpyIncTo-5-inc(4)-12         10.5ns ± 3%      10.0ns ±27%      ~        (p=0.162 n=8+10)
AxpyIncTo/AxpyIncTo-5-inc(10)-12        10.6ns ± 1%      9.2ns ± 2%       −13.57%  (p=0.000 n=7+8)
AxpyIncTo/AxpyIncTo-10-inc(1)-12        15.2ns ± 7%      12.3ns ± 3%      −19.27%  (p=0.000 n=9+8)
AxpyIncTo/AxpyIncTo-10-inc(2)-12        15.1ns ± 4%      12.3ns ± 3%      −18.68%  (p=0.000 n=10+9)
AxpyIncTo/AxpyIncTo-10-inc(4)-12        15.0ns ± 3%      12.1ns ± 1%      −19.03%  (p=0.000 n=8+8)
AxpyIncTo/AxpyIncTo-10-inc(10)-12       15.3ns ± 9%      12.1ns ± 0%      −21.07%  (p=0.000 n=10+8)
AxpyIncTo/AxpyIncTo-500-inc(1)-12       447ns ± 0%       203ns ± 8%       −54.64%  (p=0.000 n=8+10)
AxpyIncTo/AxpyIncTo-500-inc(2)-12       482ns ±17%       194ns ± 2%       −59.77%  (p=0.000 n=10+8)
AxpyIncTo/AxpyIncTo-500-inc(4)-12       624ns ± 1%       617ns ± 1%       −1.17%   (p=0.000 n=9+8)
AxpyIncTo/AxpyIncTo-500-inc(10)-12      1.25µs ± 2%      1.25µs ± 3%      ~        (p=0.535 n=9+10)
AxpyIncTo/AxpyIncTo-1000-inc(1)-12      917ns ±17%       382ns ± 3%       −58.35%  (p=0.000 n=9+8)
AxpyIncTo/AxpyIncTo-1000-inc(2)-12      918ns ± 1%       644ns ± 3%       −29.83%  (p=0.000 n=8+9)
AxpyIncTo/AxpyIncTo-1000-inc(4)-12      1.25µs ± 0%      1.25µs ± 1%      −0.43%   (p=0.007 n=7+9)
AxpyIncTo/AxpyIncTo-1000-inc(10)-12     2.94µs ±24%      3.07µs ±18%      ~        (p=0.436 n=10+10)
AxpyIncTo/AxpyIncTo-10000-inc(1)-12     8.86µs ± 1%      5.55µs ±17%      −37.29%  (p=0.000 n=9+10)
AxpyIncTo/AxpyIncTo-10000-inc(2)-12     13.1µs ± 5%      12.7µs ±10%      ~        (p=0.143 n=10+10)
AxpyIncTo/AxpyIncTo-10000-inc(4)-12     24.6µs ± 1%      25.2µs ±10%      ~        (p=0.853 n=10+10)
AxpyIncTo/AxpyIncTo-10000-inc(10)-12    53.6µs ± 7%      53.7µs ± 7%      ~        (p=0.853 n=10+10)
AxpyIncTo/AxpyIncTo-10000-inc(-1)-12    8.86µs ± 1%      5.51µs ± 7%      −37.79%  (p=0.000 n=8+10)
AxpyIncTo/AxpyIncTo-10000-inc(-2)-12    13.4µs ± 9%      12.9µs ±10%      ~        (p=0.218 n=10+10)
AxpyIncTo/AxpyIncTo-10000-inc(-4)-12    24.6µs ± 0%      24.9µs ± 7%      ~        (p=0.743 n=8+9)
AxpyIncTo/AxpyIncTo-10000-inc(-10)-12   52.5µs ± 0%      53.0µs ± 4%      ~        (p=0.866 n=8+9)
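The AxpyInc and AxpyIncTo rows include negative increments, inc(-1) through inc(-10). A minimal pure-Go reference for strided axpy is sketched below, assuming the reference BLAS convention in which a negative increment walks the vector backwards from its last element. The names axpyInc, startIndex and daxpyLike and their parameter order are hypothetical; the sketch illustrates the operation being timed, not the assembly.

```go
package main

import "fmt"

// axpyInc computes y[iy+k*incY] += alpha * x[ix+k*incX] for k = 0..n-1,
// where ix and iy are the starting indices. Hypothetical pure-Go reference.
func axpyInc(alpha float64, x, y []float64, n, incX, incY, ix, iy int) {
	for k := 0; k < n; k++ {
		y[iy] += alpha * x[ix]
		ix += incX
		iy += incY
	}
}

// startIndex returns the first index visited for an n-element strided
// vector. Following the reference BLAS convention, a negative increment
// walks the vector backwards, starting from its last element.
func startIndex(n, inc int) int {
	if inc < 0 {
		return (1 - n) * inc
	}
	return 0
}

// daxpyLike is a hypothetical BLAS-style wrapper: y += alpha*x over n
// strided elements. Like reference DAXPY, it does nothing if n <= 0 or
// alpha == 0.
func daxpyLike(n int, alpha float64, x []float64, incX int, y []float64, incY int) {
	if n <= 0 || alpha == 0 {
		return
	}
	axpyInc(alpha, x, y, n, incX, incY, startIndex(n, incX), startIndex(n, incY))
}

func main() {
	x := []float64{1, 2, 3}
	y := []float64{10, 20, 30}
	daxpyLike(3, 1, x, -1, y, 1) // x is read backwards: 3, 2, 1
	fmt.Println(y)               // [13 22 31]
}
```

Because both increments are interpreted this way, negating incX and incY together visits the same element pairs as the positive case, only in reverse order.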
 
Benchmarks vs for loop
name                                    for loop time/op new asm time/op  delta
AxpyUnitary/AxpyUnitary-1-12            3.99ns ±16%      4.10ns ±11%      ~        (p=0.342 n=10+10)
AxpyUnitary/AxpyUnitary-2-12            4.64ns ± 3%      4.09ns ± 5%      −11.86%  (p=0.000 n=8+10)
AxpyUnitary/AxpyUnitary-3-12            5.58ns ± 4%      4.83ns ± 4%      −13.51%  (p=0.000 n=10+9)
AxpyUnitary/AxpyUnitary-4-12            6.23ns ± 3%      5.53ns ± 6%      −11.28%  (p=0.000 n=9+9)
AxpyUnitary/AxpyUnitary-5-12            6.74ns ± 0%      5.50ns ±10%      −18.41%  (p=0.000 n=8+10)
AxpyUnitary/AxpyUnitary-10-12           10.1ns ± 5%      6.7ns ± 1%       −34.20%  (p=0.000 n=8+8)
AxpyUnitary/AxpyUnitary-100-12          75.1ns ±13%      24.4ns ± 8%      −67.47%  (p=0.000 n=10+10)
AxpyUnitary/AxpyUnitary-500-12          310ns ± 2%       109ns ±16%       −64.85%  (p=0.000 n=9+10)
AxpyUnitary/AxpyUnitary-1000-12         602ns ± 1%       197ns ± 1%       −67.28%  (p=0.000 n=9+8)
AxpyUnitary/AxpyUnitary-5000-12         3.04µs ± 4%      1.44µs ± 3%      −52.67%  (p=0.000 n=8+9)
AxpyUnitary/AxpyUnitary-10000-12        6.06µs ± 3%      3.19µs ±17%      −47.34%  (p=0.000 n=8+10)
AxpyUnitary/AxpyUnitary-50000-12        33.6µs ± 1%      26.2µs ±11%      −22.01%  (p=0.000 n=8+10)
AxpyUnitaryTo/AxpyUnitaryTo-1-12        4.33ns ± 1%      4.83ns ±14%      +11.70%  (p=0.000 n=8+10)
AxpyUnitaryTo/AxpyUnitaryTo-2-12        5.38ns ±11%      4.85ns ± 6%      −10.01%  (p=0.000 n=10+10)
AxpyUnitaryTo/AxpyUnitaryTo-3-12        6.10ns ± 2%      5.27ns ± 5%      −13.60%  (p=0.000 n=9+9)
AxpyUnitaryTo/AxpyUnitaryTo-4-12        6.82ns ± 2%      5.29ns ± 3%      −22.44%  (p=0.000 n=8+9)
AxpyUnitaryTo/AxpyUnitaryTo-5-12        8.11ns ±12%      5.90ns ± 2%      −27.29%  (p=0.000 n=9+10)
AxpyUnitaryTo/AxpyUnitaryTo-10-12       12.9ns ± 1%      7.3ns ± 0%       −43.63%  (p=0.000 n=9+8)
AxpyUnitaryTo/AxpyUnitaryTo-100-12      90.0ns ±28%      23.9ns ± 0%      −73.43%  (p=0.000 n=10+9)
AxpyUnitaryTo/AxpyUnitaryTo-500-12      339ns ± 4%       109ns ±21%       −67.80%  (p=0.000 n=9+10)
AxpyUnitaryTo/AxpyUnitaryTo-1000-12     652ns ± 3%       208ns ± 8%       −68.14%  (p=0.000 n=9+10)
AxpyUnitaryTo/AxpyUnitaryTo-5000-12     3.27µs ± 7%      1.59µs ± 1%      −51.26%  (p=0.000 n=8+8)
AxpyUnitaryTo/AxpyUnitaryTo-10000-12    6.65µs ± 2%      3.72µs ±11%      −44.04%  (p=0.000 n=9+9)
AxpyUnitaryTo/AxpyUnitaryTo-50000-12    39.2µs ± 5%      31.3µs ± 1%      −20.10%  (p=0.000 n=9+8)
AxpyInc/AxpyInc-1-inc(1)-12             6.01ns ±20%      5.40ns ±12%      −10.11%  (p=0.008 n=10+10)
AxpyInc/AxpyInc-2-inc(1)-12             6.27ns ± 3%      5.77ns ± 3%      −8.06%   (p=0.000 n=9+9)
AxpyInc/AxpyInc-2-inc(2)-12             6.15ns ± 2%      5.81ns ± 6%      −5.55%   (p=0.002 n=8+9)
AxpyInc/AxpyInc-2-inc(4)-12             6.15ns ± 2%      5.77ns ± 7%      −6.23%   (p=0.000 n=8+9)
AxpyInc/AxpyInc-2-inc(10)-12            6.17ns ± 2%      5.71ns ± 2%      −7.36%   (p=0.000 n=9+8)
AxpyInc/AxpyInc-3-inc(1)-12             8.02ns ±16%      6.76ns ±13%      −15.72%  (p=0.000 n=10+10)
AxpyInc/AxpyInc-3-inc(2)-12             7.74ns ± 5%      6.39ns ± 0%      −17.39%  (p=0.000 n=9+8)
AxpyInc/AxpyInc-3-inc(4)-12             7.58ns ± 2%      6.78ns ±13%      −10.54%  (p=0.003 n=8+10)
AxpyInc/AxpyInc-3-inc(10)-12            7.59ns ± 1%      6.43ns ± 3%      −15.31%  (p=0.000 n=8+9)
AxpyInc/AxpyInc-4-inc(1)-12             9.07ns ±14%      6.98ns ± 2%      −23.05%  (p=0.000 n=9+9)
AxpyInc/AxpyInc-4-inc(2)-12             9.35ns ±10%      6.95ns ± 1%      −25.71%  (p=0.000 n=9+8)
AxpyInc/AxpyInc-4-inc(4)-12             9.51ns ±21%      6.95ns ± 1%      −26.95%  (p=0.000 n=10+9)
AxpyInc/AxpyInc-4-inc(10)-12            8.85ns ± 1%      6.97ns ± 1%      −21.28%  (p=0.000 n=9+9)
AxpyInc/AxpyInc-5-inc(1)-12             10.8ns ±17%      8.4ns ± 9%       −22.12%  (p=0.000 n=10+9)
AxpyInc/AxpyInc-5-inc(2)-12             10.0ns ± 1%      8.1ns ± 1%       −18.68%  (p=0.000 n=8+9)
AxpyInc/AxpyInc-5-inc(4)-12             10.9ns ±14%      8.7ns ±15%       −19.68%  (p=0.000 n=10+10)
AxpyInc/AxpyInc-5-inc(10)-12            10.2ns ±15%      8.7ns ±19%       −15.37%  (p=0.001 n=9+10)
AxpyInc/AxpyInc-10-inc(1)-12            13.4ns ± 1%      11.6ns ±11%      −13.12%  (p=0.000 n=8+10)
AxpyInc/AxpyInc-10-inc(2)-12            13.3ns ± 0%      11.3ns ±12%      −14.87%  (p=0.000 n=8+9)
AxpyInc/AxpyInc-10-inc(4)-12            13.3ns ± 1%      11.6ns ±11%      −13.26%  (p=0.000 n=8+10)
AxpyInc/AxpyInc-10-inc(10)-12           14.3ns ±12%      10.9ns ± 1%      −23.25%  (p=0.000 n=10+8)
AxpyInc/AxpyInc-500-inc(1)-12           398ns ± 7%       196ns ± 8%       −50.60%  (p=0.000 n=10+9)
AxpyInc/AxpyInc-500-inc(2)-12           400ns ± 8%       193ns ± 1%       −51.75%  (p=0.000 n=10+8)
AxpyInc/AxpyInc-500-inc(4)-12           398ns ± 8%       209ns ±14%       −47.37%  (p=0.000 n=9+10)
AxpyInc/AxpyInc-500-inc(10)-12          1.03µs ± 0%      1.02µs ± 0%      ~        (p=0.582 n=8+8)
AxpyInc/AxpyInc-1000-inc(1)-12          804ns ±13%       398ns ± 8%       −50.49%  (p=0.000 n=10+10)
AxpyInc/AxpyInc-1000-inc(2)-12          771ns ± 4%       398ns ± 8%       −48.42%  (p=0.000 n=8+9)
AxpyInc/AxpyInc-1000-inc(4)-12          1.14µs ±11%      1.03µs ± 0%      −9.44%   (p=0.000 n=10+8)
AxpyInc/AxpyInc-1000-inc(10)-12         2.10µs ± 4%      2.12µs ± 5%      ~        (p=0.898 n=10+10)
AxpyInc/AxpyInc-10000-inc(1)-12         7.61µs ± 1%      4.94µs ±14%      −35.05%  (p=0.000 n=9+10)
AxpyInc/AxpyInc-10000-inc(2)-12         9.72µs ±19%      8.85µs ±29%      ~        (p=0.089 n=10+10)
AxpyInc/AxpyInc-10000-inc(4)-12         19.1µs ± 8%      19.1µs ±18%      ~        (p=0.102 n=10+10)
AxpyInc/AxpyInc-10000-inc(10)-12        39.1µs ± 6%      37.7µs ± 4%      −3.50%   (p=0.002 n=10+9)
AxpyInc/AxpyInc-10000-inc(-1)-12        7.63µs ± 1%      5.17µs ±21%      −32.27%  (p=0.000 n=9+10)
AxpyInc/AxpyInc-10000-inc(-2)-12        10.1µs ±21%      8.7µs ±18%       ~        (p=0.052 n=10+10)
AxpyInc/AxpyInc-10000-inc(-4)-12        19.1µs ± 7%      19.2µs ± 9%      ~        (p=0.529 n=10+10)
AxpyInc/AxpyInc-10000-inc(-10)-12       38.0µs ± 0%      38.5µs ±11%      ~        (p=0.321 n=8+9)
AxpyIncTo/AxpyIncTo-1-inc(1)-12         7.30ns ± 3%      6.44ns ± 0%      −11.74%  (p=0.000 n=8+9)
AxpyIncTo/AxpyIncTo-2-inc(1)-12         8.13ns ± 1%      6.75ns ± 2%      −16.96%  (p=0.000 n=9+10)
AxpyIncTo/AxpyIncTo-2-inc(2)-12         8.73ns ±18%      6.73ns ± 1%      −22.99%  (p=0.000 n=10+10)
AxpyIncTo/AxpyIncTo-2-inc(4)-12         8.79ns ±18%      6.70ns ± 0%      −23.83%  (p=0.000 n=10+9)
AxpyIncTo/AxpyIncTo-2-inc(10)-12        8.09ns ± 2%      6.69ns ± 0%      −17.34%  (p=0.000 n=7+9)
AxpyIncTo/AxpyIncTo-3-inc(1)-12         9.85ns ±15%      7.17ns ± 1%      −27.20%  (p=0.000 n=9+8)
AxpyIncTo/AxpyIncTo-3-inc(2)-12         9.66ns ±10%      7.16ns ± 1%      −25.91%  (p=0.000 n=10+8)
AxpyIncTo/AxpyIncTo-3-inc(4)-12         9.80ns ±16%      7.82ns ±29%      −20.19%  (p=0.000 n=10+10)
AxpyIncTo/AxpyIncTo-3-inc(10)-12        9.67ns ±12%      7.34ns ±13%      −24.07%  (p=0.000 n=9+9)
AxpyIncTo/AxpyIncTo-4-inc(1)-12         10.6ns ± 0%      8.1ns ± 2%       −24.28%  (p=0.000 n=8+8)
AxpyIncTo/AxpyIncTo-4-inc(2)-12         10.6ns ± 1%      8.0ns ± 1%       −24.33%  (p=0.000 n=7+9)
AxpyIncTo/AxpyIncTo-4-inc(4)-12         10.8ns ±10%      8.0ns ± 1%       −25.59%  (p=0.000 n=9+8)
AxpyIncTo/AxpyIncTo-4-inc(10)-12        10.6ns ± 1%      8.2ns ± 9%       −22.83%  (p=0.000 n=9+9)
AxpyIncTo/AxpyIncTo-5-inc(1)-12         11.8ns ± 5%      9.5ns ± 9%       −19.54%  (p=0.000 n=9+10)
AxpyIncTo/AxpyIncTo-5-inc(2)-12         11.6ns ± 3%      9.2ns ± 1%       −20.63%  (p=0.000 n=9+9)
AxpyIncTo/AxpyIncTo-5-inc(4)-12         11.6ns ± 3%      10.0ns ±27%      −13.99%  (p=0.011 n=8+10)
AxpyIncTo/AxpyIncTo-5-inc(10)-12        12.2ns ±13%      9.2ns ± 2%       −24.48%  (p=0.000 n=10+8)
AxpyIncTo/AxpyIncTo-10-inc(1)-12        15.7ns ± 5%      12.3ns ± 3%      −21.78%  (p=0.000 n=9+8)
AxpyIncTo/AxpyIncTo-10-inc(2)-12        16.1ns ±12%      12.3ns ± 3%      −23.37%  (p=0.000 n=9+9)
AxpyIncTo/AxpyIncTo-10-inc(4)-12        16.0ns ±14%      12.1ns ± 1%      −24.38%  (p=0.000 n=9+8)
AxpyIncTo/AxpyIncTo-10-inc(10)-12       15.6ns ± 4%      12.1ns ± 0%      −22.63%  (p=0.000 n=10+8)
AxpyIncTo/AxpyIncTo-500-inc(1)-12       482ns ±21%       203ns ± 8%       −57.99%  (p=0.000 n=10+10)
AxpyIncTo/AxpyIncTo-500-inc(2)-12       457ns ± 2%       194ns ± 2%       −57.58%  (p=0.000 n=9+8)
AxpyIncTo/AxpyIncTo-500-inc(4)-12       626ns ± 2%       617ns ± 1%       −1.42%   (p=0.000 n=9+8)
AxpyIncTo/AxpyIncTo-500-inc(10)-12      1.30µs ±11%      1.25µs ± 3%      ~        (p=0.897 n=10+10)
AxpyIncTo/AxpyIncTo-1000-inc(1)-12      896ns ± 1%       382ns ± 3%       −57.41%  (p=0.000 n=8+8)
AxpyIncTo/AxpyIncTo-1000-inc(2)-12      940ns ± 2%       644ns ± 3%       −31.49%  (p=0.000 n=9+9)
AxpyIncTo/AxpyIncTo-1000-inc(4)-12      1.25µs ± 1%      1.25µs ± 1%      ~        (p=0.979 n=8+9)
AxpyIncTo/AxpyIncTo-1000-inc(10)-12     2.86µs ± 7%      3.07µs ±18%      ~        (p=0.315 n=10+10)
AxpyIncTo/AxpyIncTo-10000-inc(1)-12     9.34µs ±16%      5.55µs ±17%      −40.54%  (p=0.000 n=10+10)
AxpyIncTo/AxpyIncTo-10000-inc(2)-12     14.2µs ±17%      12.7µs ±10%      −10.37%  (p=0.002 n=10+10)
AxpyIncTo/AxpyIncTo-10000-inc(4)-12     24.8µs ± 3%      25.2µs ±10%      ~        (p=0.549 n=9+10)
AxpyIncTo/AxpyIncTo-10000-inc(10)-12    54.0µs ± 7%      53.7µs ± 7%      ~        (p=0.165 n=10+10)
AxpyIncTo/AxpyIncTo-10000-inc(-1)-12    9.77µs ±13%      5.51µs ± 7%      −43.56%  (p=0.000 n=10+10)
AxpyIncTo/AxpyIncTo-10000-inc(-2)-12    13.6µs ± 1%      12.9µs ±10%      −5.27%   (p=0.010 n=7+10)
AxpyIncTo/AxpyIncTo-10000-inc(-4)-12    24.7µs ± 1%      24.9µs ± 7%      ~        (p=0.387 n=9+9)
AxpyIncTo/AxpyIncTo-10000-inc(-10)-12   52.5µs ± 0%      53.0µs ± 4%      ~        (p=1.000 n=8+9)
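The second table compares the new assembly with plain Go loops, whose code is not shown in this thread. The sketch below is a guess at what such baselines might look like for the "To" variants, assuming the suffix means the result is written to a separate dst slice (dst = alpha*x + y) while x and y stay unchanged; axpyUnitaryTo and axpyIncTo here are hypothetical pure-Go functions, not the gonum kernels.

```go
package main

import "fmt"

// axpyUnitaryTo writes dst[i] = alpha*x[i] + y[i] and leaves x and y
// unchanged. Hypothetical plain-loop baseline.
func axpyUnitaryTo(dst []float64, alpha float64, x, y []float64) {
	for i, v := range x {
		dst[i] = alpha*v + y[i]
	}
}

// axpyIncTo is the strided counterpart:
// dst[idst+k*incDst] = alpha*x[ix+k*incX] + y[iy+k*incY] for k = 0..n-1.
func axpyIncTo(dst []float64, incDst, idst int, alpha float64, x, y []float64, n, incX, incY, ix, iy int) {
	for k := 0; k < n; k++ {
		dst[idst] = alpha*x[ix] + y[iy]
		idst += incDst
		ix += incX
		iy += incY
	}
}

func main() {
	dst := make([]float64, 3)
	axpyUnitaryTo(dst, 2, []float64{1, 2, 3}, []float64{10, 20, 30})
	fmt.Println(dst) // [12 24 36]
}
```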
 
kortschak commented 7 years ago

This won't merge. Would you fix that, please?

Kunde21 commented 7 years ago

@kortschak I tried to trigger the errors, but didn't get any. We can revert if you had other things to address.

kortschak commented 7 years ago

This breaks "gonum/blas/native". I'm going to revert it. Please don't merge until there is an approval.