Closed by kortschak 7 years ago
There's a lot of noise.
name old time/op new time/op delta
CholeskySmall-4 2.40µs ± 4% 2.50µs ± 3% +4.34% (p=0.002 n=9+9)
CholeskyMedium-4 421µs ± 2% 419µs ± 2% ~ (p=0.222 n=9+9)
CholeskyLarge-4 154ms ± 2% 154ms ± 2% ~ (p=0.684 n=10+10)
MulDense100Half-4 276µs ± 7% 301µs ±10% +9.06% (p=0.000 n=10+10)
MulDense100Tenth-4 105µs ± 6% 114µs ±13% +8.24% (p=0.002 n=9+10)
MulDense1000Half-4 130ms ± 0% 130ms ± 1% ~ (p=0.579 n=10+10)
MulDense1000Tenth-4 43.6ms ± 1% 43.8ms ± 1% +0.52% (p=0.022 n=10+9)
MulDense1000Hundredth-4 18.7ms ± 2% 18.8ms ± 3% ~ (p=0.971 n=10+10)
MulDense1000Thousandth-4 13.5ms ± 4% 13.2ms ± 2% -2.14% (p=0.015 n=10+10)
PreMulDense100Half-4 274µs ± 5% 280µs ±12% ~ (p=0.579 n=10+10)
PreMulDense100Tenth-4 82.1µs ± 9% 84.9µs ±11% ~ (p=0.436 n=10+10)
PreMulDense1000Half-4 128ms ± 1% 128ms ± 1% ~ (p=0.780 n=10+9)
PreMulDense1000Tenth-4 42.7ms ± 2% 42.8ms ± 1% ~ (p=0.720 n=9+10)
PreMulDense1000Hundredth-4 17.6ms ± 3% 17.5ms ± 1% ~ (p=0.796 n=10+10)
PreMulDense1000Thousandth-4 12.0ms ± 2% 11.9ms ± 1% ~ (p=0.730 n=9+9)
Row10-4 64.9ns ± 1% 64.7ns ± 0% ~ (p=0.086 n=10+10)
Row100-4 84.5ns ± 1% 83.3ns ± 0% -1.44% (p=0.000 n=10+10)
Row1000-4 252ns ± 0% 256ns ± 0% +1.80% (p=0.000 n=9+10)
Exp10-4 34.3µs ± 0% 34.3µs ± 0% ~ (p=0.108 n=10+9)
Exp100-4 6.24ms ± 8% 6.40ms ± 2% ~ (p=0.211 n=10+9)
Exp1000-4 2.49s ± 4% 2.46s ± 3% ~ (p=0.190 n=10+10)
Pow10_3-4 4.62µs ± 0% 4.60µs ± 0% -0.52% (p=0.000 n=9+10)
Pow100_3-4 837µs ±11% 844µs ± 7% ~ (p=0.796 n=10+10)
Pow1000_3-4 449ms ±12% 413ms ± 2% -8.01% (p=0.006 n=9+8)
Pow10_4-4 6.53µs ± 0% 6.50µs ± 0% -0.42% (p=0.000 n=9+9)
Pow100_4-4 1.49ms ± 4% 1.26ms ± 5% -15.08% (p=0.000 n=10+10)
Pow1000_4-4 636ms ± 4% 620ms ± 3% ~ (p=0.050 n=9+9)
Pow10_5-4 6.56µs ± 1% 6.50µs ± 0% -0.84% (p=0.000 n=9+10)
Pow100_5-4 1.46ms ± 3% 1.28ms ±12% -12.85% (p=0.000 n=9+9)
Pow1000_5-4 639ms ± 3% 623ms ± 3% -2.49% (p=0.028 n=10+9)
Pow10_6-4 8.60µs ± 1% 8.52µs ± 0% -0.99% (p=0.000 n=9+10)
Pow100_6-4 1.95ms ± 2% 1.69ms ±11% -13.36% (p=0.000 n=9+10)
Pow1000_6-4 848ms ± 2% 836ms ± 2% -1.46% (p=0.027 n=8+9)
Pow10_7-4 8.57µs ± 3% 8.42µs ± 0% -1.75% (p=0.000 n=10+9)
Pow100_7-4 1.98ms ± 2% 1.64ms ±12% -17.27% (p=0.000 n=10+10)
Pow1000_7-4 850ms ± 3% 837ms ± 4% ~ (p=0.122 n=8+10)
Pow10_8-4 10.5µs ± 2% 10.3µs ± 0% -1.72% (p=0.001 n=10+9)
Pow100_8-4 2.49ms ± 4% 2.17ms ± 7% -12.58% (p=0.000 n=10+10)
Pow1000_8-4 1.10s ± 8% 1.04s ± 1% -5.69% (p=0.003 n=9+9)
Pow10_9-4 8.46µs ± 1% 8.48µs ± 1% ~ (p=0.213 n=9+9)
Pow100_9-4 1.99ms ± 3% 1.66ms ± 9% -16.56% (p=0.000 n=10+10)
Pow1000_9-4 1.07s ±19% 0.83s ± 1% -21.84% (p=0.000 n=10+9)
MulTransDense100Half-4 490µs ± 3% 419µs ± 7% -14.40% (p=0.000 n=10+10)
MulTransDense100Tenth-4 490µs ± 2% 423µs ± 4% -13.76% (p=0.000 n=10+10)
MulTransDense1000Half-4 242ms ±10% 196ms ± 2% -18.89% (p=0.000 n=10+9)
MulTransDense1000Tenth-4 214ms ±12% 197ms ± 1% -8.10% (p=0.000 n=10+10)
MulTransDense1000Hundredth-4 214ms ±11% 198ms ± 0% -7.38% (p=0.000 n=9+9)
MulTransDense1000Thousandth-4 235ms ±21% 199ms ± 0% -15.21% (p=0.000 n=10+7)
MulTransDenseSym100Half-4 459µs ± 3% 413µs ± 4% -9.85% (p=0.000 n=10+9)
MulTransDenseSym100Tenth-4 460µs ± 3% 427µs ± 4% -7.27% (p=0.000 n=10+10)
MulTransDenseSym1000Half-4 222ms ±19% 200ms ± 2% -9.97% (p=0.000 n=10+9)
MulTransDenseSym1000Tenth-4 231ms ±12% 199ms ± 0% -13.70% (p=0.000 n=10+8)
MulTransDenseSym1000Hundredth-4 227ms ± 7% 199ms ± 0% -12.23% (p=0.000 n=9+8)
MulTransDenseSym1000Thousandth-4 255ms ± 6% 200ms ± 1% -21.64% (p=0.000 n=9+9)
InnerSmSm-4 202ns ± 2% 207ns ± 0% +2.27% (p=0.000 n=10+9)
InnerMedMed-4 5.98µs ± 2% 5.84µs ± 0% -2.24% (p=0.000 n=10+8)
InnerLgLg-4 714µs ±16% 648µs ± 1% -9.20% (p=0.000 n=10+9)
InnerLgSm-4 15.4µs ± 3% 15.3µs ± 0% -1.03% (p=0.003 n=9+9)
MarshalDense10-4 135ns ± 4% 135ns ± 4% ~ (p=0.864 n=10+10)
MarshalDense100-4 877ns ± 4% 868ns ± 3% ~ (p=0.250 n=9+10)
MarshalDense1000-4 7.90µs ± 4% 7.85µs ± 3% ~ (p=0.684 n=10+10)
MarshalDense10000-4 72.8µs ± 4% 73.0µs ± 1% ~ (p=0.869 n=10+10)
UnmarshalDense10-4 129ns ± 5% 127ns ± 4% ~ (p=0.091 n=10+10)
UnmarshalDense100-4 835ns ± 6% 835ns ± 3% ~ (p=1.000 n=10+10)
UnmarshalDense1000-4 7.20µs ± 2% 7.31µs ± 3% +1.45% (p=0.016 n=8+10)
UnmarshalDense10000-4 67.8µs ± 5% 67.7µs ± 2% ~ (p=0.912 n=10+10)
MarshalToDense10-4 149ns ± 2% 149ns ± 1% ~ (p=0.308 n=10+9)
MarshalToDense100-4 1.03µs ± 2% 1.03µs ± 2% ~ (p=0.706 n=9+10)
MarshalToDense1000-4 9.73µs ± 1% 9.66µs ± 0% -0.65% (p=0.000 n=10+10)
MarshalToDense10000-4 96.7µs ± 1% 96.1µs ± 0% -0.56% (p=0.002 n=9+9)
UnmarshalFromDense10-4 353ns ± 1% 369ns ± 2% +4.39% (p=0.000 n=10+10)
UnmarshalFromDense100-4 2.58µs ± 4% 2.58µs ± 2% ~ (p=0.927 n=10+10)
UnmarshalFromDense1000-4 24.0µs ± 2% 24.1µs ± 3% ~ (p=0.353 n=10+10)
UnmarshalFromDense10000-4 237µs ± 4% 237µs ± 3% ~ (p=0.631 n=10+10)
MarshalVector10-4 128ns ± 3% 129ns ± 2% ~ (p=0.261 n=10+10)
MarshalVector100-4 861ns ± 2% 856ns ± 2% ~ (p=0.540 n=10+10)
MarshalVector1000-4 8.54µs ± 3% 8.53µs ± 2% ~ (p=0.968 n=10+9)
MarshalVector10000-4 83.1µs ± 3% 81.1µs ± 3% -2.44% (p=0.009 n=10+10)
UnmarshalVector10-4 121ns ± 6% 121ns ± 2% ~ (p=0.928 n=10+10)
UnmarshalVector100-4 827ns ± 3% 816ns ± 3% ~ (p=0.183 n=10+10)
UnmarshalVector1000-4 7.38µs ± 2% 7.30µs ± 1% ~ (p=0.122 n=10+8)
UnmarshalVector10000-4 67.4µs ± 2% 66.8µs ± 3% ~ (p=0.139 n=8+10)
MarshalToVector10-4 135ns ± 2% 147ns ± 1% +9.32% (p=0.000 n=10+8)
MarshalToVector100-4 946ns ± 1% 954ns ± 2% ~ (p=0.055 n=9+10)
MarshalToVector1000-4 8.99µs ± 0% 9.22µs ± 1% +2.55% (p=0.000 n=9+9)
MarshalToVector10000-4 89.3µs ± 0% 91.4µs ± 0% +2.41% (p=0.000 n=8+10)
UnmarshalFromVector10-4 343ns ± 2% 342ns ± 2% ~ (p=1.000 n=9+10)
UnmarshalFromVector100-4 2.57µs ± 2% 2.56µs ± 3% ~ (p=0.739 n=10+10)
UnmarshalFromVector1000-4 24.1µs ± 2% 24.2µs ± 2% ~ (p=0.247 n=10+10)
UnmarshalFromVector10000-4 239µs ± 2% 236µs ± 1% -1.22% (p=0.006 n=10+9)
Pool10by10Uncleared-4 60.3ns ± 0% 59.1ns ± 0% -2.05% (p=0.000 n=8+7)
Pool10by10Cleared-4 86.4ns ± 2% 86.0ns ± 1% ~ (p=0.529 n=10+8)
New10by10-4 433ns ± 2% 434ns ± 3% ~ (p=1.000 n=10+10)
Pool100by100Uncleared-4 59.5ns ± 2% 59.4ns ± 1% ~ (p=0.807 n=10+10)
Pool100by100Cleared-4 2.88µs ± 0% 2.87µs ± 0% -0.17% (p=0.015 n=10+10)
New100by100-4 18.9µs ± 1% 19.2µs ± 1% +1.44% (p=0.000 n=10+10)
MulWorkspaceDense100Half-4 379µs ± 8% 434µs ±11% +14.57% (p=0.000 n=10+10)
MulWorkspaceDense100Tenth-4 373µs ± 7% 416µs ± 9% +11.52% (p=0.000 n=10+10)
MulWorkspaceDense1000Half-4 190ms ± 0% 189ms ± 1% -0.32% (p=0.031 n=9+9)
MulWorkspaceDense1000Tenth-4 196ms ± 3% 194ms ± 3% ~ (p=0.393 n=10+10)
MulWorkspaceDense1000Hundredth-4 215ms ± 9% 212ms ± 4% ~ (p=0.853 n=10+10)
MulWorkspaceDense1000Thousandth-4 14.4ms ± 6% 14.1ms ±21% ~ (p=0.190 n=10+10)
AddScaledVec10Inc1-4 42.9ns ± 2% 41.9ns ± 1% -2.29% (p=0.001 n=10+10)
AddScaledVec100Inc1-4 88.3ns ± 2% 87.8ns ± 0% ~ (p=0.315 n=10+10)
AddScaledVec1000Inc1-4 564ns ± 3% 558ns ± 0% ~ (p=0.198 n=10+10)
AddScaledVec10000Inc1-4 7.85µs ± 0% 7.67µs ± 0% -2.32% (p=0.000 n=10+10)
AddScaledVec100000Inc1-4 110µs ± 0% 110µs ± 0% ~ (p=0.278 n=10+9)
AddScaledVec10Inc2-4 55.1ns ± 0% 55.1ns ± 0% ~ (p=0.082 n=9+9)
AddScaledVec100Inc2-4 231ns ± 1% 231ns ± 0% ~ (p=0.959 n=10+9)
AddScaledVec1000Inc2-4 2.44µs ± 0% 2.43µs ± 3% ~ (p=0.162 n=8+10)
AddScaledVec10000Inc2-4 24.2µs ± 0% 24.2µs ± 0% -0.12% (p=0.029 n=9+8)
AddScaledVec100000Inc2-4 246µs ± 0% 246µs ± 2% ~ (p=0.481 n=10+10)
AddScaledVec10Inc20-4 55.1ns ± 0% 55.5ns ± 2% +0.76% (p=0.008 n=8+10)
AddScaledVec100Inc20-4 231ns ± 0% 236ns ± 0% +2.38% (p=0.000 n=7+10)
AddScaledVec1000Inc20-4 3.18µs ± 0% 3.34µs ± 1% +5.02% (p=0.000 n=8+10)
AddScaledVec10000Inc20-4 49.0µs ± 1% 49.9µs ± 1% +1.84% (p=0.000 n=10+10)
AddScaledVec100000Inc20-4 1.84ms ± 2% 1.78ms ± 2% -2.92% (p=0.000 n=9+9)
ScaleVec10Inc1-4 18.8ns ± 1% 19.0ns ± 2% ~ (p=0.212 n=10+10)
ScaleVec100Inc1-4 43.1ns ± 3% 41.9ns ± 0% -2.68% (p=0.008 n=10+7)
ScaleVec1000Inc1-4 351ns ± 8% 363ns ± 0% ~ (p=0.179 n=10+10)
ScaleVec10000Inc1-4 4.53µs ± 2% 4.52µs ± 1% ~ (p=0.951 n=10+9)
ScaleVec100000Inc1-4 75.8µs ± 3% 75.3µs ± 0% ~ (p=0.400 n=10+9)
ScaleVec10Inc2-4 24.4ns ± 3% 24.0ns ± 0% -1.62% (p=0.010 n=10+9)
ScaleVec100Inc2-4 132ns ± 1% 128ns ± 0% -2.87% (p=0.000 n=9+9)
ScaleVec1000Inc2-4 1.23µs ± 2% 1.22µs ± 0% ~ (p=0.458 n=10+9)
ScaleVec10000Inc2-4 12.5µs ± 1% 12.2µs ± 0% -2.06% (p=0.000 n=10+10)
ScaleVec100000Inc2-4 142µs ± 0% 143µs ± 0% +0.47% (p=0.000 n=8+9)
ScaleVec10Inc20-4 23.9ns ± 1% 24.5ns ± 1% +2.21% (p=0.000 n=8+9)
ScaleVec100Inc20-4 130ns ± 3% 132ns ± 3% ~ (p=0.072 n=10+10)
ScaleVec1000Inc20-4 1.74µs ± 2% 1.73µs ± 1% ~ (p=0.483 n=10+9)
ScaleVec10000Inc20-4 26.9µs ± 2% 26.9µs ± 5% ~ (p=0.447 n=9+10)
ScaleVec100000Inc20-4 852µs ± 3% 823µs ± 1% -3.46% (p=0.000 n=10+10)
AddVec10Inc1-4 35.3ns ± 0% 36.3ns ± 1% +2.80% (p=0.000 n=7+10)
AddVec100Inc1-4 83.1ns ± 0% 85.3ns ± 0% +2.60% (p=0.000 n=8+8)
AddVec1000Inc1-4 554ns ± 1% 569ns ± 1% +2.72% (p=0.000 n=9+10)
AddVec10000Inc1-4 8.19µs ± 0% 7.83µs ± 4% -4.38% (p=0.000 n=9+9)
AddVec100000Inc1-4 111µs ± 1% 110µs ± 0% ~ (p=0.247 n=10+10)
AddVec10Inc2-4 54.0ns ± 2% 53.1ns ± 0% -1.76% (p=0.001 n=10+10)
AddVec100Inc2-4 229ns ± 1% 228ns ± 0% ~ (p=0.211 n=10+10)
AddVec1000Inc2-4 2.44µs ± 1% 2.42µs ± 2% ~ (p=0.074 n=9+10)
AddVec10000Inc2-4 24.8µs ± 3% 24.0µs ± 2% -3.02% (p=0.000 n=10+10)
AddVec100000Inc2-4 251µs ± 3% 244µs ± 2% -2.71% (p=0.023 n=10+9)
AddVec10Inc20-4 53.1ns ± 0% 53.1ns ± 0% ~ (p=0.459 n=9+9)
AddVec100Inc20-4 230ns ± 1% 230ns ± 2% ~ (p=0.559 n=10+10)
AddVec1000Inc20-4 3.11µs ± 0% 3.26µs ± 0% +5.04% (p=0.000 n=9+10)
AddVec10000Inc20-4 49.3µs ± 2% 48.7µs ± 0% -1.30% (p=0.017 n=10+9)
AddVec100000Inc20-4 1.85ms ± 2% 1.81ms ± 1% -2.50% (p=0.000 n=10+9)
SubVec10Inc1-4 35.6ns ± 1% 35.3ns ± 0% -0.58% (p=0.025 n=8+9)
SubVec100Inc1-4 94.6ns ± 1% 95.2ns ± 2% +0.59% (p=0.026 n=9+9)
SubVec1000Inc1-4 553ns ± 0% 557ns ± 1% +0.64% (p=0.001 n=8+9)
SubVec10000Inc1-4 8.34µs ± 4% 7.66µs ± 0% -8.10% (p=0.000 n=10+10)
SubVec100000Inc1-4 115µs ± 4% 113µs ± 0% ~ (p=0.156 n=10+9)
SubVec10Inc2-4 53.8ns ± 1% 54.7ns ± 5% ~ (p=0.127 n=8+10)
SubVec100Inc2-4 253ns ± 1% 252ns ± 0% -0.55% (p=0.016 n=10+8)
SubVec1000Inc2-4 2.45µs ± 2% 2.43µs ± 0% -0.72% (p=0.000 n=9+8)
SubVec10000Inc2-4 24.7µs ± 2% 24.2µs ± 0% -1.94% (p=0.000 n=10+9)
SubVec100000Inc2-4 256µs ± 5% 246µs ± 0% -3.77% (p=0.000 n=10+10)
SubVec10Inc20-4 53.7ns ± 3% 53.0ns ± 0% -1.28% (p=0.002 n=10+9)
SubVec100Inc20-4 256ns ± 2% 252ns ± 0% -1.45% (p=0.013 n=10+10)
SubVec1000Inc20-4 3.10µs ± 0% 3.26µs ± 0% +4.93% (p=0.000 n=9+9)
SubVec10000Inc20-4 48.6µs ± 0% 48.6µs ± 0% +0.11% (p=0.002 n=9+9)
SubVec100000Inc20-4 1.82ms ± 0% 1.83ms ± 2% ~ (p=0.497 n=9+10)
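One rough way to separate signal from noise in a table like the one above is to compare each reported delta against the wider of the two ± spreads. The sketch below is only a heuristic and does not replace the p-values benchstat already prints; the `result` type and `exceedsSpread` helper are hypothetical names introduced for illustration, and the sample rows are copied by hand from the table.

```go
package main

import (
	"fmt"
	"math"
)

// result is one benchstat row reduced to the fields used here.
// The type is hypothetical and exists only for this illustration.
type result struct {
	name     string
	oldPct   float64 // ± spread of the old runs, in percent
	newPct   float64 // ± spread of the new runs, in percent
	deltaPct float64 // reported delta, in percent
}

// exceedsSpread reports whether the delta is larger in magnitude than the
// wider of the two reported spreads. It is a heuristic, not a statistical test.
func exceedsSpread(r result) bool {
	return math.Abs(r.deltaPct) > math.Max(r.oldPct, r.newPct)
}

func main() {
	// Rows copied by hand from the benchstat output above.
	rows := []result{
		{"CholeskySmall-4", 4, 3, 4.34},
		{"MulDense100Half-4", 7, 10, 9.06},
		{"Pow100_4-4", 4, 5, -15.08},
		{"MulTransDense1000Half-4", 10, 2, -18.89},
		{"MulWorkspaceDense100Half-4", 8, 11, 14.57},
	}
	for _, r := range rows {
		fmt.Printf("%-28s delta %+7.2f%% exceeds spread: %t\n",
			r.name, r.deltaPct, exceedsSpread(r))
	}
}
```

By this measure a row such as MulDense100Half-4 (+9.06% with a ±10% spread on the new runs) stays within the measurement spread, while Pow100_4-4 (-15.08% with spreads of ±4% and ±5%) does not.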
LGTM.
SSA should eventually be faster, but I've seen a lot of variance. See, for example, https://github.com/golang/go/issues/14995.
@btracey @vladimir-ch Please take a look.
Benchmarks to justify the change are in preparation.
The approach was prompted by a comment from gri on the tables CL currently in review.