gonum / blas

A BLAS implementation for Go [DEPRECATED]

goblas: optimize Daxpy using SSE2 on amd64 #80

Closed. fhs closed this 9 years ago

fhs commented 9 years ago

Benchmark results with Go 1.4 on an Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz:

benchmark                           old ns/op     new ns/op     delta
BenchmarkDaxpySmallBothUnitary      22.4          15.4          -31.25%
BenchmarkDaxpySmallIncUni           24.4          24.2          -0.82%
BenchmarkDaxpySmallUniInc           24.6          24.1          -2.03%
BenchmarkDaxpySmallBothInc          24.4          23.5          -3.69%
BenchmarkDaxpyMediumBothUnitary     1351          360           -73.35%
BenchmarkDaxpyMediumIncUni          1695          1226          -27.67%
BenchmarkDaxpyMediumUniInc          1586          1185          -25.28%
BenchmarkDaxpyMediumBothInc         1733          1357          -21.70%
BenchmarkDaxpyLargeBothUnitary      136000        72131         -46.96%
BenchmarkDaxpyLargeIncUni           240527        198011        -17.68%
BenchmarkDaxpyLargeUniInc           206700        162471        -21.40%
BenchmarkDaxpyLargeBothInc          279814        239500        -14.41%
BenchmarkDaxpyHugeBothUnitary       16175068      12855609      -20.52%
BenchmarkDaxpyHugeIncUni            37333482      34263411      -8.22%
BenchmarkDaxpyHugeUniInc            29515256      27305122      -7.49%
BenchmarkDaxpyHugeBothInc           47188205      47970088      +1.66%
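
For readers unfamiliar with the operation being benchmarked: Daxpy computes y += alpha*x over two strided vectors. The sketch below is a plain-Go reference version, not the code from this PR. The function name `daxpy`, its signature, and the restriction to positive increments are assumptions made for illustration. It separates the unit-stride case (presumably what the `BothUnitary` benchmarks exercise) from the general strided case, which would explain why the contiguous path gains the most from SSE2: adjacent float64 values can be loaded and processed two at a time.

```go
package main

import "fmt"

// daxpy computes y[i*incY] += alpha * x[i*incX] for i in [0, n).
// Hypothetical reference version for illustration; it assumes
// incX > 0 and incY > 0 and that both slices are long enough.
func daxpy(n int, alpha float64, x []float64, incX int, y []float64, incY int) {
	if n <= 0 || alpha == 0 {
		return
	}
	if incX == 1 && incY == 1 {
		// Unit-stride path: elements are contiguous in memory, so this
		// loop is the natural candidate for a SIMD (e.g. SSE2) kernel.
		x = x[:n]
		y = y[:n]
		for i, v := range x {
			y[i] += alpha * v
		}
		return
	}
	// General strided path: each access jumps by incX or incY elements.
	ix, iy := 0, 0
	for i := 0; i < n; i++ {
		y[iy] += alpha * x[ix]
		ix += incX
		iy += incY
	}
}

func main() {
	// Both vectors unit-stride.
	y := []float64{10, 20, 30, 40}
	daxpy(4, 2, []float64{1, 2, 3, 4}, 1, y, 1)
	fmt.Println(y) // [12 24 36 48]

	// x strided (every second element), y unit-stride.
	z := []float64{1, 1, 1}
	daxpy(3, 1, []float64{1, 0, 2, 0, 3}, 2, z, 1)
	fmt.Println(z) // [2 3 4]
}
```

The delta column in the table is (new - old) / old, so the first row is (15.4 - 22.4) / 22.4 = -31.25%.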

kortschak commented 9 years ago

LGTM