gonum / blas

A BLAS implementation for Go [DEPRECATED]

goblas: fix Ddot for unaligned slices #78

Closed fhs closed 9 years ago

fhs commented 9 years ago

No real change in the benchmark results (Go 1.4 on Intel(R) Xeon(R) CPU E5-2660 0 @ 2.20GHz):

benchmark                          old ns/op     new ns/op     delta
BenchmarkDdotSmallBothUnitary      15.3          15.3          +0.00%
BenchmarkDdotSmallIncUni           21.5          21.5          +0.00%
BenchmarkDdotSmallUniInc           22.0          22.0          +0.00%
BenchmarkDdotSmallBothInc          21.5          21.5          +0.00%
BenchmarkDdotMediumBothUnitary     505           505           +0.00%
BenchmarkDdotMediumIncUni          1186          1188          +0.17%
BenchmarkDdotMediumUniInc          1178          1178          +0.00%
BenchmarkDdotMediumBothInc         1220          1213          -0.57%
BenchmarkDdotLargeBothUnitary      59322         59327         +0.01%
BenchmarkDdotLargeIncUni           163780        163693        -0.05%
BenchmarkDdotLargeUniInc           141651        141706        +0.04%
BenchmarkDdotLargeBothInc          203541        203794        +0.12%
BenchmarkDdotHugeBothUnitary       9586039       9586340       +0.00%
BenchmarkDdotHugeIncUni            31228460      31128999      -0.32%
BenchmarkDdotHugeUniInc            21437424      21476630      +0.18%
BenchmarkDdotHugeBothInc           40115032      40122703      +0.02%

Fixes #77

btracey commented 9 years ago

LGTM

Would you mind explaining a bit for my education? I tried googling, but the discussions were pretty technical. I can see the effect of the change and understand the differences. However, it seems like there are two instructions that are equally fast, where one only works on aligned data and the other works on any data. Why would one want to use MOVAPD when you could use MOVUPD instead? Is there a theoretical speed penalty? Is it just that you didn't understand? I'm not trying to be accusatory, just trying to stay on top of code development. Thanks for the changes.

btracey commented 9 years ago

Just a note: early merge because it fixes ddot. We should still consider the incremented test if appropriate.

fhs commented 9 years ago

MOVAPD is supposed to be faster than MOVUPD, but in practice, at least on the CPU I ran the benchmarks on, it doesn't make much difference. In general, alignment is a good thing. See: https://groups.google.com/d/msg/golang-nuts/HFjwPFYrCqg/oSttxY8ajr4J
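
For readers wondering how a Go slice ends up unaligned in the first place: MOVAPD requires its memory operand to sit on a 16-byte boundary and faults otherwise, while MOVUPD accepts any address. A float64 is only 8 bytes wide, so reslicing by one element shifts the start address off that boundary. The sketch below is illustrative only and is not code from this PR; the helper name isAligned16 is made up for the example.

```go
package main

import (
	"fmt"
	"unsafe"
)

// isAligned16 (hypothetical helper, not part of gonum/blas) reports whether
// the first element of x sits on a 16-byte boundary, which is what MOVAPD
// requires of its memory operand. An empty slice has no element to load,
// so it is reported as aligned.
func isAligned16(x []float64) bool {
	if len(x) == 0 {
		return true
	}
	return uintptr(unsafe.Pointer(&x[0]))%16 == 0
}

func main() {
	x := make([]float64, 8)
	// Reslicing by one float64 moves the start address by 8 bytes, so on
	// platforms where float64 is 8-byte aligned (amd64 included) exactly
	// one of these two slices starts on a 16-byte boundary.
	fmt.Println(isAligned16(x), isAligned16(x[1:]))
}
```

Since callers can pass either kind of slice to Ddot, the assembly cannot assume alignment unless it checks first, which is why the unaligned load is the safe choice here.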

btracey commented 9 years ago

I'm going to leave a related note on that thread for future readers. Feel free to chime in if you have things to add.

Thanks for the fix.