gonum / blas

A BLAS implementation for Go [DEPRECATED]

Optimize Dtrmv and add benchmarks #198

Closed btracey closed 7 years ago

btracey commented 7 years ago

Here are the benchmarks:

benchmark                             old ns/op     new ns/op     delta
BenchmarkDtrmvMed/Inc1_UP_NT_NU-8     11655         9464          -18.80%
BenchmarkDtrmvMed/Inc1_UP_NT_UN-8     5853          2553          -56.38%
BenchmarkDtrmvMed/Inc1_UP_TR_NU-8     12311         8253          -32.96%
BenchmarkDtrmvMed/Inc1_UP_TR_UN-8     5171          3192          -38.27%
BenchmarkDtrmvMed/Inc1_LO_NT_NU-8     12645         12721         +0.60%
BenchmarkDtrmvMed/Inc1_LO_NT_UN-8     4964          5073          +2.20%
BenchmarkDtrmvMed/Inc1_LO_TR_NU-8     11328         6623          -41.53%
BenchmarkDtrmvMed/Inc1_LO_TR_UN-8     5097          2957          -41.99%
BenchmarkDtrmvMed/IncN_UP_NT_NU-8     11372         10931         -3.88%
BenchmarkDtrmvMed/IncN_UP_NT_UN-8     5002          5512          +10.20%
BenchmarkDtrmvMed/IncN_UP_TR_NU-8     13232         8657          -34.58%
BenchmarkDtrmvMed/IncN_UP_TR_UN-8     5697          6063          +6.42%
BenchmarkDtrmvMed/IncN_LO_NT_NU-8     12390         10994         -11.27%
BenchmarkDtrmvMed/IncN_LO_NT_UN-8     4929          5397          +9.49%
BenchmarkDtrmvMed/IncN_LO_TR_NU-8     15613         9571          -38.70%
BenchmarkDtrmvMed/IncN_LO_TR_UN-8     9924          5995          -39.59%

btracey commented 7 years ago

I really don't understand why the one case is so much slower using f64 than native Go. It seems like basically the same code as the case two above.

btracey commented 7 years ago

Is this error because I forgot to run the single precision generation?

btracey commented 7 years ago

I'm getting the error

./level2single.go:461: undefined: f64 in f64.AxpyUnitary
./level2single.go:484: undefined: f64 in f64.AxpyUnitary

Which makes sense given the source code, but I don't understand why my change to single_precision.bash didn't correctly rewrite the source code.

kortschak commented 7 years ago

There are two places (one here) where you have not committed a generated change.

btracey commented 7 years ago

I understand that the code isn't changed; what I don't understand is why it didn't get changed. The rewrite rule I added (https://github.com/gonum/blas/pull/198/files#diff-35f433d99da386be53694204acb68185R23) doesn't actually change the file on my machine.

kortschak commented 7 years ago

There needs to be a rewrite in the level2 routines section.

(Also, while you are here, can you delete the asm -> f64 rewrite in the level 1 section? It is obviously not doing anything.)
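
The contents of single_precision.bash are not shown in this thread, so the following is only a toy model of the failure mode being described: a rewrite rule attached to one section of a generator has no effect on source handled by another section. The `rule` type, the `apply` helper, and the sample source lines are all hypothetical.

```go
package main

import (
	"fmt"
	"strings"
)

// rule is a hypothetical textual rewrite: every occurrence of from becomes to.
type rule struct{ from, to string }

// apply runs each rule over src in order and returns the rewritten text.
func apply(src string, rules []rule) string {
	for _, r := range rules {
		src = strings.ReplaceAll(src, r.from, r.to)
	}
	return src
}

func main() {
	// Made-up source lines standing in for the level 1 and level 2 files.
	level1 := "f64.DotUnitary(x, y)"
	level2 := "f64.AxpyUnitary(tmp, row, x)"

	// The rule was added only to the level 1 section.
	level1Rules := []rule{{"f64.", "f32."}}
	var level2Rules []rule

	fmt.Println(apply(level1, level1Rules)) // f32.DotUnitary(x, y)
	fmt.Println(apply(level2, level2Rules)) // f64.AxpyUnitary(tmp, row, x)

	// Adding the same rule to the level 2 section fixes the generated code.
	level2Rules = append(level2Rules, rule{"f64.", "f32."})
	fmt.Println(apply(level2, level2Rules)) // f32.AxpyUnitary(tmp, row, x)
}
```

In this model the unchanged second line corresponds to the `undefined: f64` errors quoted above: the generated single-precision file still refers to a package it does not import.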

btracey commented 7 years ago

Thanks, didn't understand. Fixed.

btracey commented 7 years ago

Okay. I think I fixed the generation problems, sorry.

I also removed the TODO. I cannot replicate that benchmark result today; the f64 call is now ~30% faster, which is in line with the other benchmarks. I added the benchmark set for the Large matrices to confirm.

I also added the benchmarks to cgo. Here's an interesting result:

brendan:~/Documents/mygo/src/github.com/gonum/blas/native$ benchcmp dtrmv_new.txt dtrmv_cgo.txt 
benchmark                               old ns/op     new ns/op     delta
BenchmarkDtrmvLarge/Inc1_UP_NT_NU-8     354377        1090924       +207.84%
BenchmarkDtrmvLarge/Inc1_UP_NT_UN-8     218301        1072323       +391.21%
BenchmarkDtrmvLarge/Inc1_UP_TR_NU-8     246109        727022        +195.41%
BenchmarkDtrmvLarge/Inc1_UP_TR_UN-8     243481        725940        +198.15%
BenchmarkDtrmvLarge/Inc1_LO_NT_NU-8     206178        1075069       +421.43%
BenchmarkDtrmvLarge/Inc1_LO_NT_UN-8     206640        1067515       +416.61%
BenchmarkDtrmvLarge/Inc1_LO_TR_NU-8     278862        816059        +192.64%
BenchmarkDtrmvLarge/Inc1_LO_TR_UN-8     230160        793998        +244.98%
BenchmarkDtrmvLarge/IncN_UP_NT_NU-8     556124        1206024       +116.86%
BenchmarkDtrmvLarge/IncN_UP_NT_UN-8     516521        1163167       +125.19%
BenchmarkDtrmvLarge/IncN_UP_TR_NU-8     572751        805090        +40.57%
BenchmarkDtrmvLarge/IncN_UP_TR_UN-8     580390        819339        +41.17%
BenchmarkDtrmvLarge/IncN_LO_NT_NU-8     525360        1225286       +133.23%
BenchmarkDtrmvLarge/IncN_LO_NT_UN-8     519012        1235703       +138.09%
BenchmarkDtrmvLarge/IncN_LO_TR_NU-8     574322        821403        +43.02%
BenchmarkDtrmvLarge/IncN_LO_TR_UN-8     562985        825352        +46.60%
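
As a reading aid for the delta column, the sketch below recomputes two of the deltas quoted in this thread as (new - old) / old. `percentDelta` is a hypothetical helper, not part of benchcmp. Since the old file in the comparison above is dtrmv_new.txt (native) and the new file is dtrmv_cgo.txt, a positive delta means the cgo run took longer per operation.

```go
package main

import "fmt"

// percentDelta (hypothetical helper) returns the relative change from old to
// new as a signed percentage string with two decimal places.
func percentDelta(old, new float64) string {
	return fmt.Sprintf("%+.2f%%", (new-old)/old*100)
}

func main() {
	// BenchmarkDtrmvMed/Inc1_UP_NT_NU-8 from the first table.
	fmt.Println(percentDelta(11655, 9464)) // -18.80%

	// BenchmarkDtrmvLarge/Inc1_LO_NT_NU-8, native vs cgo.
	fmt.Println(percentDelta(206178, 1075069)) // +421.43%
}
```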