btracey closed this 7 years ago
I really don't understand why the one case is so much slower using f64 than native Go. It seems like basically the same code as the case two above.
Is this error because I forgot to run the single precision generation?
I'm getting the error
./level2single.go:461: undefined: f64 in f64.AxpyUnitary
./level2single.go:484: undefined: f64 in f64.AxpyUnitary
Which makes sense given the source code, but I don't understand why my change to single_precision.bash didn't correctly rewrite the source code.
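For readers following the error above, here is a minimal pure-Go sketch of what an AxpyUnitary-style helper computes (y += alpha*x over unit-stride slices), in both precisions. The names axpyUnitary64 and axpyUnitary32 are hypothetical stand-ins, not the actual gonum functions, and the real helpers may well be written in assembly. The point is only that a generated single-precision file has to call the float32 variant, so a leftover f64 qualifier presumably refers to a package that file no longer imports.

```go
package main

import "fmt"

// axpyUnitary64 is a hypothetical stand-in for the float64 helper:
// it adds alpha*x[i] to y[i] for every index of x.
// It assumes len(y) >= len(x).
func axpyUnitary64(alpha float64, x, y []float64) {
	for i, v := range x {
		y[i] += alpha * v
	}
}

// axpyUnitary32 is the same operation on float32 slices, which is
// what a single-precision level 2 routine would need to call.
func axpyUnitary32(alpha float32, x, y []float32) {
	for i, v := range x {
		y[i] += alpha * v
	}
}

func main() {
	x64 := []float64{1, 2, 3}
	y64 := []float64{10, 20, 30}
	axpyUnitary64(2, x64, y64)
	fmt.Println(y64) // [12 24 36]

	x32 := []float32{1, 2, 3}
	y32 := []float32{10, 20, 30}
	axpyUnitary32(2, x32, y32)
	fmt.Println(y32) // [12 24 36]
}
```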
There are two places (one here) where you have not committed a generated change.
I understand that the code isn't changed; what I don't understand is why it didn't get changed. You can see the rewrite rule I added at https://github.com/gonum/blas/pull/198/files#diff-35f433d99da386be53694204acb68185R23 . It doesn't actually change the file on my machine.
There needs to be a rewrite in the level2 routines section.
(Also, while you are here, can you delete the asm -> f64 rewrite in the level 1 section that is obviously not doing anything?)
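As an illustration of why the rule has to appear in the level 2 section, here is a rough Go sketch of a package-qualifier rewrite applied per file. It is not the contents of single_precision.bash, whose actual commands are not shown in this thread; rewriteQualifier and the sample source strings are hypothetical. A rule only affects the files it is run against, so a rule listed under level 1 leaves level 2 sources untouched.

```go
package main

import (
	"fmt"
	"strings"
)

// rewriteQualifier replaces every use of the package qualifier `from`
// (for example "f64.") with `to` (for example "f32.").
// This is a plain text replacement for illustration only; it would also
// match an identifier that merely ends in `from`, such as "buf64.".
func rewriteQualifier(src, from, to string) string {
	return strings.ReplaceAll(src, from+".", to+".")
}

func main() {
	// Hypothetical one-line stand-ins for generated source files.
	level1 := "f64.DotUnitary(x, y)"
	level2 := "f64.AxpyUnitary(alpha, x, y)"

	// Applying the rule only to the level 1 text leaves level 2 unchanged.
	fmt.Println(rewriteQualifier(level1, "f64", "f32"))
	fmt.Println(level2)

	// The level 2 text needs the same rule applied to it as well.
	fmt.Println(rewriteQualifier(level2, "f64", "f32"))
}
```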
Thanks, didn't understand. Fixed.
Okay. I think I fixed the generation problems, sorry.
I also removed the TODO. I cannot replicate that benchmark result today; it's ~30% faster with the f64 call, which is in line with the other benchmarks. I added the benchmark set for the Large matrices to confirm.
I also added the benchmarks to cgo. Here's an interesting result:
brendan:~/Documents/mygo/src/github.com/gonum/blas/native$ benchcmp dtrmv_new.txt dtrmv_cgo.txt
benchmark                               old ns/op     new ns/op     delta
BenchmarkDtrmvLarge/Inc1_UP_NT_NU-8     354377        1090924       +207.84%
BenchmarkDtrmvLarge/Inc1_UP_NT_UN-8     218301        1072323       +391.21%
BenchmarkDtrmvLarge/Inc1_UP_TR_NU-8     246109        727022        +195.41%
BenchmarkDtrmvLarge/Inc1_UP_TR_UN-8     243481        725940        +198.15%
BenchmarkDtrmvLarge/Inc1_LO_NT_NU-8     206178        1075069       +421.43%
BenchmarkDtrmvLarge/Inc1_LO_NT_UN-8     206640        1067515       +416.61%
BenchmarkDtrmvLarge/Inc1_LO_TR_NU-8     278862        816059        +192.64%
BenchmarkDtrmvLarge/Inc1_LO_TR_UN-8     230160        793998        +244.98%
BenchmarkDtrmvLarge/IncN_UP_NT_NU-8     556124        1206024       +116.86%
BenchmarkDtrmvLarge/IncN_UP_NT_UN-8     516521        1163167       +125.19%
BenchmarkDtrmvLarge/IncN_UP_TR_NU-8     572751        805090        +40.57%
BenchmarkDtrmvLarge/IncN_UP_TR_UN-8     580390        819339        +41.17%
BenchmarkDtrmvLarge/IncN_LO_NT_NU-8     525360        1225286       +133.23%
BenchmarkDtrmvLarge/IncN_LO_NT_UN-8     519012        1235703       +138.09%
BenchmarkDtrmvLarge/IncN_LO_TR_NU-8     574322        821403        +43.02%
BenchmarkDtrmvLarge/IncN_LO_TR_UN-8     562985        825352        +46.60%
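To read the delta column: it is the relative change from the old ns/op to the new ns/op, so a positive value means the second file (here the cgo run) took longer per operation. A small sketch of that arithmetic, using the first row above; percentDelta is a hypothetical helper name, not part of benchcmp.

```go
package main

import "fmt"

// percentDelta returns the relative change from old to new, in percent.
// A positive result means new is larger (slower, for ns/op values).
func percentDelta(old, new float64) float64 {
	return (new - old) / old * 100
}

func main() {
	// First row of the table: native = 354377 ns/op, cgo = 1090924 ns/op.
	d := percentDelta(354377, 1090924)
	fmt.Printf("%+.2f%%\n", d) // +207.84%

	// The same row expressed as a ratio: cgo takes roughly 3.08x as long.
	fmt.Printf("%.2fx\n", 1090924.0/354377.0)
}
```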
Here are the benchmarks