Closed btracey closed 10 years ago
PTAL @kortschak . The overall idea is the same as before, but the k computation is done in serial. This pushes the length checking into the worker rather than the sender loop. It seems to me easier to examine by looking at the file as a whole rather than the diff.
Those two approaches are functionally equivalent, so I suggest running benchmarks on them.
If you mean the broad architectural change, then the benchmarks are in the commit message. If you mean the cleanup, the difference between the two will be completely lost in the noise On Oct 30, 2014 5:55 AM, "Jonathan J Lawlor" notifications@github.com wrote:
Those two approaches are functionally equivalent, so I suggest running benchmarks on them.
— Reply to this email directly or view it on GitHub https://github.com/gonum/blas/pull/22#issuecomment-61086617.
PTAL
LGTM after fixing minor comments.
The old code was wasteful. It sent 2 messages every block computation, and it launched a new goroutine at each block computation. This was done to synchronize the computations along the k dimension, but this is silly, because the k dimension is being computed in serial anyway. This refactors the code such that the k dimension is computed in serial by a worker before a new {i,j} pair is received. This reduced the messages sent, reduces the number of goroutines, and should help cache coherency. This seems to improve performance somewhat, but even if the performance gains were negligible, the simplification makes it worth it. When the inner loop is sped up (bounds checking, vectorization), this code will almost certainly be faster.
BenchmarkDgemmSmSmSm 2372 2313 -2.49% BenchmarkDgemmSmSmSm-2 2280 2267 -0.57% BenchmarkDgemmSmSmSm-4 2290 2295 +0.22% BenchmarkDgemmMedMedMed 1302412 1293271 -0.70% BenchmarkDgemmMedMedMed-2 723841 717892 -0.82% BenchmarkDgemmMedMedMed-4 563282 575358 +2.14% BenchmarkDgemmMedLgMed 12601547 12633168 +0.25% BenchmarkDgemmMedLgMed-2 6668578 6727320 +0.88% BenchmarkDgemmMedLgMed-4 4005838 4029760 +0.60% BenchmarkDgemmLgLgLg 1284515732 1317360458 +2.56% BenchmarkDgemmLgLgLg-2 663440326 653124639 -1.55% BenchmarkDgemmLgLgLg-4 442994398 443350897 +0.08% BenchmarkDgemmLgSmLg 15146665 15495804 +2.31% BenchmarkDgemmLgSmLg-2 7894837 7818921 -0.96% BenchmarkDgemmLgSmLg-4 4770521 5022796 +5.29% BenchmarkDgemmLgLgSm 15419348 14768983 -4.22% BenchmarkDgemmLgLgSm-2 7854456 7819836 -0.44% BenchmarkDgemmLgLgSm-4 4833188 4891688 +1.21% BenchmarkDgemmHgHgSm 1542986120 1498077839 -2.91% BenchmarkDgemmHgHgSm-2 770878951 753059418 -2.31% BenchmarkDgemmHgHgSm-4 524997747 503342998 -4.12% BenchmarkDgemmMedMedMedTNT 1293529 1296046 +0.19% BenchmarkDgemmMedMedMedTNT-2 744613 730826 -1.85% BenchmarkDgemmMedMedMedTNT-4 568071 569269 +0.21% BenchmarkDgemmMedMedMedNTT 1047344 1012308 -3.35% BenchmarkDgemmMedMedMedNTT-2 571599 561348 -1.79% BenchmarkDgemmMedMedMedNTT-4 451243 458217 +1.55% BenchmarkDgemmMedMedMedNTNT 1320557 1294146 -2.00% BenchmarkDgemmMedMedMedNTNT-2 721417 710031 -1.58% BenchmarkDgemmMedMedMedNTNT-4 558277 557583 -0.12%