Fix fast matrix mutiplication implementation

lehins commented 5 years ago

I finally figured out why I was getting much slower benchmarks in massiv comparing to the implementation in dense-linear-algebra. I was benchmarking multiplication of A(200x600) x B(600x200) matrices.

To reproduce the bug I used the benchmarks from fastest-matrices:

λ> uDLA <- vectorGen
λ> uList = U.toList uDLA
λ> vMA = MA.fromList MA.Seq uList :: MA.Array MA.P MA.Ix1 Double
λ> aMA <- MA.resizeM (MA.Sz2 4 9) vMA :: IO (MA.Array MA.P MA.Ix2 Double)
λ> bMA <- MA.resizeM (MA.Sz2 9 4) vMA :: IO (MA.Array MA.P MA.Ix2 Double)
λ> aMA MA.|*| bMA
Array P Seq (Sz (4 :. 4))
  [ [ 1.7700914751064436, 1.2309820977803536, 1.707826152795249, 1.6772335798257119 ]
  , [ 2.555702366316134, 1.3237126494921863, 2.849472501297746, 2.560149705981688 ]
  , [ 1.6580016676221996, 1.8209635429144693, 2.6035157013237877, 2.2959408241699633 ]
  , [ 2.094817978804908, 1.56376512947369, 2.4469042480887095, 2.3480437251846893 ]
  ]
λ> aNH = NP.fromList uList :: NH.Array V.Vector '[4, 9] Double
λ> bNH = NP.fromList uList :: NH.Array V.Vector '[9, 4] Double
λ> NH.mmult aNH bNH
[[1.7700914751064438, 1.2309820977803536, 1.7078261527952487, 1.6772335798257119],
 [2.5557023663161345, 1.3237126494921863, 2.849472501297746, 2.5601497059816873],
 [1.6580016676221996, 1.8209635429144693, 2.603515701323788, 2.2959408241699633],
 [2.0948179788049086, 1.56376512947369, 2.446904248088709, 2.3480437251846893]]
λ> aDLA = M.Matrix 4 9 uDLA
λ> bDLA = M.Matrix 9 4 uDLA
λ> MF.multiply aDLA bDLA
(4,4) 0.789  0.4997 0.6749 0.8261
      1.1999 0.6489 1.3264 1.3812
      0.8818 0.9981 1.4134 1.4628
      0.5801 0.6332 0.5415 0.7313

Magalame commented 5 years ago

There seems to be some trouble with the lastest version of stack, see here, hence the failing travis build. Do you mind doing the same (commenting out the cat ... lines in the travis config file) so that we can actually know if anything went wrong?

lehins commented 5 years ago

@Magalame I've cherry picked your commit. Travis is passing.

Magalame commented 5 years ago

Great!

ocramz commented 5 years ago

Um, doesn't this change the behaviour of accum? Now that we don't have test logs printed out by Travis it's hard to catch this. @Magalame did you test this code before merging?

lehins commented 5 years ago

@ocramz Just in case, I did test the change ;) Output matches other libraries

λ> uDLA <- vectorGen
λ> aDLA = M.Matrix 4 9 uDLA
λ> bDLA = M.Matrix 9 4 uDLA
λ> MF.multiply aDLA bDLA
(4,4) 1.7701 1.231  1.7078 1.6772
      2.5557 1.3237 2.8495 2.5601
      1.658  1.821  2.6035 2.2959
      2.0948 1.5638 2.4469 2.348

Magalame commented 5 years ago

@ocramz I did :) And the change itself makes sense. The purpose of accum is to compute the dot product of the ith row of the first matrix with the jth column of the second. And the ith row has length c1 (and the jth column too, since both are supposed to have the same length)

DataHaskell / dh-core

Fix fast matrix mutiplication implementation #63