DataHaskell / dh-core

Functional data science
138 stars 23 forks source link

dense-linear-algebra: Getting stream fusion to work across `Matrix`'s #60

Open Magalame opened 5 years ago

Magalame commented 5 years ago

The current problem is as follows:

(U.sum . flip M.column 0) a does not fuse. It seems to boil down to:

testRewrite1 :: Matrix -> Double  --fuses
testRewrite1 (Matrix r c v) = U.sum . flip (\u j -> U.generate r (\i -> u `U.unsafeIndex` (j + i * c))) 0 $ v

testRewrite2 :: Matrix -> Double -- does NOT fuse
testRewrite2 m = U.sum . flip (\(Matrix r c v) j -> U.generate r (\i -> v `U.unsafeIndex` (j + i * c))) 0 $ m

note: the flip isn't important, it's just by convenience, since this is from https://github.com/Magalame/fastest-matrices

So the thing that seems to happen is that stream fusion cannot "go through" Matrix's, I'm not sure exactly why

Magalame commented 5 years ago

Interestingly, this fuses:

testRewrite3 :: Matrix -> Double 
testRewrite3 (Matrix r c v) = U.sum . flip (\(r,c,v) j -> U.generate r (\i -> v `U.unsafeIndex` (j + i * c))) 0 $ (r,c,v)
Magalame commented 5 years ago

It seems that pattern matching is the cause? This also fuses:

testRewrite4 :: Matrix -> Double 
testRewrite4 m = U.sum . flip (\m1 j -> U.generate (rows m1) (\i -> (_vector m1) `U.unsafeIndex` (j + i * (cols m1)))) 0 $ m
Magalame commented 5 years ago

Bottom line:

column :: Matrix -> Int -> Vector Double
column (Matrix r c v) j= U.generate r (\i -> v `U.unsafeIndex` (j + i * c))
{-# INLINE column #-}

column2 :: Matrix -> Int -> Vector Double
column2 m j = U.generate (rows m) (\i -> (_vector m) `U.unsafeIndex` (j + i * (cols m)))
{-# INLINE column2 #-}

testRewrite0 :: Matrix -> Double
testRewrite0 a = (U.sum . flip column2 0) a

testRewrite1 :: Matrix -> Double
testRewrite1 a = (U.sum . flip column 0) a

testRewrite0 fuses, testRewrite1 doesn't.

However, the fusion is depending on the INLINE and I can't get M.column2 to get inlined across modules (moving the definition of column2 from the main file to Matrix.hs)

Magalame commented 5 years ago

The results are actually a bit disappointing, I'll keep digging

Magalame commented 4 years ago

Back to it! Turns out GHC makes wonders with rewrite rules/inlining. So throwing in for example:

{-# INLINE [1] row #-}
{-# RULES
      "row/fuse"    forall c v r i.  row (M.Matrix r c v) i = U.slice (c*i) c v
  #-}

works amazing.

It gives us some nice stuff:

Say I have matrices A,B, and C = A*B. Then now I can take the sum of a row of C, without allocating the whole matrix, and a ~10x speedup on my machine.

I would expect to hit a point of diminishing return though, so that's to be explored