dlewissandy / lambda-blas

Native Haskell implementation of the BLAS library
BSD 3-Clause "New" or "Revised" License
9 stars 2 forks source link

As a software engineer, I would like a native Haskell implementation of the sdot function to that I can rapidly compute the vector dot product in a deterministic, thread-safe, type-safe manner. #6

Closed dlewissandy closed 7 years ago

dlewissandy commented 7 years ago

Implement the sdot function most conceptually clear manner possible using Haskell recursive functions.

dlewissandy commented 7 years ago

It is worth mentioning that the BLAS sdot function does more than compute the dot product of two vectors. This can be seen from the type of the sdot function:

 REAL FUNCTION SDOT(N,SX,INCX,SY,INCY)

which has three additional parameters: N, INCX and INCY. The parameter N determines the number of terms in the sum and can range between 0 and the length of the shorter of the two vectors. The parameters INCX and INCY control the direction and magnitude of the steps used to traverse each vector. In the case when INCX, INCY and N are all unity, then sdot corresponds to the vector dot product. For other values, sdot has the following Haskell specification:

sdot::Int -> [Float]->Int->[Float]->Int
sdot n sx incx sy incy = sum [ sx!!(ix incx i)*sy!!(ix incx i)| i<-[0..n-1]]
   where 
       ix inc i
            | inc > 1 = inc*i
            | inc < 1 = (-n+i)*inc
dlewissandy commented 7 years ago

The Fortran implementation for the sdot function handles 3 different special cases in the parameters:

  1. When the at least one INCX and INCY is not equal to unity, it uses a for loop to traverse the two vectors and accumulate the elemental products in a mutable accumulator.
  2. When both INCX and INCY are equal to unity and N is divisible by 5, it uses a loop unrolling strategy to traverse the two vectors and accumulate 5 elemental products into a mutable accumulator
  3. When both INCX and INCY are equal to unity and N is NOT divisible by 5, it first uses a for loop to traverse the two vectors and accumulate element products into a mutable accumulator until the remaining number of terms is divisible by 5, then it uses the loop unrolling strategy described in case 2.

In the Haskell code, I would like to ideally like to express my implementation in terms of good producers ( category theorists and computer scientists call these unfolds or anamorphisms) and good consumers ( category theorists call these folds or catamorphisms ) for some datatype. Doing so will allow me to leverage Haskell's rewrite rule system to not only automatically fuse (deforest/eliminate intermediate data structures) in my library function, but also potentially fuse with other good producers/ good consumers in a calling program.

Examples of good consumers and producers over lists include: --- --- consumers fold, sum, prod, map take, filter, unzip, etc producers unfold, take, map filter, repeat, zip, zipWith, etc

A good article on fusion of recursive functions and be found here and here

dlewissandy commented 7 years ago

Assuming we implement the haskell sdot function as a faithful translation of the FORTRAN algorithm, I will need to implement each of the three cases described above. The first cases is straight forward to implement in Haskell using datatype fusion:

case1  n sx incx sy incy = foldl' (+) 0 . map f [0..n-1] 
   where
       f i =  sx!!(ix incx i * sy!!(ix incx i)
       ix inc i = ...

Note the use of foldl' was used to ensure that the terms are summed in the same order as in the Fortran implementation (we have to be concerned about floating point precision!). One could also change the implementation of ix inc to eliminate the need for this call. When we use Haskell lists, the time complexity of this function will be worst case on the order O(n^2) due to the cost of arbitrary dereferencing of elements of the list. Furthermore this function will not fuse with either of its vector arguments. Investigating use of alternate data-types and improved fusion with upstream producers should be investigated as an optimization.

Case 3 is simple to write in terms of Haskell list functions such that it will have a time complexity of O(m):

case3 n sx sy = 
    let m = n `mod` 5
         subtotal = foldl' (+) 0 . take m . zipWith (*) sx sy
    in  case2 n (drop m sx) (drop m sy) subtotal    

however, depending upon the choice of collection datatype (Array, List, Vector, ...) there may not be rewrite rules to ensure fusion. This should be investigated as an optimization.

In both Case 1 and 2 we have used the foldl function to preserve the order that the summands are added.

dlewissandy commented 7 years ago

The implementation of the second case can be implemented as the composite of recursive functions:

case2 n sx sy subtotal =  foldl' (sum5) subtotal .  unfold g . zipWith (*) sx sy
    where 
        g [] = Nothing
        g zs = Just (splitAt 5 zs)    
        sum5 z [z0,z1,z2,z3,z4] = z + z0 + z1 + z2 + z3 + z4

The time-complexity of this implementation should be O(n-m) where m is the modulus of n over 5. However, this implementation will have serious difficulty with fusion because both unfold and zipWith are producers and will not automatically fuse. Evaluating options for fusion should be investigated as an optimization.

dlewissandy commented 7 years ago

WHY MANUAL FUSION STILL MATTERS

Hylomorphisms result from fusing a unfold and a fold; for example the hylomorphism function for lists has the following specification:

hyloL :: ( a -> Maybe (b,a)) -> ( c -> b -> c ) -> c -> a -> b
hyloL f g b = foldl f b . unfoldr g

which can be unrolled to obtain:

hyloL :: ( a -> Maybe (b,a)) -> ( c -> b -> c ) -> c -> a -> b
hyloL f g c a = case f a of 
    Nothing ->  c
    Just (b,a') -> hyloL f g (f c b) a' 

If Haskell's rewrite system can identify opportunities to fuse good producers (unfolds) with good consumers (folds), we shouldn't need to explicitly use this function. It turns out that this is not the case. In writing the Haskell implementation of the sdot algorithm I have encountered an excellent illustration of why a hylomophism function may occasionally be required. Consider the following snippet of code from case2 above:

foldl' (sum) subtotal .  unfold f

This snippet implements the unrolled loop strategy, recursively extracting 5 elements from the head of the list and adding them to an accumulator. Unfortuantely when I attempted to use vectors rather than lists, this function failed to compile with the following error:

No instance for (V.Storable (V.Vector Float)) arising from a use of ‘V.unfold’

GHC is complaining that the intermediate datatype (Vectors (Vectors Float)) is ill typed because it doesn't know how to create nested Vectors. What gives? If the intermediate data is never constructed, Haskell will never need to construct a nested vector! To explain this problem, you need to look at the types of the functions involved:

foldl' (sum) subtotal .  unfold f   :: Vector Float -> Float  -- This type is ok
foldl' (sum) subtotal :: Vector (Vector Float) -> Float      -- This is ill typed 
unfold f :: Vector Float ->Vector (Vector Float)                -- This is ill typed.

As expected, the fold.unfold composite is well typed and makes no mention of nested vectors. However, the fold and unfold functions individually consume or produce the nested type, and are therefore ill-typed. Since Haskell's rewrite rules only fire during the simplifier phase of compilation, which occurs after type checking, the simplifier has no chance to eliminate the appearance of the intermediate type. This type error can be eliminated by manually fusing the fold/unfold, for example:

case3 n sx sy subtotal = hylol sum5 g subtotal .  zipWith (*) sx sy
dlewissandy commented 7 years ago

WHY MANUAL FUSION STILL MATTERS 2

As we noted earlier, both zipWith and unfold are producers, and will not automatically fuse, however, manual fusion can be employed to express case 3 as a single hylomorphism. we observe that:

uncurry ( unfold g . zipWith (*) ) = unfold h 
   where h (x1:x2:x3:x4:x5:xs,y1:y2:y3:y4:y5:ys) = Just ( [x1*y1,x2*y2,x3*y3,x4*y4,x5*y4],(xs,ys))
              h  _ = Nothing 

Which leads to the following expression:

case3 n sx sy subtotal = hylol sum5 h subtotal (sx,sy) 
   where h ...
dlewissandy commented 7 years ago

Closed upon approval of pull request #23