AccelerateHS / accelerate

Embedded language for high-performance array computations
https://www.acceleratehs.org
Other
901 stars 118 forks source link

Arbitrarily sized stencils #426

Open JoshMeredith opened 6 years ago

JoshMeredith commented 6 years ago

It should be possible to implement stencils of arbitrary size, where the size may not be known until runtime. This would look something like:

stencil'
  :: (Shape sh, Elt a, Elt b)
  => Exp sh -> Boundary (Array sh a)
  -> (Acc (Array sh a) -> Acc (Scalar b))
  -> Acc (Array sh a)
  -> Acc (Array sh b)

This would be used like the existing stencils, but instead of a (nested) tuple, the summary function passed an array to summarise.

As a motivating example, a Gaussian blur:

gaussianBlur :: Exp DIM2 -> Acc (Array DIM2 Float) -> Acc (Array DIM2 Float)
gaussianBlur blurRegionSize = stencil' blurRegionSize (function $ const 0) goBlur
  where
    goBlur = sum . flatten . zipWith (*) gaussCoefficients
    gaussCoefficients :: Acc (Array DIM2 Float) -- shape gaussCoefficients == blurRegionSize
ghost commented 6 years ago

I ran into a need for this as well. Formulated here: https://github.com/AccelerateHS/accelerate/issues/426. I'm not sure which is a harder engineering challenge to integrate into Accelerate, but the type I expressed I think could support any general convolution, with a function for how to interact element wise with a kernel (Exp a -> Exp b -> Exp c) and then an associative function that reduces down these values to one element (Exp c -> Exp c -> Exp d). I don't really like the notation that stencil uses, because while its convenient to write an explicit function, using it like a kernel array is cumbersome. Would be nice to write

gaussianKernel :: Acc (Matrix Float)
gaussianKernel = (use $ A.fromList ... / A.map ...)

gaussian :: Acc (Matrix Float) -> Acc (Matrix Float)
gaussian arr = stridedConvolution (Z :. 0 :. 0) (Z :. 0 :. 0) (*) (+) gaussianKernel arr
ghost commented 6 years ago

The canny example in the accelerate-examples has some haskell land conversion that makes this sort of thing decent, though still not very pretty

convolve5x1 :: Num a => [Exp a] -> Stencil5x1 a -> Exp a
convolve5x1 kernel (_, (a,b,c,d,e), _)
  = P.sum $ P.zipWith (*) kernel [a,b,c,d,e]

convolve1x5 :: Num a => [Exp a] -> Stencil1x5 a -> Exp a
convolve1x5 kernel ((_,a,_), (_,b,_), (_,c,_), (_,d,_), (_,e,_))
= P.sum $ P.zipWith (*) kernel [a,b,c,d,e]
...
gaussianX :: Acc (Image Float) -> Acc (Image Float)
gaussianX = stencil (convolve5x1 gaussian) A.clamp
  where
    gaussian = P.map (/16) [ 1, 4, 6, 4, 1 ]

But I had some long compilation time issues previously following this method (which Trevor very thankfully fixed). But I'm not sure what the scaling issues would be like if the kernel would be made much bigger. Whereas at the very worst, formulating this as convolution, you could write a binding to the cudnn libraries which support these types of things in the spirit of the many deep learning use cases.

My particular use cases has been some simple object detection (looking for high density chunks of a certain feature by way of a strided convolution). Also, similar to gaussian kernels, being able to create certain angles of gabor filters as Accelerate matrices for convolution would be nice as well.

tmcdonell commented 6 years ago

I think it is important to tease out a few more details here.

As far as I can tell, the discussion here (and in #387) are about convolutions specifically, which are generally more restricted than what stencil currently offers.

In particular this ticket is asking for arbitrarily sized stencils. As a concrete example let's say a 51x51 gaussian blur, which we certainly can't do with the current interface (ignore whether or not this is a sensible thing to do). A basic building block which would allow this might be something like:

stencil'
    :: Exp sh                       -- size of the border, I promise not to read outside this offset
    -> ((Exp sh -> Exp a) -> Exp b) -- given a function to read neighbouring elements (by relative offset), produce a final value
    -> Boundary (Array sh a)
    -> Acc (Array sh a)
    -> Acc (Array sh b)

...and then implement more user-friendly functions on top of that (e.g. which iterate over the values in the tile and perform a convolution).

...because while its convenient to write an explicit function, using it like a kernel array is cumbersome

That is fair. The gaussian blur example works fine here because the coefficients are known statically. If they aren't then it will be more tedious to work with.

For some of your use cases, I'm not sure when it is better to do the im2col trick and implement the stencil as a matrix operation instead.