AccelerateHS / accelerate

Embedded language for high-performance array computations
https://www.acceleratehs.org
Other
893 stars 118 forks source link

Sequential array operations #147

Open mchakravarty opened 10 years ago

mchakravarty commented 10 years ago

We have already got sfoldl, but we need more sequential array operators. In particular, we also need to be able to produce arrays in sequential code — e.g., we should have smap and szipwith. (I'm not convinced this is the best naming scheme, though.)

This leads to having Exp computations that are array-valued. The main conceptual problem with those is that we need to be able to statically track the size of these arrays (after all, an array-valued Exp result may be the result of a map). This may be possible with the type-level naturals that should be coming with GHC 7.8. They are not very powerful yet (reasoning-wise), but I think we can already do a lot with some very simple constraints. Our main cases seem to be that a function preserves the size or collapses an array to a singleton.

This might also be helpful for #142 as it would give us array stencils with static sizes.

kdmadej commented 10 years ago

I have a question. What are the chances for sequential versions of some functions like smap and szipWith to come to life any time soon? Recently I've came across a lot of situations where such functions would be a real life saver. It is quite often that I need to map over a vector / array (that is small in size thus doing something with it sequentially would not be time consuming) inside parallel computations. Any timetable on that matter? Can I be of any assistance with my still slightly limited understanding of CUDA and haskell? :)

wdanilo commented 10 years ago

Hello @mchakravarty ! :) I would also like to ask you about the sequential operators. They are right now crucial for us and we are hitting the wall, because it is impossible to code some very improtant algorithms without smap and szipWith. Could we ask you for the help please? We would be very thankful if such functionality could be added! :)

mchakravarty commented 10 years ago

@zgredzik @wdanilo Could you give us a small example of the type of computation that you would like to write that includes an smap or an szipWith?

kdmadej commented 10 years ago

For instance: for each pixel we would like to be able to calculate it's distance from a given Bézier curve. Part of the computations involves calculating a derivative of a Bernstein polynomial.

Given the polynomial as:

type BernsteinPoly a = Acc (Vector a)

We do the calculations with something like:

zeroPoly :: (Elt a, IsNum a) => BernsteinPoly a
zeroPoly = flatten $ unit 0

bernsteinDeriv :: (Elt a, IsNum a) => BernsteinPoly a -> BernsteinPoly a
bernsteinDeriv poly = acond (degree <=* 0) zeroPoly
                    $ A.map (* A.fromIntegral degree) $ A.zipWith (-) (A.tail poly) poly
                    -- or in this scenario: A.smap, A.szipWith
                    where degree = size poly - 1 :: Exp Int

Basically I'm trying to port the cubicbezier package so that it can work with accelerate (at least partially). Theoretically I could prepare separate data types for polynomials of different degrees (or work my way around it with tuples) and then handle them separately but I find it to be an ugly workaround.

mchakravarty commented 10 years ago

In this code there is no nesting. I assume that the size of BernsteinPoly will be fairly small and that is why you don't want this code to execute in parallel. Is that correct? (What size are the BernsteinPoly vectors usually?)

In what context do you want to call bernsteinDeriv? I assume that you need to call this function very often and that you want these multiple uses of bernsteinDeriv to execute in parallel. If that is true, then you will probably want to nest this array operation (even if it is sequential) into some other (parallel) array computations. Can you sketch how that would look like? Would it be in a parallel map? (This is important to know, because Accelerate currently doesn't allow this sort of nesting; although, @robeverest is working on implementing it.)

(Sorry for all these questions, but I don't know much about how Bézier curves are calculated.)

kdmadej commented 10 years ago

I'm soryy I should've explicitly stated, that this code is supposed to be run inside a generate function, that as far as I know is run in parallel. My bad, sorry :)

The thing we are trying to do is generate an image representing distances from a given Bézier curve for each pixel of the image.

There is no analytical solution for finding the distance of a given point from a cubic Bézier curve, only numerical ones. One of those numerical solutions requires converting the Bézier curve, represented with four control points, to a Bernstein polynomial form and then finding the roots of such polynomial (one of the steps of this solution includes calling bernsteinDeriv).

Unfortunately I've just realized that part of the numerical algorithm is recursive, and it is impossible to implement recursive functions using Accelerate, isn't it?

mchakravarty commented 10 years ago

Recursion in Accelerate always goes through the CPU. This is because GPUs don't like recursion at all. Can you express the recursion as an iteration instead?

Is it the function bernsteinDeriv that you want to use recursively until degree <=* 0? If so, that could be done iteratively.

From what you are saying, is it correct to say that for every point in the image (each invocation of the argument function to generate), you want to run the recursion/iteration of bernsteinDeriv once?

What I still don't understand, how is the (x,y)-coordinate of the current pixel of the image used in that computation? The Bernstein polynominal appears to be constant (across the whole image) from your description.

kdmadej commented 10 years ago

Crap I suck at explaining and I made it sound more confusing that it's supposed to be ^^

But to answer you question - I use the x and y coordinates to calculate the distance of the current pixel from the given cubic Bézier. In order to construct the final Bernstein polynomial for which I try to find the roots, I need both the input Bézier and the coordinates.

The best thing for me to do would probably be to link you the code I'd like to port, so here it is: the closest function and the package used for handling the Bernsteins

Hope this clarifies my intentions.

PS. Come to think of it, since I'm dealing with polynomials of a degree not higher than 5 (not too many coefficients), I can solve the problem of sequential mapping/zipping by providing functions for polynomials of each single degree I'm supposed to handle. The bigger problem would be: what to do about the recursive algorithm that stops at a given acceptable error threshold, but that's not the point of this discussion ;) (I'll probably post a question regarding my idea for a workaround in a separate thread)

mchakravarty commented 10 years ago

This one is tricky. The recursion function you mentioned is bezierFindRoot in the code you linked to, isn't it. That recursive function returns a list (the roots) whose length is only determined at runtime. If you want to model this with arrays, you gets nested arrays of arrays where the length of the subarrays varies (dependent on the input data). Implement this on a GPU, requires proper nested data parallelism. We would like to support that, but a general strategy for efficiently implementing this on a GPU is an open research problem.