Closed David-Durst closed 6 years ago
By conv, you are referring to an actual convolution, correct?
I'd like to make sure we keep keep that different from the operation stencil (stencil can also be called shift), which just shuffles data around. Convolution is a stencil to create "pixel windows" followed by the math on the pixel windows.
Note that neither conv or stencil should be higher order on f, since stencil
For now, I'll write box instead of conv since I don't want to deal with the weights in a real convolution: box_t k = (map_t (fold_t k +)) $ stencil_t k
So stencil_t converts a stream of T's into a stream of T[k]'s.
(fold_t k +) is a function that takes a T[k] and outputs a T (the sum of all elements in the k-element array).
Of course, that just tells me how to process one of the elements of the stream. So I need to wrap it in a map to apply this function to all elements of the stream.
So given the output of stencil_t k ...
I believe we want to map a k-element reduction onto all elements. (map_t (fold_t k +))
So therefore I think the result is:
box_t k = (map_t (fold_t k +)) $ stencil_t k
where box_t k:: {1,T} --> {1,T}
So now let me think about the pixel-parallel version. Note I say pixel parallel since even though I am processing the pixels completely in parallel, I chose to perform the per-pixel reduction serially.
box_x k = (map_x (N-k) (fold_t k +)) $ stencil_x k
So the stencil_x takes the entire image (the entire image is processed in parallel), and creates a new dimension:
stencil_x k :: {1,T[N]} --> {1, T[N-k][k]}
Now we need to have a parallel map to process the N-k pixels.
box_x k = (map_x (N-k) (fold_t k +)) $ stencil_x k :: {1,T[N]} --> {1,T[N-k]}
One thing I'm noticing in spacetime.txt is that map_x takes the parameter N (the number of elements to map over, interpreted as the number of units stamped out in space), but map_t does not. I'm thinking we might need to make these symmetic. Although map_t processes one element for tick, it seems like the entire time complexity of the map should be the time for all elements in the stream, not one element. (Just like how the space complexity of map_x is the complexity of all elements, in the array, not just one element.)
Here's a way to think about it. Imagine you had a functional program with no _x's and _t's that operated on finite sized vectors as collections I think we want the ability to take any program in this basic functional language and convert (map f)'s on arrays of size N into either (map_x f N)'s or map_t's, as a scheduling choice. When using map_x, the input stays an array, and when using map_t the input is now a stream I suppose.
I'll think about this more later today.
I added Teguh.
Questions
For now, I'll write box instead of conv since I don't want to deal with the weights in a real convolution: box_t k = (map_t (fold_t k +)) $ stencil_t k"
Isn't box_t just higher order on f where f is +? In order for conv to do the math, it needs to be higher order on the function it's using to do the math, right?
Is there an application of this to work towards? Teguh's benchmarks? I think that an application would help concretize this.
For notation - I would like to be stickler, so box_t k:: int --> {1,T} --> {1,T}. Is that ok?
Comments I believe that we had come to the same implementations for box (what I used to call conv) with the exception that I hadn't made mine handle streams correctly. I've fixed that. I just wasn't confident in my understanding of rate inputs for this implementation.
I have an idea for incorporate lengths into streams. I think we made need to change the definition of stream to include length. From my experience, streams are usually infinite (or lazily computed to appear infinite). If you want to force length-awareness into streams, you need to change their type signature into {rate, num ticks to complete, type}, like map_t f n :: {1, n, S} -> {1, n, T}. Thoughts?
From the email, I think that the read and write functions solve some of my concerns. They make it clear the rate at which data is being fed and written out.
box shouldn't be higher order since it's not parameterized on the math itself.
Box's implementation uses higher order functions map (over pixels) and fold with (+), but box isn't higher order itself. It's just a function that applies a box filter to the input array/stream. It need no other input other than the size of the box filter (k) and the input stream (s)
box_t k s :: int -> (array -> array) box_t k = ((map_t (fold_t k +)) . stencil_t k)
I have a reasonable first draft of arbitrarily parallel box_a_x. See https://github.com/David-Durst/aetherling/blob/c1c6db2d5746ee3580e66778d2ba3a38a751636f/spacetime.txt#L96
Why do we need a proof for how to nest box? Isn't it obvious that you can nest box?
No longer relevant.
https://github.com/David-Durst/aetherling/blob/573ca688fc852227047a93c3003a61284795aeac/spacetime.txt#L56 https://github.com/David-Durst/aetherling/blob/573ca688fc852227047a93c3003a61284795aeac/spacetime.txt#L62
I gave conv_t and conv_x different type signatures in how they take their input arrays due to their different ways of taking in parallelism.
I don't think that conv_t should take the entire array in as input in one tick as that would require O(n) area to store it all while it is processed. Am I wrong?