biometrics / likely

A compiler intermediate representation for image recognition and heterogeneous computing.
http://www.liblikely.org

Dot product optimization pass #20

Open jklontz opened 10 years ago

jklontz commented 10 years ago

At the heart of many computer vision algorithms (subspace learning, deep learning, wavelets) is a dot product operation of an incoming image against a filter constructed offline. This idea is to introduce a suite of LLVM optimization passes that leverage the fact that the filter is known at compile time. Specifically:

  1. Completely unroll the dot product loop based on the known filter size.
  2. Substitute memory access instructions with the known constant values for the filter.
  3. Eliminate instructions where the filter value is 0 (or perhaps near-zero).

Together these passes convert a generic dense dot product into a hard-coded sparse dot product.
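
To make the transformation concrete, here is a hand-written C++ sketch of what the three passes would effectively produce for a hypothetical 4-tap filter {0, 3, 0, -1}. The filter values and function names are made up for illustration; nothing here is generated by likely or LLVM.

```cpp
#include <cstddef>

// Generic dense dot product: the filter is an opaque runtime array, so every
// tap costs a load, a multiply, and an add.
int dot_dense(const int *y, const int *filter, std::size_t len)
{
    int response = 0;
    for (std::size_t i = 0; i < len; ++i)
        response += y[i] * filter[i];
    return response;
}

// What the three passes would effectively emit for a filter known at compile
// time to be {0, 3, 0, -1}: the loop is fully unrolled (1), the filter loads
// are replaced by constants (2), and the zero taps vanish entirely (3).
int dot_specialized(const int *y)
{
    return y[1] * 3 - y[3];
}
```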

As a stretch goal:

  1. Other optimizations which become possible when the dot product can be approximated within a pre-specified margin of error.

This is a long-term research idea and a good paper alone. It is also an example of an interesting idea that becomes possible with a computer vision DSL.

jklontz commented 10 years ago

@mburge I think this is the optimization we discussed yesterday.

jklontz commented 10 years ago

caotto: so basically you build a look-up-table for convolution with a filter at compile time?
[1:26pm] caotto: jklontz: or do I not get the idea
[1:27pm] jklontz: I would go further than a LUT
[1:27pm] caotto: elaborate?
[1:27pm] jklontz: and inline the LUT into the code
[1:27pm] jklontz: for example
[1:29pm] jklontz: in our current scheme say we learn a 2-dimensional filter, x, with values {1, 2}
[1:29pm] jklontz: conventionally we would apply that filter to another vector y like:
[1:30pm] jklontz: int response = 0; for (int i=0; i<x.len; i++) response += y[i] * x[i]
[1:30pm] jklontz: however, since we know x at compile time now
[1:30pm] jklontz: the compiler can simplify this to
[1:30pm] jklontz: int response = y[0] * 1 + y[1] * 2;
[1:31pm] jklontz: pretty powerful, no?
[1:32pm] jklontz: this then opens the door to conventional algebraic simplification
[1:32pm] caotto: so it's not a LUT based on input values, rather you are inlining the math for doing the convolution
[1:32pm] jklontz: removing the 1, eliminating any operations that are 0
[1:32pm] jklontz: yeah
[1:32pm] caotto: I see, so it's not a LUT at all
[1:33pm] jklontz: and the compiler can turn that dense multiplication into a sparse one automatically
[1:33pm] jklontz: yeah I guess LUT is not the right word
[1:33pm] caotto: Well that sounds pretty interesting.
[1:34pm] jklontz: I think it could be a real game changer for deep learning which involves learning a lot of filters
[1:35pm] caotto: so this is basically saying, we do the convolution in time domain, and just optimize the hell out of it
[1:36pm] caotto: have you considered how this compares/would apply to fft based techniques
[1:36pm] jklontz: I haven't
[1:36pm] jklontz: but I think it's relevant to anything where one or both of the operands is known at training time
[1:37pm] caotto: so for example I'm pretty sure OpenCV's filter engine will do fft, apply the filter in frequency domain, then inverse fft
[1:37pm] caotto: I guess you could just do the fft at compile time?
[1:37pm] jklontz: yup
[1:37pm] caotto: and do something similar based on that
[1:37pm] jklontz: precisely

jklontz commented 10 years ago

@bklare This is one of the optimizations that I think would be particularly cool for a deep learning neural net.

bklare-zz commented 10 years ago

This is a pretty cool idea. I am still getting up to speed on CNNs. Do you know if the filters are the same at each layer? That is, the first layer is likely some form of Gabor filters, which your hard-coded, sparse approach could be used for. I am wondering whether the same holds at the subsequent layers as well. If not, the benefit may be lessened.

jklontz commented 10 years ago

I think the filters are different at each layer, but they could be equally sparse?

JordanCheney commented 10 years ago

https://www.facebook.com/publications/546316888800776/

If we wanted to build a deep learning network this could be a pretty cool one :)

bklare-zz commented 10 years ago

@jklontz Yes, I think they could be equally sparse. I guess my concern is that the selected filters for the subsequent layers would be unknown until after training.

bklare-zz commented 10 years ago

@JordanCheney Yeah, that paper has gotten a lot of buzz. It was presented at CVPR a few weeks ago as well. It is worth studying if you haven't already done so.

JordanCheney commented 10 years ago

I was excited talking about this yesterday, and I was wondering whether there are any thoughts yet on implementing convolution in likely. I think it could be done in the likely standard library as a function that takes a base matrix, a filter matrix, and a stride value. A dot product would then be computed between the filter and successive slices of the base matrix (is slicing implemented in likely?), with the filter "moving" across the base matrix according to the specified stride. My only question is: would implementing this in the standard library still take advantage of all of the optimizations discussed above?
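
For reference, here is a minimal C++ sketch of the sliding dot product described above: a filter moved across a base matrix with a given stride, computing a dot product at each stop. This is not the proposed likely standard-library API; the function name, the row-major layout, and the "valid"-only output size are assumptions for illustration, and the filter is not flipped (i.e., this is the cross-correlation form typically used in vision code).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical reference implementation: slide a filter across a base matrix
// with a given stride and compute a dot product at each stop.
// "Valid" region only (no padding); row-major storage.
std::vector<float> filter2d(const std::vector<float> &base, std::size_t baseRows, std::size_t baseCols,
                            const std::vector<float> &filter, std::size_t filtRows, std::size_t filtCols,
                            std::size_t stride)
{
    const std::size_t outRows = (baseRows - filtRows) / stride + 1;
    const std::size_t outCols = (baseCols - filtCols) / stride + 1;
    std::vector<float> out(outRows * outCols, 0.0f);

    for (std::size_t r = 0; r < outRows; ++r)
        for (std::size_t c = 0; c < outCols; ++c) {
            float response = 0.0f;
            for (std::size_t i = 0; i < filtRows; ++i)      // dot product of the current
                for (std::size_t j = 0; j < filtCols; ++j)  // slice and the filter
                    response += base[(r * stride + i) * baseCols + (c * stride + j)]
                              * filter[i * filtCols + j];
            out[r * outCols + c] = response;
        }
    return out;
}
```

The inner two loops are exactly the dense dot product from the top of this issue, so a standard-library implementation built this way should still expose the filter operand to the unrolling and constant-folding passes discussed above, provided the filter is known at compile time.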

jklontz commented 10 years ago

I completely agree that it should be implemented in the standard library, and have been working on 47d06c0d9cc9701a8613b2e63199ec53c1d79335 as a precursor. The convolution function should only take the image and filter as arguments and be able to infer the rest. The main missing feature right now is being able to specify which location in the image you want to access.

jklontz commented 10 years ago

To follow up on this, the most immediate next step is to extend kernelArgument::evaluateOperator to support arguments. For example:

image = (read "../data/misc/lenna.tiff"); currently works
(image 0 128 64); doesn't work, should return the value at column = 0, channel = 128, row = 64

JordanCheney commented 10 years ago

are the column and channel arguments switched here?

JordanCheney commented 10 years ago

Quick question- kernelArgument::evaluateOperator takes a likely_ast as one argument. I thought the purpose of an ast was that it could parse an arbitrarily sized list of arguments, which is the desired outcome here. Is there a limitation here that I am not understanding?

jklontz commented 10 years ago

Yeah, I switched column and channel in the example!

You should first confirm that ast->is_list, and then you can access the parameters as ast->atoms[0] through ast->atoms[ast->num_atoms - 1].

Note that in the example above, ast->atoms[0] == "image" and ast->atoms[1] == 0 and ast->num_atoms == 4. This is because the ast is isomorphic with the source code s-expression: (image 0 128 64).
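
To pin down that layout, here is a small self-contained C++ mock (assuming C++17) using only the fields mentioned in this thread (is_list, atoms, num_atoms). The struct is not the real likely_ast, just an illustration of how the (image 0 128 64) expression maps onto it.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Mock with only the fields named above; the real likely_ast layout differs,
// this just pins down the indexing scheme.
struct MockAst {
    bool is_list;
    std::string atom;              // meaningful when is_list is false
    std::vector<MockAst> atoms;    // meaningful when is_list is true
};

int main()
{
    // The ast for (image 0 128 64) mirrors the s-expression one-to-one.
    MockAst expr{true, "", {
        {false, "image", {}},      // atoms[0]: the matrix being indexed
        {false, "0",     {}},      // atoms[1..3]: the three index arguments
        {false, "128",   {}},
        {false, "64",    {}}
    }};

    assert(expr.is_list);
    assert(expr.atoms.size() == 4);           // num_atoms == 4
    assert(expr.atoms[0].atom == "image");    // the operator/matrix comes first
    return 0;
}
```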

JordanCheney commented 10 years ago

Quick sanity and understanding check for me: my understanding right now is that likely functions as a stack of environments, each of which has certain operations registered in some type of LUT. The root of this stack has all of the likely_standard_library functions in it. First, is this correct? And, if so, does creating a new likely_matrix automatically register it in the current environment's LUT so that it can be processed later on? The second point matters here because "image" is a likely_matrix, not an operation, so I wanted to check whether ast->atoms[0] can be looked up using the same lookup() function or whether a different method must be used.

jklontz commented 10 years ago

The first point is correct, but as you've noticed, it's more of a "lookup linked list" than a "lookup table". The second point about being able to look up an image will be true in a few days, as I still have a few more patches that need to land first. Right now, if you look up image you will get back (read "../data/misc/lenna.tiff") unevaluated.
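
For intuition, here is an illustrative C++ sketch of such a "lookup linked list" of environments. It is not likely's actual data structure; the field names and string-valued definitions are assumptions chosen only to show the shape of the behavior described above.

```cpp
#include <memory>
#include <string>
#include <unordered_map>

// Illustrative only: each environment holds its own definitions and a link to
// its parent; the root (standard library) environment has no parent.
struct Environment {
    std::unordered_map<std::string, std::string> definitions; // name -> unevaluated expression
    std::shared_ptr<const Environment> parent;                // next link; null at the root

    // Walk outward from the innermost environment until the name is found.
    const std::string *lookup(const std::string &name) const {
        for (const Environment *env = this; env; env = env->parent.get()) {
            auto it = env->definitions.find(name);
            if (it != env->definitions.end())
                return &it->second;
        }
        return nullptr; // undefined symbol
    }
};
```

Under this model, looking up image walks outward from the innermost environment until a definition is found, and what comes back is the unevaluated (read "../data/misc/lenna.tiff") expression, matching the current behavior described above.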