Note that while the code compiles and the unit test passes, I still haven't run the CIFAR pipeline on a cluster. I'll do that later today / tomorrow, but I wanted to get the design feedback going before that. I also plan to add a CIFAR augmented kernel pipeline so we can match the numbers in http://arxiv.org/abs/1602.05310
Are you going to add the unit tests from the previous PR?
Ah, sorry, I forgot to commit it - I just copied the unit test you wrote. It passes locally.
One more note: the kernel classes are templatized so we can handle other input types. For example, the Yelp workload used a SparseVector and a linear kernel, and we should be able to handle that directly now.
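To make the templating concrete, here is a minimal sketch of what type-parameterized kernels could look like; the trait and class names are illustrative, not the exact classes in this PR:

```scala
import breeze.linalg.{DenseVector, SparseVector}

// Hypothetical names: a kernel function parameterized on the input type T.
trait KernelFunction[T] extends Serializable {
  def apply(x: T, y: T): Double
}

// Gaussian (RBF) kernel on dense vectors: exp(-gamma * ||x - y||^2).
class GaussianKernel(gamma: Double) extends KernelFunction[DenseVector[Double]] {
  def apply(x: DenseVector[Double], y: DenseVector[Double]): Double = {
    val diff = x - y
    math.exp(-gamma * (diff dot diff))
  }
}

// Linear kernel on sparse vectors, as in the Yelp workload.
class LinearKernel extends KernelFunction[SparseVector[Double]] {
  def apply(x: SparseVector[Double], y: SparseVector[Double]): Double = x dot y
}
```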
A couple of high-level things:
Would it be easy to fit Nystroem into this framework?
@Vaishaal Nystroem should be able to take the `KernelMatrix` as the input and then just sample the necessary column blocks. The main problem is that we might need to write another model class that does the same sampling on the test blocks as well. I'll try to see if we can make this more general though.

@stephentu had some comments about this. I think it's weird because Nystroem is solved in the primal, IIRC? I might be wrong.
And yes that figure is the best mapping.
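For reference, the column sampling being discussed is the standard Nystroem approximation. A minimal local sketch on a materialized kernel matrix (ignoring the distributed block structure, with `pinv` standing in for a proper regularized solve; this is not the PR's code) might look like:

```scala
import breeze.linalg.{DenseMatrix, pinv}

// Nystroem: sample m columns S of the n x n kernel matrix K and
// approximate K ~= C * pinv(W) * C.t, where C = K(:, S) and W = K(S, S).
def nystroemApprox(K: DenseMatrix[Double], sampled: Seq[Int]): DenseMatrix[Double] = {
  val C = K(::, sampled).toDenseMatrix  // n x m column block
  val W = C(sampled, ::).toDenseMatrix  // m x m block: rows S of C give K(S, S)
  C * pinv(W) * C.t
}
```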
Yeah, Nystrom is solved in the primal. But the `KernelMatrix`, which is this lazy intermediate data structure, should remain the same across primal or dual solvers.
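To make the shared piece concrete, here is a sketch of what such a lazy block-column interface might look like; the trait and method names are guesses for illustration, not the PR's actual signatures:

```scala
import breeze.linalg.DenseMatrix
import org.apache.spark.rdd.RDD

// Hypothetical interface: the same lazy KernelMatrix could back
// both the dual kernel ridge solver and a primal Nystroem solver.
trait KernelMatrixLike {
  // Compute (and cache) the given block of columns of K,
  // row-partitioned as an RDD.
  def blockColumn(blockId: Int): RDD[DenseMatrix[Double]]

  // Release any cached state for a block once a solver is done with it.
  def unpersist(blockId: Int): Unit
}
```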
Also, ideally we should be able to use the same block linear mapper that we use for RandomCosines, but since the kernel transformer is not a Keystone transformer we need this other class -- this is something we can try to generalize better.
@etrain This is worth another look when you get a chance. From testing on a single machine, I get the same test error (around 20%) as the older code for unaugmented CIFAR. In terms of performance, I've made a bunch of changes that bring it pretty close to what we had before.
However, the kernel matrix API is now a bit trickier to use (users need to call `unpersist` after they are done with a block). I think this is a reasonable trade-off given that these classes are internal to Keystone and the user-facing API remains straightforward. But any ideas to improve the API are welcome.
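Concretely, the usage pattern described above looks something like the loop below, written against the hypothetical `KernelMatrixLike` trait sketched earlier; the point is that the caller, not the matrix, is responsible for releasing each block:

```scala
// Sketch of a solver sweep over column blocks. Each block is pulled
// lazily, used, and then explicitly released by the caller.
def sweep(kernelMat: KernelMatrixLike, numBlocks: Int): Unit = {
  for (b <- 0 until numBlocks) {
    val block = kernelMat.blockColumn(b)
    // ... update the model using this column block ...
    kernelMat.unpersist(b)  // the caller must release the block when done
  }
}
```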
This looks pretty good to me. I don't see any major changes I'd make. The state management stuff is annoying but manageable. We could think of a standard interface that lets things clean up after themselves. This KernelGenerator stuff is a weird corner case, though, and I don't know if this is the right place to start with it.
Alright - finally merging this! Thanks @shivaram @Vaishaal @stephentu @rolloff - this is a great new feature.
This PR adds kernel generators, a ridge regression solver and a kernel block model to Keystone. It also includes a CIFAR pipeline that shows how it can be used.
At a high level, the user-facing design is as follows:
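(A figure illustrated this in the original description.) As a rough stand-in, here is a hedged sketch of the intended flow; the constructor parameters and the exact class names, e.g. `GaussianKernelGenerator`, are assumptions rather than the merged API:

```scala
import breeze.linalg.DenseVector
import org.apache.spark.rdd.RDD

// Hypothetical user-facing flow: fit a kernel ridge model and apply it
// like any other Keystone transformer. All names below are assumptions.
def trainAndPredict(
    trainFeatures: RDD[DenseVector[Double]],
    trainLabels: RDD[DenseVector[Double]],
    testFeatures: RDD[DenseVector[Double]]): RDD[DenseVector[Double]] = {
  val krr = new KernelRidgeRegression(
    new GaussianKernelGenerator(5e-4),  // kernel generator; gamma chosen arbitrarily
    1e-5,                               // ridge regularization lambda
    4096                                // column block size for the solver
  )
  val model = krr.fit(trainFeatures, trainLabels)
  model(testFeatures)
}
```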
Internally this works as follows:
- `KernelGenerators` are fit during training to bind the trainData as one of the arguments to `K(x, y)`. This step produces a `KernelTransformer`. We need this because the transformer usually has only one argument, and it makes the train vs. test distinction clear.
- `KernelTransformers` can be applied to an RDD to generate a `KernelMatrix`. This is a wrapper class that lazily populates the kernel matrix and has a block-column API.
- `KernelMatrix` is used by a linear system solver that just solves `Kx = Y`. Right now this is part of the `KernelRidgeRegression` class, but it can be pulled out (a local sketch of this solve follows below).

This was originally developed with the help of @stephentu and @rolloff, and most of the code was ported from #234, written by @Vaishaal.
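For completeness, here is a minimal single-machine sketch of the solve at the heart of the solver, with the ridge term written explicitly; whether the regularizer is scaled by n or folded into `K` in the actual code is an assumption here:

```scala
import breeze.linalg.DenseMatrix

// Local (non-distributed) kernel ridge regression solve:
// find x such that (K + lambda * I) x = Y.
def solveLocal(K: DenseMatrix[Double],
               Y: DenseMatrix[Double],
               lambda: Double): DenseMatrix[Double] = {
  val reg = DenseMatrix.eye[Double](K.rows) * lambda
  (K + reg) \ Y  // Breeze's \ solves the linear system
}
```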