Closed Vaishaal closed 8 years ago
cc @stephentu @rolloff
One thing i would like to try and get right here is interfaces. E.g. The sparseLinearKernel is of type (Array[Long], Array[Double])
when probably it should be of type SparseVector? I am not sure we are (ever) going to be running 2e9 by 2e9 kernel solves, so maybe this is OK? Also, is the right return value for, e.g. generateKernelTestBlock
actually RDD[DenseVector[Double]
or is it a RowPartitionedMatrix
? In the only use of that function in this pull request, the first thing we do is convert each partition to a DenseMatrix, which could probably be handled locally more efficiently.
@shivaram brings up a point about whether the KernelGenerator should be a Transformer
. It would be good to figure out how to fit this into the Estimator
/Transformer
framework. The fact that there's a separate method for Train/Test indicates to me that maybe we should have one method that takes Train and is an Estimator that when fit returns a Transformer
that can be applied to test. I haven't thought through the workflow of these things as closely as you guys, but hopefully that provides some guidance. Taking a pass over the rest of the code now.
Alright - I took a pass with a bunch of style/structure comments. Before you do a ton of coding it might make sense to just sketch out the interface changes you're thinking about and we can talk about them here?
Also, from a maintainability perspective, it would be really good to get a reference to a book chapter or paper that describes exactly the problem you're solving in the scaladoc and then be consistent with those variable names, etc. For example, I looked at 7.4 of Murphy's book to page this into my brain this AM. For the main algorithm - I guess the BCD kernel paper is on arXiv now? http://arxiv.org/pdf/1602.05310v1.pdf
Ok - for right now I'm just focused on the KernelMatrix
API/implementation because I think the other changes we talked about yesterday are still to come. Modulo my comment about zipWithIndex
and some style nits I am basically cool with this.
If this looks good I'll start plumbing the linear system stuff and then we should be done?
ping
Sorry for the delay - I'll get to this after May 20
sounds good!
On Mon, May 16, 2016 at 12:02 PM, Shivaram Venkataraman < notifications@github.com> wrote:
Sorry for the delay - I'll get to this after May 20
— You are receiving this because you were mentioned. Reply to this email directly or view it on GitHub https://github.com/amplab/keystone/pull/234#issuecomment-219515123
Alright - closing this since we got this functionality in #284.
@shivaram let me know if this looks correct. I'm storing the full model in memory and broadcasting, do you know of a more effective way to do this.