amplab / keystone

Simplifying robust end-to-end machine learning on Apache Spark.
http://keystone-ml.org/
Apache License 2.0
470 stars 117 forks source link

Auto caching rule #224

Closed tomerk closed 8 years ago

tomerk commented 8 years ago

Added auto-caching rules w/ aggressive naive, and greedy profiling strategies @etrain @shivaram

etrain commented 8 years ago

At a high level this looks pretty good. I'll provide some more detailed comments on the code shortly.

A couple things I'm missing: 1) How do you think a user would actually use the autocaching rule? Can you add it as an Optimizer option or something? 2) In that same vein - it might be nice to actually test this on (say) the VOC pipeline and ensure that a) profiling/caching is reasonably fast and b) that the outcomes are correct. The bulk of stuff needed for that is in my dag-estimation branch.

etrain commented 8 years ago

Given that auto-caching adds profiling overheads that we don't understand fully, can we make it so that it is not a default optimization? It would be great to then add a flag somewhere that allows it to be turned on easily, so that users can find out if it helps their pipeline.

etrain commented 8 years ago

Alright - I have given this a pretty thorough read through, and I think it is basically correct, modulo the few minor issues I pointed out. Will give it a second read for style, but if you could address the two things I pointed out (both of them may be fine, I just wanted to double check) then I think we are good to merge this one.

tomerk commented 8 years ago

@etrain So do you want me to make a System flag -Dkeystoneml.autocache, or just remove auto-caching from the default optimizer and provide a separate optional optimizer that does auto-caching?

etrain commented 8 years ago

I think resisting system flags as long as possible is a good idea. The latter proposal is good!

On Sat, Mar 12, 2016 at 11:33 AM, Tomer Kaftan notifications@github.com wrote:

@etrain https://github.com/etrain So do you want me to make a System flag -Dkeystoneml.autocache, or just remove auto-caching from the default optimizer and provide a separate optional optimizer that does auto-caching?

— Reply to this email directly or view it on GitHub https://github.com/amplab/keystone/pull/224#issuecomment-195794996.

tomerk commented 8 years ago

Okay @etrain, done

etrain commented 8 years ago

Awesome work. Thanks @tomerk!