amplab / keystone

Simplifying robust end-to-end machine learning on Apache Spark.
http://keystone-ml.org/
Apache License 2.0

Pipelines being re-estimated at eval time? #251

Closed etrain closed 8 years ago

etrain commented 8 years ago

I think @shivaram and I have seen this in a couple of situations, but it looks like estimators are being "re-estimated" in the master branch.

You can see situations where this happens in PR 1 of keystone-integration-tests. Currently we're returning pipeline objects from the .run method of each pipeline application. When we go to apply these pipelines to fresh data to make predictions, it looks like we're re-fitting all of the already-fit estimators.
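To make the symptom concrete, here is a minimal toy sketch in plain Python. It is not KeystoneML code and does not reflect the fix that eventually landed; `MeanEstimator`, `LazyPipeline`, and `CachedPipeline` are hypothetical names made up for illustration. It only shows the difference between a pipeline that re-fits its estimator on every apply and one that fits once and reuses the result.

```python
class MeanEstimator:
    """Toy estimator (hypothetical): learns the mean of the training data."""

    def __init__(self):
        self.fit_calls = 0  # counts how often fit() actually runs

    def fit(self, data):
        self.fit_calls += 1
        mean = sum(data) / len(data)
        # The "fitted transformer" is a function that centers its input.
        return lambda x: x - mean


class LazyPipeline:
    """Re-fits on every apply: the kind of behavior described in this issue."""

    def __init__(self, estimator, training_data):
        self.estimator = estimator
        self.training_data = training_data

    def apply(self, x):
        transformer = self.estimator.fit(self.training_data)  # re-estimated each call
        return transformer(x)


class CachedPipeline:
    """Fits at most once, then reuses the fitted transformer."""

    def __init__(self, estimator, training_data):
        self.estimator = estimator
        self.training_data = training_data
        self._fitted = None

    def apply(self, x):
        if self._fitted is None:
            self._fitted = self.estimator.fit(self.training_data)
        return self._fitted(x)


lazy_est = MeanEstimator()
lazy = LazyPipeline(lazy_est, [1.0, 2.0, 3.0])
lazy_results = [lazy.apply(x) for x in (4.0, 5.0)]

cached_est = MeanEstimator()
cached = CachedPipeline(cached_est, [1.0, 2.0, 3.0])
cached_results = [cached.apply(x) for x in (4.0, 5.0)]

print("lazy fit calls:", lazy_est.fit_calls)
print("cached fit calls:", cached_est.fit_calls)
```

Both pipelines produce the same predictions, so the problem is easy to miss unless you count fit calls or watch the job timings.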

It would be great if you could have a look at this @tomerk

etrain commented 8 years ago

Fixed by #252