alex-pirozhenko / sklearn-pmml

A library that allows serialization of SciKit-Learn estimators into PMML
MIT License
70 stars 17 forks

PMML Import #15

Open dtsmith2001 opened 9 years ago

dtsmith2001 commented 9 years ago

Full persistence of models requires PMML import, particularly for my use case.

Every morning, I have a task that compiles features, loads the model, runs predictions on the features, and sends off the predictions.

While this could be handled with a service by training the model and then maintaining the estimator in memory, a failure means retraining the model. Supplying the hyperparameters to the estimator won't work.

Full PMML import/export is necessary in my case, unless I can re-engineer the process. I'm not sure there's much to be gained by this, so it's a tough sell internally.

alex-pirozhenko commented 9 years ago

Just to clarify - does the training flow in your case use a different stack (anything other than sklearn)? If not, you could simply serialize your pre-trained model with pickle for later use; it's much more convenient than PMML.

At the same time I agree that PMML->sklearn import would be a valuable feature to this project. We're open to PRs.
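
For reference, the pickle route can be sketched roughly as below. This is a minimal illustration using only the standard library, not code from this project: `MeanRegressor`, `save_model`, `load_model`, and the `model.pkl` file name are hypothetical stand-ins, and a fitted scikit-learn estimator would take the place of `MeanRegressor`.

```python
import pickle
import tempfile
from pathlib import Path


class MeanRegressor:
    """Hypothetical stand-in for a fitted scikit-learn estimator."""

    def fit(self, X, y):
        # "Training": remember the mean of the targets.
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # Predict the stored mean for every input row.
        return [self.mean_ for _ in X]


def save_model(model, path):
    """Serialize a fitted model to disk with pickle."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a previously pickled model (only from a trusted source)."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Training run: fit once and persist the fitted model.
model = MeanRegressor().fit([[1], [2], [3]], [2.0, 4.0, 6.0])
model_path = Path(tempfile.mkdtemp()) / "model.pkl"  # hypothetical location
save_model(model, model_path)

# Daily job: reload the fitted model and predict, with no retraining.
restored = load_model(model_path)
predictions = restored.predict([[10], [20]])
```

The fitted state survives a process restart this way, so a failure of the daily job does not force retraining. The usual caveats apply: a pickle should only be loaded from a trusted source, and it generally expects the same library versions that were used when it was written.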

dtsmith2001 commented 9 years ago

I’m using joblib.load right now. It seems to work well, but I do know there are problems serializing models from Python.

I would like to be able to train models in scikit-learn and then import them into Spark. That's my ultimate workflow. MLlib will not currently import PMML.

Everybody exports to PMML (including R), but few systems will import PMML. This defeats the entire purpose of the standard, in my opinion.

DarinJ commented 9 years ago

If you're just using Spark to evaluate a model trained in sklearn, you could use the JPMML evaluator; I've used it with Scalding and Pig on several occasions. This would require using the Scala or Java Spark APIs, though. If I were using Python, I'd probably just use pickle.

dtsmith2001 commented 9 years ago

Hmm, apparently my future problems will just go away. Thanks @DarinJ for the info.

dtsmith2001 commented 8 years ago

We have no plans to use Spark. So PMML import is back on the table for me.