MLeap's value for sklearn

mingmasplace commented 5 years ago

MLeap solves the single-request, low-latency prediction problem for Spark pipelines. A quick test shows that sklearn's native pipeline.predict already has pretty good latency, under 3 ms (though of course that depends on the number of transforms). So why would people want to migrate existing sklearn online prediction to MLeap? Thanks.
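
For anyone who wants to repeat that kind of quick test, here is a minimal timing sketch. It uses only the standard library, and `fake_predict` is a hypothetical stand-in for a fitted pipeline's predict call (for example `lambda row: pipeline.predict([row])`); `measure_latency_ms`, the warm-up count, and the run count are illustrative choices, not the method used for the number quoted above.

```python
import statistics
import time


def measure_latency_ms(predict, row, warmup=10, runs=200):
    """Time single-request calls to `predict` and return latency stats in ms."""
    for _ in range(warmup):
        predict(row)  # warm up caches before timing
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        predict(row)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p99_index = min(len(samples) - 1, int(0.99 * len(samples)))
    return {
        "median_ms": statistics.median(samples),
        "p99_ms": samples[p99_index],
    }


def fake_predict(row):
    # Hypothetical stand-in; swap in a real fitted pipeline's predict call.
    return [sum(x * x for x in row)]


stats = measure_latency_ms(fake_predict, [0.1, 0.2, 0.3])
print(stats)
```

Reporting a tail percentile alongside the median matters for online scoring, since a single slow transform tends to show up in the tail first.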

ancasarb commented 5 years ago

In our use case, we had to support model building/training not just in scikit-learn but also in Spark and TensorFlow. MLeap helped here because at scoring time you only need to worry about monitoring and scaling a single model scoring service. At the same time, with MLeap we expose a unified scoring interface, so clients that integrate with the scoring service don't need to know or worry about whether they're using a Spark model, a scikit-learn model, etc. This makes switching between models, with an A/B test for example, very easy.
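
To make the A/B point concrete, here is a rough sketch of how a client-facing service could split traffic between two models behind one entry point. Everything in it is hypothetical: `pick_variant`, `score`, the variant names, and the two lambda "models" are stand-ins for illustration, not MLeap APIs.

```python
import hashlib


def pick_variant(request_id, variants, treatment_share=0.5):
    """Deterministically map a request id to the 'control' or 'treatment' model."""
    digest = hashlib.sha256(request_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32  # value in [0, 1)
    name = "treatment" if bucket < treatment_share else "control"
    return name, variants[name]


# Hypothetical models; the caller never sees which framework trained them.
variants = {
    "control": lambda features: 0.0,    # e.g. a model trained in scikit-learn
    "treatment": lambda features: 1.0,  # e.g. a model trained in Spark
}


def score(request_id, features, treatment_share=0.5):
    """Single entry point that hides which model served the request."""
    name, model = pick_variant(request_id, variants, treatment_share)
    return {"variant": name, "score": model(features)}


print(score("user-42", {"age": 31.0}))
```

Hashing the request id keeps the assignment stable, so the same caller always hits the same model for the duration of the test.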

Hope this helps!

mingmasplace commented 5 years ago

Thanks Anca. So MLeap not only solves the latency issue with Spark, but also provides a unified online scoring service that is agnostic to the ML framework used for model building. Still, it isn't clear to me why the lack of such a service is a problem.

TF provides lots of DNN algorithms. Are you going to implement those in MLeap to get rid of the online scoring dependency on TF? If not, the scoring service will still have a dependency on TF (which could run in a container).
Many people have been using sklearn or TF for both model building and model scoring. Feature transformation latency doesn't seem to be an issue in those frameworks, and people seem fine with supporting separate production scoring services built on top of sklearn and TF. So what issues might this approach have?
To evaluate different models, we just need to define a common scoring interface and leave the actual implementation to the containers that implement it, so you can have an sklearn container, a TF container, etc. Why do we need to use a common MLeap pipeline inside the containers?
