Shark-ML / Shark

The Shark Machine Learning Library. See more:
http://shark-ml.github.io/Shark/
GNU Lesser General Public License v3.0

Create a user friendly meta-toolbox #140

Open Ulfgard opened 7 years ago

Ulfgard commented 7 years ago

We should make Shark more user friendly by supporting interfaces more similar to what sklearn offers. Many users just do not have the background to use Shark reasonably, e.g. to pick which algorithm to use for training.

I would favour a usage interface like:

shark::RegressionDataset data, test;
shark::box::LinearRegression model;
model.train(data);
//or
model.trainCVGridSearch(data,numberOfFolds,SomeWayToSpecifyGrid);
Data<RealVector> predictions = model(test);
// the model knows which error measure should be used, e.g. 2-norm for regression and 0-1 for classification
double test_error = error(regression,test);

We can set the bar high for this, i.e. everything should work out of the box without any problems. Or we can decide to stay as close as possible to sklearn. We also do not need to support everything, as long as interoperability is given (e.g. everything in box should offer an interface to the internal AbstractModel).

TGlas commented 7 years ago

If I understand correctly, then what you propose is a simplified API for those who "simply want to use" the library, without ever digging deeper. That would be great to have, indeed!

Taking the sklearn approach, that amounts to providing fit and predict methods (for supervised learning), plus a few bells and whistles. This in turn implies mixing models and optimizers, which means that we need an entirely new abstraction layer on top of Shark. Is that what you have in mind?

We could even restrict the Python export to that API, if it simplifies matters. And we can even design the core of this interface in plain C, so language bindings become trivial. Maybe that's a good motivation for getting this done.
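To make the "core in plain C" idea concrete, here is a minimal sketch of what such a boundary could look like: an opaque handle plus a handful of functions with C linkage and integer status codes, so that no C++ types or exceptions cross the boundary. Every name here (`box_model`, `box_model_fit`, and so on) is hypothetical and not part of Shark, and the "model" is a toy least-squares slope through the origin standing in for a real learner.

```cpp
#include <cassert>
#include <stddef.h>
#include <new>

extern "C" {
// Opaque handle: C callers only ever see a pointer to it.
typedef struct box_model box_model;

box_model* box_model_create(void);
// All functions below return 0 on success and a nonzero code on failure.
int box_model_fit(box_model* m, const double* inputs, const double* labels, size_t n);
int box_model_predict(const box_model* m, const double* inputs, double* outputs, size_t n);
void box_model_destroy(box_model* m);
}

// Hypothetical implementation detail, hidden behind the handle.
struct box_model {
    double slope;
    int trained;
};

extern "C" box_model* box_model_create(void) {
    // nothrow new: an allocation failure yields a null handle, not an exception.
    return new (std::nothrow) box_model{0.0, 0};
}

extern "C" int box_model_fit(box_model* m, const double* inputs, const double* labels, size_t n) {
    if (!m || !inputs || !labels || n == 0) return 1;
    double sxy = 0.0, sxx = 0.0;
    for (size_t i = 0; i < n; ++i) {
        sxy += inputs[i] * labels[i];
        sxx += inputs[i] * inputs[i];
    }
    if (sxx == 0.0) return 2;  // degenerate data, slope undefined
    m->slope = sxy / sxx;
    m->trained = 1;
    return 0;
}

extern "C" int box_model_predict(const box_model* m, const double* inputs, double* outputs, size_t n) {
    if (!m || !inputs || !outputs) return 1;
    if (!m->trained) return 3;  // predict called before fit
    for (size_t i = 0; i < n; ++i) outputs[i] = m->slope * inputs[i];
    return 0;
}

extern "C" void box_model_destroy(box_model* m) { delete m; }

// Usage: fit y = 2x on three points and predict at x = 5.
double demoCApi() {
    const double x[] = {1.0, 2.0, 3.0};
    const double y[] = {2.0, 4.0, 6.0};
    const double query[] = {5.0};
    double out[] = {0.0};
    box_model* m = box_model_create();
    if (!m) return -1.0;
    int status = box_model_fit(m, x, y, 3);
    if (status == 0) status = box_model_predict(m, query, out, 1);
    box_model_destroy(m);
    return status == 0 ? out[0] : -1.0;
}
```

A Python or R binding would then only need to wrap these four functions, at the cost of passing data as raw arrays.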

JosefAssad commented 6 years ago

+1 for Python 3 bindings. We're evaluating shipping Shark in an ML platform, but we're encouraging users to stick to Python 3 and falling back to R in general. Very few of our clients are going to be writing C++ models.

I also like the idea of stealing sklearn semantics, that is, mirroring the "fit transform predict" mantra which is pervasive there. This is actually spreading: TensorFlow has an interface which also mirrors sklearn's mantra.

Ulfgard commented 6 years ago

By now, everything that is a simple application of model+trainer can be automatically wrapped by a class offering fit+predict. This is because the trainers offer all relevant type information. I think we do not need a transform, because Shark does not offer any "inverse-transform" capabilities, so there is no difference between predict and transform.
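A rough sketch of how such an automatic wrapper could look: a class template that reads the model and dataset types off the trainer and exposes fit and predict. The trainer, model and dataset below are self-contained toy stand-ins rather than Shark classes, and the member typedef names (`ModelType`, `DatasetType`) as well as the wrapper name `FitPredict` are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a model: y = slope * x + offset.
struct ToyLinearModel {
    double slope = 0.0;
    double offset = 0.0;
    double operator()(double x) const { return slope * x + offset; }
};

// Toy stand-in for a labeled dataset.
struct ToyDataset {
    std::vector<double> inputs;
    std::vector<double> labels;
};

// Toy trainer: ordinary least squares in one dimension.
// It publishes the types it works on; the wrapper relies only on that.
struct ToyLeastSquaresTrainer {
    using ModelType = ToyLinearModel;
    using DatasetType = ToyDataset;

    void train(ModelType& model, DatasetType const& data) const {
        std::size_t n = data.inputs.size();
        assert(n == data.labels.size() && n >= 2);
        double meanX = 0.0, meanY = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            meanX += data.inputs[i];
            meanY += data.labels[i];
        }
        meanX /= static_cast<double>(n);
        meanY /= static_cast<double>(n);
        double sxy = 0.0, sxx = 0.0;
        for (std::size_t i = 0; i < n; ++i) {
            sxy += (data.inputs[i] - meanX) * (data.labels[i] - meanY);
            sxx += (data.inputs[i] - meanX) * (data.inputs[i] - meanX);
        }
        assert(sxx > 0.0);
        model.slope = sxy / sxx;
        model.offset = meanY - model.slope * meanX;
    }
};

// Hypothetical generic wrapper: the model type is deduced from the trainer,
// so the user never has to name the model or pair it with a trainer.
template <class Trainer>
class FitPredict {
public:
    using ModelType = typename Trainer::ModelType;
    using DatasetType = typename Trainer::DatasetType;

    explicit FitPredict(Trainer trainer = Trainer()) : m_trainer(trainer) {}

    void fit(DatasetType const& data) { m_trainer.train(m_model, data); }

    template <class Input>
    auto predict(Input const& input) const { return m_model(input); }

    // Access to the wrapped model keeps the wrapper interoperable
    // with code that works on the model directly.
    ModelType const& model() const { return m_model; }

private:
    Trainer m_trainer;
    ModelType m_model;
};

// Usage: fit y = 2x + 1 on four points and predict at x = 4.
double demoFitPredict() {
    ToyDataset data{{0.0, 1.0, 2.0, 3.0}, {1.0, 3.0, 5.0, 7.0}};
    FitPredict<ToyLeastSquaresTrainer> regression;
    regression.fit(data);
    return regression.predict(4.0);
}
```

Because the wrapper depends only on the trainer's published types and a `train` call, the same template would apply unchanged to any model+trainer pair that follows this convention.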

This would cover SVMs, all linear methods, etc. It does not cover neural networks of any kind. Also, any advanced methodology like cross-validation is not covered yet (beyond a naive implementation).
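For reference, a naive cross-validation on top of a fit+predict interface might look roughly like the sketch below. It assumes a learner with `fit(inputs, labels)` and `predict(input)` members; those signatures, the function name `crossValidatedSquaredError` and the toy `MeanRegressor` are all made up for this example. Folds are assigned by index modulo the fold count, with no shuffling, stratification or grid search.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy learner used only to exercise the routine: predicts the mean training label.
struct MeanRegressor {
    double mean = 0.0;
    void fit(std::vector<double> const& /*inputs*/, std::vector<double> const& labels) {
        assert(!labels.empty());
        double sum = 0.0;
        for (double y : labels) sum += y;
        mean = sum / static_cast<double>(labels.size());
    }
    double predict(double /*input*/) const { return mean; }
};

// Naive k-fold cross-validation: sample i belongs to fold i % folds.
// Returns the mean squared error over all held-out predictions.
template <class Learner>
double crossValidatedSquaredError(Learner learner,
                                  std::vector<double> const& inputs,
                                  std::vector<double> const& labels,
                                  std::size_t folds) {
    std::size_t n = inputs.size();
    assert(labels.size() == n && folds >= 2 && folds <= n);
    double totalError = 0.0;
    for (std::size_t fold = 0; fold < folds; ++fold) {
        std::vector<double> trainInputs, trainLabels;
        std::vector<std::size_t> testIndices;
        for (std::size_t i = 0; i < n; ++i) {
            if (i % folds == fold) {
                testIndices.push_back(i);
            } else {
                trainInputs.push_back(inputs[i]);
                trainLabels.push_back(labels[i]);
            }
        }
        learner.fit(trainInputs, trainLabels);  // refit from scratch on each split
        for (std::size_t i : testIndices) {
            double diff = learner.predict(inputs[i]) - labels[i];
            totalError += diff * diff;
        }
    }
    return totalError / static_cast<double>(n);
}

// Usage: two samples, two folds; each fold trains on the other sample.
double demoCrossValidation() {
    std::vector<double> inputs{0.0, 1.0};
    std::vector<double> labels{0.0, 2.0};
    return crossValidatedSquaredError(MeanRegressor(), inputs, labels, 2);
}
```

A grid search would wrap this routine in a loop over candidate hyperparameters and keep the setting with the lowest returned error.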

This could be a volunteer project: writing such an automatic wrapper, and then writing automatic wrapper code from that to Python.