automl / auto-sklearn

Automated Machine Learning with scikit-learn
https://automl.github.io/auto-sklearn
BSD 3-Clause "New" or "Revised" License
7.64k stars, 1.28k forks

Contributing more "building blocks" #116

Open: FuriouslyCurious opened this issue 8 years ago

FuriouslyCurious commented 8 years ago

Hi AutoML team, I can chip in some code and add more algorithm "building blocks" to auto-sklearn. For example, I could add Factorization Machines for classification problems.

What is a good place to start? Are there any coding conventions you would like me to follow?

mfeurer commented 8 years ago

Hi, thanks for your interest in our project. While I think factorization machines would be a helpful model, they should really live in the scikit-learn package. We try not to implement machine learning algorithms ourselves, and we are actually working on removing the remaining custom implementations from the auto-sklearn code. That said, we're open to contributions that add models from scikit-learn.

If you instead want to use factorization machines within auto-sklearn, this page of the documentation guides you through the process.

rhiever commented 8 years ago

IMO it's not a bad idea to include models from packages outside of scikit-learn, as long as they follow the scikit-learn interface (__init__, fit, predict/transform, etc.). Arguably, to push the boundaries of AutoML, we will need advanced pipeline operators that scikit-learn doesn't provide.
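To make "fitting the scikit-learn interface" concrete, here is a minimal, dependency-free sketch of a classifier that follows the usual estimator conventions: __init__ only stores hyperparameters, fit learns state into attributes with a trailing underscore and returns self, and get_params/set_params expose the hyperparameters. The class name NearestCentroidSketch and its metric parameter are hypothetical. A real contribution would normally inherit from sklearn.base.BaseEstimator and ClassifierMixin rather than hand-writing get_params/set_params, and auto-sklearn additionally wraps each model in its own component class with a hyperparameter search space, which this sketch does not cover.

```python
import math


class NearestCentroidSketch:
    """Hypothetical classifier following scikit-learn estimator conventions."""

    def __init__(self, metric="euclidean"):
        # Convention: __init__ only stores hyperparameters, unmodified.
        self.metric = metric

    def get_params(self, deep=True):
        # Lets tools that copy estimators (e.g. cloning) read hyperparameters.
        return {"metric": self.metric}

    def set_params(self, **params):
        for name, value in params.items():
            setattr(self, name, value)
        return self

    def fit(self, X, y):
        # Learned state goes into attributes with a trailing underscore.
        sums, counts = {}, {}
        for row, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(row))
            for j, value in enumerate(row):
                acc[j] += value
            counts[label] = counts.get(label, 0) + 1
        self.classes_ = sorted(sums)
        self.centroids_ = {
            c: [s / counts[c] for s in sums[c]] for c in self.classes_
        }
        return self  # returning self allows chaining: Model().fit(X, y).predict(...)

    def _distance(self, a, b):
        if self.metric == "manhattan":
            return sum(abs(p - q) for p, q in zip(a, b))
        return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

    def predict(self, X):
        # Assign each row to the class whose centroid is closest.
        return [
            min(self.classes_, key=lambda c: self._distance(row, self.centroids_[c]))
            for row in X
        ]


X = [[0.0, 0.0], [0.2, 0.1], [5.0, 5.0], [5.1, 4.9]]
y = ["a", "a", "b", "b"]
clf = NearestCentroidSketch(metric="manhattan").fit(X, y)
print(clf.predict([[0.1, 0.0], [4.8, 5.2]]))  # ['a', 'b']
```

Because the object only relies on this shared contract, code that drives it (a pipeline, a search loop, or an AutoML system) does not need to know which package the model came from.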

caiotaniguchi commented 8 years ago

@rhiever, that is already the case. XGBoost is part of the stack and it isn't part of scikit-learn, although it does provide scikit-learn wrappers.

mfeurer commented 8 years ago

Yes, XGBoost is in there.

I'm conservative about this because I'm not sure how to maintain dependencies on additional packages without much effort. The unit tests in auto-sklearn might make this harder, and that is what holds me back.
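One common way to soften the maintenance cost of extra packages is to treat them as optional: register a component only if its package is importable, and skip (rather than fail) its unit tests otherwise. The sketch below shows that pattern with the standard library only. The helper optional_import, the OPTIONAL_BACKENDS registry and the module name hypothetical_fm_lib are all hypothetical and not part of auto-sklearn; only the xgboost.XGBClassifier wrapper mentioned in the thread is a real class.

```python
import importlib
import importlib.util
import unittest


def optional_import(name):
    """Return the top-level module `name` if installed, else None (hypothetical helper)."""
    if importlib.util.find_spec(name) is None:
        return None
    return importlib.import_module(name)


# Hypothetical registry: component name -> package it depends on.
OPTIONAL_BACKENDS = {
    "xgboost_classifier": "xgboost",
    "factorization_machine": "hypothetical_fm_lib",
}


def available_components():
    # Only expose components whose backing package is actually installed.
    return sorted(
        component
        for component, module in OPTIONAL_BACKENDS.items()
        if optional_import(module) is not None
    )


class OptionalXGBoostTest(unittest.TestCase):
    # The test is reported as skipped, not failed, when xgboost is missing.
    @unittest.skipIf(optional_import("xgboost") is None, "xgboost is not installed")
    def test_xgboost_has_sklearn_wrapper(self):
        xgboost = optional_import("xgboost")
        self.assertTrue(hasattr(xgboost, "XGBClassifier"))


print(available_components())  # depends on which packages are installed
```

The trade-off is that optional components are only exercised in CI environments that install them, so a project taking this route would likely want at least one test job with every optional dependency present.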