ClimbsRocks / auto_ml

[UNMAINTAINED] Automated machine learning for analytics & production
http://auto-ml.readthedocs.io
MIT License

Choosing best model from multiple models #318

Closed (samching closed this issue 7 years ago)

samching commented 7 years ago

A silly question, but it seems to me that auto_ml trains only the relevant GradientBoosting model by default. For it to train and choose from multiple models, do we just specify the models we want it to try in the model_names param?

Thanks

From the API:

model_names (list of strings) – [default- relevant ‘GradientBoosting’] Which model(s) to try. Includes many scikit-learn models, deep learning with Keras/TensorFlow, and Microsoft’s LightGBM. Currently available options from scikit-learn are [‘ARDRegression’, ‘AdaBoostClassifier’, ‘AdaBoostRegressor’, ‘BayesianRidge’, ‘ElasticNet’, ‘ExtraTreesClassifier’, ‘ExtraTreesRegressor’, ‘GradientBoostingClassifier’, ‘GradientBoostingRegressor’, ‘Lasso’, ‘LassoLars’, ‘LinearRegression’, ‘LogisticRegression’, ‘MiniBatchKMeans’, ‘OrthogonalMatchingPursuit’, ‘PassiveAggressiveClassifier’, ‘PassiveAggressiveRegressor’, ‘Perceptron’, ‘RANSACRegressor’, ‘RandomForestClassifier’, ‘RandomForestRegressor’, ‘Ridge’, ‘RidgeClassifier’, ‘SGDClassifier’, ‘SGDRegressor’]. If you have installed XGBoost, LightGBM, or Keras, you can also include [‘DeepLearningClassifier’, ‘DeepLearningRegressor’, ‘LGBMClassifier’, ‘LGBMRegressor’, ‘XGBClassifier’, ‘XGBRegressor’]. By default we choose scikit-learn’s ‘GradientBoostingRegressor’ or ‘GradientBoostingClassifier’, or if XGBoost is installed, ‘XGBRegressor’ or ‘XGBClassifier’.

ClimbsRocks commented 7 years ago

yep, that's exactly what model_names is for! let me know if any of that is unclear.

so, if you want to compare a bunch of models, you can do something like this:

ml_predictor.train(data, model_names=['LGBMRegressor', 'XGBRegressor', 'DeepLearningRegressor', 'LinearRegression', 'RandomForestRegressor'])

we'll then run cross-validation to choose the best model.
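for context, here's a fuller sketch of how that call fits into a typical workflow. this is a minimal, illustrative setup (the DataFrame and column names are made up, and it assumes LightGBM and XGBoost are installed so those model names are available):

import pandas as pd
from auto_ml import Predictor

# hypothetical training data with a numeric target column named 'price'
df_train = pd.DataFrame({
    'sqft': [850, 1200, 1500, 2000, 2400],
    'bedrooms': [2, 3, 3, 4, 4],
    'price': [200000, 310000, 340000, 450000, 500000]
})

# tell auto_ml which column is the output to predict
column_descriptions = {'price': 'output'}

ml_predictor = Predictor(type_of_estimator='regressor',
                         column_descriptions=column_descriptions)

# each listed model is trained and cross-validated; the best one is kept
ml_predictor.train(df_train,
                   model_names=['LGBMRegressor', 'XGBRegressor',
                                'RandomForestRegressor', 'LinearRegression'])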

in general, auto_ml tries to do a lot of this for you: it minimizes the amount of setting-tweaking you have to do by default, but still lets you customize most things by passing in a few more params. for instance, training_params={'learning_rate': 0.2, 'n_estimators': 500} will set those as parameters for the model being trained.
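for example, something like this (assuming XGBoost is installed; the hyperparameter values here are just illustrative):

# pass model hyperparameters through to the underlying estimator
ml_predictor.train(data,
                   model_names=['XGBRegressor'],
                   training_params={'learning_rate': 0.2, 'n_estimators': 500})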

let me know if you have any other questions as you use this more! i'm going to close this issue, but please keep opening new ones with more questions!