AxeldeRomblay / MLBox

MLBox is a powerful Automated Machine Learning Python library.
https://mlbox.readthedocs.io/en/latest/

Questions re: documentation for optimise() method #86

Closed: jimthompson5802 closed this issue 4 years ago

jimthompson5802 commented 4 years ago

The optimise documentation page contains this passage:

IMPORTANT : Try to avoid dependent parameters and to set one feature selection strategy and one estimator strategy at a time.

best1 = opt.optimise(space1, df)

space2 = {
    'ne__numerical_strategy': {'search': 'choice', 'space': ['median']},
    'ce__strategy': {'search': 'choice', 'space': ['label_encoding', 'random_projection', 'entity_embedding']},
    'fs__strategy': {'search': 'choice', 'space': ['rf_feature_importance']},
    'fs__threshold': {'search': 'uniform', 'space': [0.01, 0.3]},
    'est__strategy': {'search': 'choice', 'space': ['RandomForest']},
    'est__learning_rate': {'search': 'uniform', 'space': [0.001, 0.05]},
    'est__max_depth': {'search': 'choice', 'space': [3, 5, 7, 9, 11, 13, 15]},
    'est__n_estimators': {'search': 'choice', 'space': [50, 100, 150, 200, 400, 800, 1200]}
}

best2 = opt.optimise(space2, df)

AxeldeRomblay commented 4 years ago

Hello @jimthompson5802, sorry for the late answer, I'm quite busy these days... Yes, you're right: you should avoid optimizing two (or more) different models (e.g. RandomForest and LightGBM) that share common parameters (e.g. max_depth) in the same search space. The optimizer can get lost when it tries different values of a shared parameter across different models: max_depth = 10 for a RandomForest is not equivalent to max_depth = 10 for a LightGBM. It will still work, but it will be less accurate, I guess.
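
To make this concrete, here is a minimal sketch of the recommended pattern: one estimator strategy per search space, each carrying only its own hyper-parameters, followed by a direct comparison of the two winners. The optimise and evaluate calls follow MLBox's public Optimiser API; the parameter grids, the scoring metric, the max_evals value and the df train dictionary (as produced by MLBox's Reader) are illustrative assumptions, not values from this issue.

from mlbox.optimisation import Optimiser

opt = Optimiser(scoring='accuracy', n_folds=5)

# Space restricted to LightGBM: learning_rate is meaningful for this model.
space_lgbm = {
    'est__strategy': {'search': 'choice', 'space': ['LightGBM']},
    'est__learning_rate': {'search': 'uniform', 'space': [0.001, 0.05]},
    'est__max_depth': {'search': 'choice', 'space': [3, 5, 7, 9, 11]}
}

# Space restricted to RandomForest: no learning_rate, its own depth grid.
space_rf = {
    'est__strategy': {'search': 'choice', 'space': ['RandomForest']},
    'est__max_depth': {'search': 'choice', 'space': [5, 10, 15]},
    'est__n_estimators': {'search': 'choice', 'space': [100, 200, 400]}
}

# One optimise() run per model, so no parameter is shared across models.
best_lgbm = opt.optimise(space_lgbm, df, max_evals=40)
best_rf = opt.optimise(space_rf, df, max_evals=40)

# evaluate() returns the cross-validated score for a fixed parameter set,
# which lets the two winning configurations be compared directly.
score_lgbm = opt.evaluate(best_lgbm, df)
score_rf = opt.evaluate(best_rf, df)

Splitting the spaces this way keeps each max_depth (and every other shared parameter name) tied to a single model, which is exactly what the documentation's guideline is protecting against.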