ARM-software / mango

Parallel Hyperparameter Tuning in Python
Apache License 2.0

Voting classifier #107

Closed VEZcoding closed 11 months ago

VEZcoding commented 1 year ago

Is it possible to use Mango with a VotingClassifier, and to tune the parameters of each classifier that takes part in the ensemble?

sandeep-iitr commented 1 year ago

Yes, this should be very easy to do. One approach is to give the voting classifier one parameter per constituent classifier, say in the range [0, 1]; based on those parameter values, the ensemble is built from all the classifiers. The accuracy/loss of the resulting ensemble then lets Mango find the optimal parameters for the voting classifier. I can help more if you share more context about your problem.
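A toy sketch of this weight-per-classifier idea, using made-up numbers and a hypothetical `weighted_vote` helper (dependency-free; in practice scikit-learn's `VotingClassifier(voting='soft', weights=...)` plays this role, with Mango searching over the weights):

```python
# Each classifier gets a weight in [0, 1]; the ensemble prediction is a
# weighted average of the individual class-probability outputs, so a
# weight of 0 effectively drops that classifier from the ensemble.

def weighted_vote(probas, weights):
    """Combine per-classifier class probabilities using one weight each."""
    n_classes = len(probas[0])
    total = sum(weights) or 1.0  # guard against all-zero weights
    combined = [
        sum(w * p[c] for w, p in zip(weights, probas)) / total
        for c in range(n_classes)
    ]
    return combined.index(max(combined))  # index of the winning class

# Class probabilities from three classifiers for one sample (made-up numbers)
probas = [[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]]

print(weighted_vote(probas, [1.0, 0.0, 0.0]))  # only clf 1 counts -> class 0
print(weighted_vote(probas, [0.2, 1.0, 1.0]))  # clfs 2 and 3 dominate -> class 1
```

Mango would then search the weight space, scoring each candidate weight vector by the ensemble's cross-validated accuracy.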

VEZcoding commented 1 year ago

Hey!

I figured it out: you can't grid-search both the set of estimators in a VotingClassifier and their parameters. For example, it fails when testing only RF and XGB if you also pass LR parameters. But I'm not sure why it takes so much longer than, say, scikit-learn's grid search on a DataFrame of 1000 rows and 300 columns: it takes 80 seconds per iteration, and it doesn't fill all the allocated jobs.

import xgboost as xgb
from mango import Tuner, scheduler
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

clf_rf = RandomForestClassifier()
clf_xgb = xgb.XGBClassifier()
clf_lr = LogisticRegression()

# Tune only the RF and XGB estimators inside the ensemble
param_space = {'rf__n_estimators': [10, 100, 1000],
               'xgb__n_estimators': [10, 100, 1000]}

clf = VotingClassifier(estimators=[('rf', clf_rf), ('xgb', clf_xgb), ('lr', clf_lr)])

@scheduler.parallel(n_jobs=10)
def objective(**params):
    global X_train, y_train, clf
    clf = clf.set_params(**params)
    score = cross_val_score(clf, X_train, y_train, scoring='accuracy').mean()
    return score

tuner = Tuner(param_space, objective, dict(num_iteration=4, initial_random=10))
results = tuner.maximize()

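One possible reason the allocated jobs are not all busy: Mango's default configuration proposes candidates in small batches, so ten parallel workers can sit idle. A hedged tweak to the configuration above, assuming the posted `param_space` and `objective` (the `batch_size` conf key and the numbers here are illustrative, not a verified fix for the slowdown):

```python
# Hypothetical: ask Mango to propose a batch of candidates per iteration
# so the 10 parallel workers in @scheduler.parallel(n_jobs=10) stay busy.
# Note cross_val_score itself is serial here; its n_jobs parameter would
# parallelize the CV folds instead.
conf_dict = dict(num_iteration=20, initial_random=10, batch_size=10)
tuner = Tuner(param_space, objective, conf_dict)
results = tuner.maximize()
```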
sandeep-iitr commented 11 months ago

Were you able to resolve this issue?