automl / auto-sklearn

Automated Machine Learning with scikit-learn
https://automl.github.io/auto-sklearn
BSD 3-Clause "New" or "Revised" License

Autosklearn using only one core when extended #1175

Open FelipeFuhr opened 3 years ago

FelipeFuhr commented 3 years ago

I'm trying to extend auto-sklearn's list of estimators with XGBoostClassifier. When XGBoostClassifier runs on its own (as the only estimator), it uses roughly half of the available cores and appears to work fine. However, when I use the original auto-sklearn estimators plus XGBoostClassifier, only one core is used and training takes forever. When it finally finishes, it looks like only the dummy classifier was run. Any ideas why this might be happening?

eddiebergman commented 3 years ago

Hi @FelipeFuhr, sorry to get back to you so late.

Would you be able to provide the code you used to extend and fit the model so I can reproduce this on my end?

mfeurer commented 3 years ago

We use threadpoolctl to limit the number of cores that an algorithm inside Auto-sklearn can use: https://github.com/automl/auto-sklearn/blob/ec7ba12100bbc8d8ead4aed72dea8e21cabc67fe/autosklearn/evaluation/abstract_evaluator.py#L199 This means Auto-sklearn assumes that each algorithm runs on a single core, which helps novices avoid oversubscription issues. But from what I understand, you'd like to be able to configure that limit? For now, could you change it manually in the installed file?
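To illustrate the mechanism mentioned above, here is a minimal sketch of how threadpoolctl caps native thread pools. It is not Auto-sklearn's actual code: `fit_with_thread_limit` and `thread_counts` are hypothetical helpers, and the sketch assumes NumPy and threadpoolctl are installed. Only libraries threadpoolctl recognizes (BLAS implementations, OpenMP runtimes) are affected; whether a particular estimator such as XGBoost honours the cap may depend on how it sets its own thread count.

```python
# Minimal sketch (not Auto-sklearn's code): cap native thread pools with threadpoolctl.
import numpy as np  # importing NumPy loads its BLAS library so threadpoolctl can see it
from threadpoolctl import threadpool_info, threadpool_limits


def thread_counts():
    """Return num_threads for each native thread pool threadpoolctl currently detects."""
    return [pool["num_threads"] for pool in threadpool_info()]


def fit_with_thread_limit(fit_fn, n_threads=1):
    """Hypothetical helper: call fit_fn() while supported thread pools are capped.

    n_threads=1 mirrors the single-core assumption; a larger value is the kind
    of setting the issue asks to make configurable.
    """
    with threadpool_limits(limits=n_threads):
        return fit_fn()


if __name__ == "__main__":
    print("before:", thread_counts())
    # Inside the context manager, every detected pool is limited to n_threads.
    print("inside:", fit_with_thread_limit(thread_counts, n_threads=1))
    # On exit, threadpoolctl restores the previous thread counts.
    print("after:", thread_counts())
```

Passing `n_threads=None` leaves the pools untouched, since `threadpool_limits(limits=None)` applies no limit.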

mfeurer commented 2 years ago

Keeping this open so we document this properly.