Open jkleint opened 8 years ago
I can see these limitations, and we are already tackling some of them (custom loss function). We will in all cases keep the `AutoSklearnClassifier` as an easy interface for the user, but the following should be possible:
This would have to wait until scikit-learn 0.18 is out since it will have a new interface for CV objects.
Configuring the hyperparameter search space is a little harder and needs more thought. My suggestion is that auto-sklearn provide subclasses of `Pipeline` and `FeatureUnion` that take control of assembling a valid configuration space, plus placeholders that follow the scikit-learn interface: a classifier, a feature preprocessor, or a building block that rescales the data. Each building block would then be configured by selecting the right algorithm for it and choosing that algorithm's hyperparameters. All in all, the user would pass a `Pipeline` object to some class in auto-sklearn, which would then take over. Does this make sense to you?
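To make the placeholder idea concrete, here is a minimal sketch in plain Python. It is not auto-sklearn or scikit-learn code: `Placeholder`, `SearchablePipeline`, the `step:algorithm:hyperparameter` key naming, and the candidate algorithms are all hypothetical, and a plain dict stands in for a real configuration space.

```python
import random


class Placeholder:
    """Hypothetical pipeline step that can be filled by one of several algorithms."""

    def __init__(self, name, candidates):
        self.name = name
        # candidates: {algorithm name: {hyperparameter name: allowed values}}
        self.candidates = candidates


class SearchablePipeline:
    """Hypothetical pipeline that assembles one search space from its placeholders."""

    def __init__(self, steps):
        self.steps = steps

    def configuration_space(self):
        # Flatten every step into "step:algorithm:hyperparameter" -> allowed values.
        space = {}
        for step in self.steps:
            space[f"{step.name}:__choice__"] = sorted(step.candidates)
            for algo, params in step.candidates.items():
                for param, values in params.items():
                    space[f"{step.name}:{algo}:{param}"] = list(values)
        return space

    def sample(self, rng):
        # Pick one algorithm per step, then only that algorithm's hyperparameters,
        # so every sampled configuration is valid by construction.
        config = {}
        for step in self.steps:
            algo = rng.choice(sorted(step.candidates))
            config[f"{step.name}:__choice__"] = algo
            for param, values in step.candidates[algo].items():
                config[f"{step.name}:{algo}:{param}"] = rng.choice(list(values))
        return config


pipeline = SearchablePipeline([
    Placeholder("rescaling", {"standardize": {}, "min_max": {}}),
    Placeholder("classifier", {
        "decision_tree": {"max_depth": [2, 4, 8]},
        "knn": {"n_neighbors": [1, 5, 10]},
    }),
])
space = pipeline.configuration_space()
config = pipeline.sample(random.Random(0))
```

The point of the sketch is that the user only declares the steps, while the pipeline class owns the rule that hyperparameters are conditional on the chosen algorithm.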
Glad to hear you're working on it. That's great that sklearn is getting a new CV interface.
For search spaces, since this is an advanced option, I would personally not mind just making a copy of the default `ConfigurationSpace` and tweaking it (and passing it into the CV interface); it doesn't seem worth coming up with something custom.
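A rough sketch of the copy-and-tweak workflow, under loud assumptions: a plain dict stands in for `ConfigurationSpace`, and `DEFAULT_SPACE`, its keys, and `restrict_classifiers` are hypothetical names invented for illustration, not part of auto-sklearn.

```python
import copy

# Hypothetical default search space; a plain dict stands in for ConfigurationSpace.
DEFAULT_SPACE = {
    "classifier": ["random_forest", "liblinear_svc", "k_nearest_neighbors"],
    "random_forest:n_estimators": [100, 200, 500],
    "liblinear_svc:C": [0.1, 1.0, 10.0],
    "k_nearest_neighbors:n_neighbors": [1, 5, 10],
}


def restrict_classifiers(space, keep):
    """Return a tweaked copy of `space` limited to the classifiers in `keep`."""
    tweaked = copy.deepcopy(space)  # never mutate the shared default
    tweaked["classifier"] = [c for c in tweaked["classifier"] if c in keep]
    for key in list(tweaked):
        # Drop hyperparameters that belong to a classifier we removed.
        if ":" in key and key.split(":")[0] not in keep:
            del tweaked[key]
    return tweaked


custom_space = restrict_classifiers(DEFAULT_SPACE, keep={"random_forest"})
```

The tweaked copy could then be handed to whatever search object accepts a search space, leaving the default untouched for other users.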
Thanks!
As it stands, `AutoSklearnClassifier` does double duty as an estimator (`fit`, `predict`) and as a hyperparameter search. Usually the latter is done with `GridSearchCV` or similar. The combined design is simpler for users, but has some limitations:
Would it be practical to follow the `BaseSearchCV` interface, providing a custom (say) `AutoSearchCV` object which would take in an `AutoSklearnClassifier`, the params to search (as a `ConfigurationSpace`, defaulting to all, as currently), any custom CV folds, and the hyperparameter search config? This would make auto-sklearn even more awesome than it is now.
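As a sketch of what such a separation might look like, the toy code below splits the estimator from the search object. Everything here is an assumption for illustration: `AutoSearchCV` does not exist in auto-sklearn, `ThresholdClassifier` is a stand-in for a real estimator, a dict of candidate values stands in for a `ConfigurationSpace`, and an exhaustive grid stands in for whatever optimizer auto-sklearn would actually run. Only the attribute names (`best_params_`, `best_score_`, `best_estimator_`) follow the convention of scikit-learn's search classes.

```python
import itertools


class ThresholdClassifier:
    """Toy estimator: predicts 1 when the single feature exceeds `threshold`."""

    def __init__(self, threshold=0.0):
        self.threshold = threshold

    def fit(self, X, y):
        return self  # nothing to learn in this toy model

    def predict(self, X):
        return [int(x > self.threshold) for x in X]


class AutoSearchCV:
    """Hypothetical search wrapper that keeps search separate from the estimator."""

    def __init__(self, estimator_cls, space, cv):
        self.estimator_cls = estimator_cls
        self.space = space  # {param name: list of candidate values}
        self.cv = cv        # list of (train_indices, test_indices) pairs

    def _cv_score(self, params, X, y):
        # Mean accuracy over the user-supplied folds.
        scores = []
        for train, test in self.cv:
            model = self.estimator_cls(**params)
            model.fit([X[i] for i in train], [y[i] for i in train])
            pred = model.predict([X[i] for i in test])
            hits = sum(p == y[i] for p, i in zip(pred, test))
            scores.append(hits / len(test))
        return sum(scores) / len(scores)

    def fit(self, X, y):
        names = sorted(self.space)
        self.best_score_, self.best_params_ = -1.0, None
        # Exhaustive grid; a real implementation would plug in a smarter optimizer.
        for values in itertools.product(*(self.space[n] for n in names)):
            params = dict(zip(names, values))
            score = self._cv_score(params, X, y)
            if score > self.best_score_:
                self.best_score_, self.best_params_ = score, params
        # Refit the winning configuration on all of the data.
        self.best_estimator_ = self.estimator_cls(**self.best_params_).fit(X, y)
        return self

    def predict(self, X):
        return self.best_estimator_.predict(X)


X = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y = [0, 0, 0, 0, 1, 1, 1, 1]
cv = [([0, 1, 4, 5], [2, 3, 6, 7]), ([2, 3, 6, 7], [0, 1, 4, 5])]
search = AutoSearchCV(ThresholdClassifier, {"threshold": [0.0, 0.25, 0.5, 0.75]}, cv)
search.fit(X, y)
```

With this split, custom folds and a custom search space are just constructor arguments of the search object, and the estimator itself stays a plain `fit`/`predict` class.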