automl / auto-sklearn

Automated Machine Learning with scikit-learn
https://automl.github.io/auto-sklearn
BSD 3-Clause "New" or "Revised" License

Is it possible to set the initial values of hyperparameters when using auto-sklearn to search? #577

Open zhangjunli177 opened 5 years ago

zhangjunli177 commented 5 years ago

For example, I have an XGB model whose predictions are not too bad. I plan to use it as a baseline model and to use its hyperparameters as the initial values for auto-sklearn while restricting the search to XGB. Is that possible? There is a parameter called initial_configurations_via_metalearning, but it does not seem to serve this purpose.

mfeurer commented 5 years ago

Unfortunately, it is not possible to pass default values for a specific classifier, and we do not plan to add this ourselves in the near future. You can only restrict the search space to XGB and let Bayesian optimization do the job.
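For reference, restricting the search space to the XGB component would look roughly like this (a sketch; include_estimators and the component name xgradient_boosting match the auto-sklearn version discussed here, while newer releases use an include argument instead):

import autosklearn.classification

# Only the choice of algorithm is fixed; all of the component's
# hyperparameters are still tuned by Bayesian optimization.
automl = autosklearn.classification.AutoSklearnClassifier(
    include_estimators=["xgradient_boosting"],
)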

A hacky workaround would be to subclass the XGB class in autosklearn.pipeline.components.classification.xgradient_boosting, change the default values of the hyperparameters and then use only this new classifier.
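A sketch of that workaround, with the caveats that the component class name XGradientBoostingClassifier and the hyperparameter names/values are assumptions for illustration, and that, depending on the ConfigSpace version, it may be necessary to rebuild a hyperparameter rather than mutate its default_value:

import autosklearn.classification
import autosklearn.pipeline.components.classification
from autosklearn.pipeline.components.classification.xgradient_boosting import (
    XGradientBoostingClassifier,
)

class TunedXGB(XGradientBoostingClassifier):
    @staticmethod
    def get_hyperparameter_search_space(dataset_properties=None):
        cs = XGradientBoostingClassifier.get_hyperparameter_search_space(
            dataset_properties
        )
        # Overwrite the defaults with the baseline model's values
        # (illustrative names and values).
        cs.get_hyperparameter("learning_rate").default_value = 0.05
        cs.get_hyperparameter("max_depth").default_value = 6
        return cs

# Register the subclass and restrict the search to it.
autosklearn.pipeline.components.classification.add_classifier(TunedXGB)
automl = autosklearn.classification.AutoSklearnClassifier(
    include_estimators=["TunedXGB"],
)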

github-actions[bot] commented 3 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs for the next 7 days. Thank you for your contributions.

rabsr commented 2 years ago

@mfeurer I can work on this. It would allow passing default values as well as overriding ranges and choices for hyperparameters without the need to subclass a new classifier or override the package. Let me know if this is still in scope for auto-sklearn; I can start by sharing the design and implementation details.

mfeurer commented 2 years ago

Thanks for picking up on this @rabsr. Let's attack this in two stages:

  1. default values
  2. a better way to pass in ranges and choices for hyperparameters

Thinking about this again: for passing in the default values, we could create an example similar to the random search one and the successive halving one. Instead of replacing the SMAC object with the ROAR object (as in the first example) or changing the arguments to the SMAC object (as in the second), we would prepend the suggested new hyperparameter settings to the metalearning_configurations that are passed in and forward them to SMAC; see the sketch below. What do you think? I'd also be happy to hear your suggestions.
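A hedged sketch of what such an example could look like; the callback signature mirrors the existing random search and successive halving examples, the SMAC4AC import path is for the SMAC3 0.x series auto-sklearn builds on, and the hyperparameter names and values in the baseline configuration are purely illustrative:

import autosklearn.classification
from ConfigSpace import Configuration

def get_smac_object(
    scenario_dict, seed, ta, ta_kwargs, metalearning_configurations, n_jobs, dask_client
):
    from smac.facade.smac_ac_facade import SMAC4AC
    from smac.scenario.scenario import Scenario

    scenario = Scenario(scenario_dict)
    # Build the baseline configuration in the pipeline's configuration space.
    # A Configuration must assign a value to every active hyperparameter;
    # only a few are shown here.
    baseline = Configuration(
        scenario.cs,
        values={
            "classifier:__choice__": "xgradient_boosting",
            "classifier:xgradient_boosting:learning_rate": 0.05,
            # ... values for all remaining active hyperparameters
        },
    )
    # Prepend the baseline so SMAC evaluates it before anything else.
    initial_configurations = [baseline] + metalearning_configurations
    return SMAC4AC(
        scenario=scenario,
        rng=seed,
        tae_runner=ta,
        tae_runner_kwargs=ta_kwargs,
        initial_configurations=initial_configurations,
        n_jobs=n_jobs,
        dask_client=dask_client,
    )

automl = autosklearn.classification.AutoSklearnClassifier(
    get_smac_object_callback=get_smac_object,
)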

rabsr commented 2 years ago

I was mainly focused on overriding hyperparameter values for the algorithms, but I can also start by adding an example, as suggested, for setting up the configuration of an initial baseline model.

I am considering the following approach for passing ranges and defaults for hyperparameters:

import autosklearn.classification

# Proposed format: per-component overrides of choices, ranges, and defaults.
override_params = {
    'classifier': {
        'sgd': {
            'loss': {
                'choices': ['hinge', 'log'],
                'default': 'hinge'
            },
            'penalty': {
                'choices': ['l1'],
            },
            'alpha': {
                'min': 0.005,
                'max': 0.1,
                'default': 0.01
            },
            'l1_ratio': {
                'min': 0.01,
                'max': 0.01
            }
        },
        'random_forest': {...},
    },
    'feature_preprocessor': {
        'pca': {...},
        'polynomial': {...}
    }
}
automl = autosklearn.classification.AutoSklearnClassifier(
    override_hyperparams=override_params
)

All of the validation (ranges, hyperparameter types, converting a categorical or numerical hyperparameter to a Constant/UnParametrized one depending on the input, and the conditional and forbidden clauses) can be handled internally; a rough sketch follows. Let me know your thoughts on this.
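A rough sketch of those conversion rules; this is not existing auto-sklearn code, just an illustration using ConfigSpace types, and the helper name build_hyperparameter is hypothetical:

from ConfigSpace.hyperparameters import (
    CategoricalHyperparameter,
    Constant,
    UniformFloatHyperparameter,
)

def build_hyperparameter(name, spec, old_hp):
    """Turn one override spec (see override_params above) into a hyperparameter."""
    choices = spec.get("choices")
    if choices is not None:
        if len(choices) == 1:
            # A single remaining choice degenerates into a Constant.
            return Constant(name, choices[0])
        return CategoricalHyperparameter(
            name, choices, default_value=spec.get("default", choices[0])
        )
    lower = spec.get("min", old_hp.lower)
    upper = spec.get("max", old_hp.upper)
    if lower == upper:
        # A collapsed range (e.g. l1_ratio above) also becomes a Constant.
        return Constant(name, lower)
    return UniformFloatHyperparameter(
        name, lower, upper,
        default_value=spec.get("default", old_hp.default_value),
    )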