automl / auto-sklearn

Automated Machine Learning with scikit-learn
https://automl.github.io/auto-sklearn
BSD 3-Clause "New" or "Revised" License

Evaluating portfolios outside of AutoML context #1671

Open WmWessels opened 1 year ago

WmWessels commented 1 year ago

Hi!

I am currently conducting research on a topic related to Auto-sklearn 2.0. For one of my experiments, I would like to evaluate the performance of your portfolios outside the Auto-sklearn system. Is there a utility function that converts the portfolios, in the format stored in this repository, into executable scikit-learn pipelines?

Thanks in advance.

PGijsbers commented 1 year ago

@mfeurer helped me answer this: the portfolios are stored in the repository (e.g. autosklearn/experimental/roc_auc/askl2_portfolios/), with different portfolios for different internal optimization procedures. Individual configurations can be trained with the fit_pipeline function.
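A minimal, stdlib-only sketch of the portfolio layout that the code later in this issue relies on: a top-level "portfolio" key mapping member names to flat dictionaries of hyperparameter values. The member names, values, and the hp_names collection below are hypothetical stand-ins (in practice the names would come from config_space.get_hyperparameter_names()), and split_member is a hypothetical helper that only shows which keys a filter against the configuration space keeps and which it drops.

```python
import json

# Hypothetical excerpt mimicking the layout used in this issue: a top-level
# "portfolio" key mapping member names to flat hyperparameter dicts.
# Member names and values are made up for illustration.
PORTFOLIO_JSON = """
{
  "portfolio": {
    "member_a": {
      "classifier:__choice__": "random_forest",
      "classifier:random_forest:criterion": "gini",
      "hypothetical:extra:key": 1
    },
    "member_b": {
      "classifier:__choice__": "extra_trees",
      "classifier:extra_trees:criterion": "entropy"
    }
  }
}
"""


def split_member(member, hp_names):
    """Hypothetical helper: split one portfolio member into (kept, dropped).

    Keys that are hyperparameters of the target configuration space are
    kept, mirroring the dict comprehension used in this issue; all other
    keys are returned separately so they can be inspected.
    """
    kept = {k: v for k, v in member.items() if k in hp_names}
    dropped = {k: v for k, v in member.items() if k not in hp_names}
    return kept, dropped


portfolio = json.loads(PORTFOLIO_JSON)["portfolio"]

# Hypothetical stand-in for config_space.get_hyperparameter_names().
hp_names = {
    "classifier:__choice__",
    "classifier:random_forest:criterion",
    "classifier:extra_trees:criterion",
}

for name, member in portfolio.items():
    kept, dropped = split_member(member, hp_names)
    print(name, "kept:", sorted(kept), "dropped:", sorted(dropped))
```

Printing the dropped keys per member is a cheap way to check whether the filtering step silently discards hyperparameters that a member actually depends on.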

WmWessels commented 1 year ago

I have tried to get this working using the answer from @PGijsbers. However, after running a configuration on a dataset, I would like to retrieve the cross-validation results. This is not possible after calling fit_pipeline, because the automl object then has no runhistory_ attribute. The code below reproduces the problem.

"portfolio.json" is the portfolio stored in autosklearn/experimental/roc_auc/askl2_portfolios/RF_None_10CV_iterative_es_if.json

import json

import sklearn.datasets
from ConfigSpace.configuration_space import Configuration

from autosklearn.classification import AutoSklearnClassifier
from autosklearn.metrics import roc_auc

# Load the portfolio; its "portfolio" entry maps member names to
# dictionaries of hyperparameter values.
with open("portfolio.json", "r") as file:
    portfolio = json.load(file)

port = portfolio["portfolio"]

X, y = sklearn.datasets.load_breast_cancer(return_X_y=True)

# ROC AUC metric and 10-fold cross-validation; the temporary folder is
# kept after termination.
automl = AutoSklearnClassifier(
    time_left_for_this_task=3600,
    per_run_time_limit=900,
    metric=roc_auc,
    delete_tmp_folder_after_terminate=False,
    resampling_strategy="cv",
    resampling_strategy_arguments={"folds": 10},
)

# Build the configuration space for this dataset and turn every portfolio
# member into a Configuration, keeping only hyperparameters in that space.
config_space = automl.get_configuration_space(X, y)
hp_names = config_space.get_hyperparameter_names()

initial_configurations = []
for member in port.values():
    _member = {key: member[key] for key in member if key in hp_names}
    initial_configurations.append(
        Configuration(configuration_space=config_space, values=_member)
    )

# Train and evaluate the first portfolio member only.
fitted_pipe, info, value = automl.fit_pipeline(
    X=X,
    y=y,
    config=initial_configurations[0],
)

# Fails as described above: after fit_pipeline the object has no
# runhistory_ attribute, so the cross-validation results are unavailable.
print(automl.cv_results_)
print("Run info: ", info)
print("Value: ", value)
print("Model: ", fitted_pipe)
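Outside auto-sklearn, per-fold cross-validation scores can be obtained for any scikit-learn estimator with sklearn.model_selection.cross_validate. The sketch below uses a plain RandomForestClassifier purely as a stand-in: it is not a translation of a portfolio configuration into a pipeline, and it does not reproduce auto-sklearn's internal splits or preprocessing. It only illustrates the kind of per-fold ROC AUC results the question is after, with 10 stratified folds chosen to mirror folds=10 above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

X, y = load_breast_cancer(return_X_y=True)

# Plain scikit-learn stand-in; NOT derived from any portfolio member.
estimator = RandomForestClassifier(n_estimators=100, random_state=0)

# 10 stratified, shuffled folds, chosen here to mirror folds=10; this is an
# assumption and not necessarily the splitter auto-sklearn uses internally.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

# cross_validate returns a dict of arrays with one entry per fold.
scores = cross_validate(estimator, X, y, cv=cv, scoring="roc_auc")

for fold, auc in enumerate(scores["test_score"]):
    print(f"fold {fold}: ROC AUC = {auc:.4f}")
print(f"mean ROC AUC = {scores['test_score'].mean():.4f}")
```

Swapping in another scikit-learn estimator only requires changing estimator; the returned dictionary also holds fit_time and score_time for each fold.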