Closed · jmren168 closed this issue 2 years ago
Hi, to begin with, I'm just a student assistant working on this project as well, so don't take this answer as final. But as I see it, it works like this:
initial_configurations_via_metalearning: int = 25

is an integer value that sets how many meta-learned configurations are used to warm-start hyperparameter optimization. Later in the code we then choose n values out of all possible meta-learning configurations (see suggest_via_metalearning below).
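As a side note, this is the parameter you pass to the estimator. A minimal usage sketch (assuming the standard AutoSklearnClassifier API; the time budget here is made up):

# Minimal sketch: seed the search with the 25 best meta-learned
# configurations (25 is also the default value).
import autosklearn.classification

automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,
    initial_configurations_via_metalearning=25,
)
# automl.fit(X_train, y_train)  # fit on your own data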
def suggest_via_metalearning(
    meta_base, dataset_name, metric, task, sparse, num_initial_configurations, logger
):
    if task == MULTILABEL_CLASSIFICATION:
        task = MULTICLASS_CLASSIFICATION

    task = TASK_TYPES_TO_STRING[task]
    logger.info(task)

    start = time.time()
    ml = MetaLearningOptimizer(
        dataset_name=dataset_name,
        configuration_space=meta_base.configuration_space,
        meta_base=meta_base,
        distance="l1",
        seed=1,
        logger=logger,
    )
    logger.info("Reading meta-data took %5.2f seconds", time.time() - start)
    runs = ml.metalearning_suggest_all(exclude_double_configurations=True)
    return runs[:num_initial_configurations]
These configurations are used as metalearning_configurations in SMAC (https://automl.github.io/SMAC3/master/). As far as I know, SMAC does not limit the given search space, so I think it is like your case 2.
autosklearn/smbo.py:526
smac_args = {
    "scenario_dict": scenario_dict,
    "seed": seed,
    "ta": ta,
    "ta_kwargs": ta_kwargs,
    "metalearning_configurations": metalearning_configurations,
    "n_jobs": self.n_jobs,
    "dask_client": self.dask_client,
}
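Conceptually, the seeding then works like the following minimal sketch (this is not auto-sklearn's actual SMAC wiring; all names here are made up): the meta-learned configurations are evaluated first as the initial design, and only then does the optimizer start proposing its own candidates.

def run_smbo(evaluate, propose_next, metalearning_configurations, n_iterations):
    history = []

    # Initial design: evaluate the meta-learned configurations first.
    for config in metalearning_configurations:
        history.append((config, evaluate(config)))

    # Bayesian optimization continues from the seeded history.
    while len(history) < n_iterations:
        config = propose_next(history)
        history.append((config, evaluate(config)))

    # Return the configuration with the lowest observed loss.
    return min(history, key=lambda pair: pair[1])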
Hi @Louquinze,
Thank you for the comments.
After tracing the code of metalearning_suggest_all, _learn, and kBestSuggestions, it looks like meta-learning selects the K datasets most similar to the target dataset and then returns the K best suggestions. If so, meta-learning does not work as described in the paper I mentioned in my previous post (which uses two SVMs and an RF to select initial configs).
Also, meta-learning returns runs[:num_initial_configurations], and each element in runs is composed of (dataset_name, distance, best_configuration). Here best_configuration is taken from runs according to the distance.
Any comments are appreciated.
metalearning/kNearestDatasets/kND.py:51-62

# for each dataset, sort the runs according to their result
best_configuration_per_dataset = {}
for dataset_name in runs:
    if not np.isfinite(runs[dataset_name]).any():
        best_configuration_per_dataset[dataset_name] = None
    else:
        configuration_idx = runs[dataset_name].index[
            np.nanargmin(runs[dataset_name].values)
        ]
        best_configuration_per_dataset[dataset_name] = configuration_idx

self.best_configuration_per_dataset = best_configuration_per_dataset
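To make the selection above concrete, here is a toy illustration with made-up data: for each dataset, the configuration with the lowest finite error is picked.

import numpy as np
import pandas as pd

# Rows are configurations, columns are datasets, values are errors.
runs = pd.DataFrame(
    {
        "dataset_a": [0.30, 0.25, np.nan],      # config_2 was never run
        "dataset_b": [np.nan, np.nan, np.nan],  # no finite results at all
    },
    index=["config_0", "config_1", "config_2"],
)

best = {}
for name in runs:
    if not np.isfinite(runs[name]).any():
        best[name] = None
    else:
        best[name] = runs[name].index[np.nanargmin(runs[name].values)]

print(best)  # {'dataset_a': 'config_1', 'dataset_b': None}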
metalearn_optimizer/metalearner.py:88

def _learn(self, exclude_double_configurations=True):
    dataset_metafeatures, all_other_metafeatures = self._split_metafeature_array()

    # Remove metafeatures which could not be calculated for the target
    # dataset
    keep = []
    for idx in dataset_metafeatures.index:
        if np.isfinite(dataset_metafeatures.loc[idx]):
            keep.append(idx)

    dataset_metafeatures = dataset_metafeatures.loc[keep]
    all_other_metafeatures = all_other_metafeatures.loc[:, keep]

    # Do mean imputation of all other metafeatures
    all_other_metafeatures = all_other_metafeatures.fillna(
        all_other_metafeatures.mean()
    )

    if self.kND is None:
        # In case that we learn our distance function, get_value the parameters
        # for the random forest
        if self.distance_kwargs:
            rf_params = ast.literal_eval(self.distance_kwargs)
        else:
            rf_params = None

        # To keep the distance the same in every iteration, we create a new
        # random state
        random_state = sklearn.utils.check_random_state(self.seed)
        kND = KNearestDatasets(
            metric=self.distance,
            random_state=random_state,
            logger=self.logger,
            metric_params=rf_params,
        )

        runs = dict()
        # TODO move this code to the metabase
        for task_id in all_other_metafeatures.index:
            try:
                runs[task_id] = self.meta_base.get_runs(task_id)
            except KeyError:
                # TODO should I really except this?
                self.logger.info("Could not find runs for instance %s" % task_id)
                runs[task_id] = pd.Series([], name=task_id, dtype=np.float64)
        runs = pd.DataFrame(runs)
        kND.fit(all_other_metafeatures, runs)
        self.kND = kND

    return self.kND.kBestSuggestions(
        dataset_metafeatures,
        k=-1,
        exclude_double_configurations=exclude_double_configurations,
    )
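The distance part of _learn can be illustrated with a toy example (hypothetical metafeatures; the real code may additionally scale them): datasets are compared by the L1 distance between their mean-imputed metafeature vectors.

import numpy as np
import pandas as pd

# Metafeatures of the target dataset and of the datasets in the meta-base.
target = pd.Series({"n_samples": 1000, "n_features": 20, "class_entropy": 0.9})
others = pd.DataFrame(
    {
        "n_samples": [900, 5000],
        "n_features": [22, 100],
        "class_entropy": [np.nan, 0.5],  # missing value -> mean imputation
    },
    index=["dataset_a", "dataset_b"],
)
others = others.fillna(others.mean())

# L1 distance to the target, smallest first.
distances = (others - target).abs().sum(axis=1).sort_values()
print(distances.index[0])  # dataset_a is the nearest dataset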
metalearning/kNearestDatasets/kND.py:137

def kBestSuggestions(self, x, k=1, exclude_double_configurations=True):
    assert type(x) == pd.Series
    if k < -1 or k == 0:
        raise ValueError("Number of neighbors k cannot be zero or negative.")
    nearest_datasets, distances = self.kNearestDatasets(x, -1, return_distance=True)

    kbest = []
    added_configurations = set()
    for dataset_name, distance in zip(nearest_datasets, distances):
        best_configuration = self.best_configuration_per_dataset[dataset_name]

        if best_configuration is None:
            self.logger.info(
                "Found no best configuration for instance %s" % dataset_name
            )
            continue

        if exclude_double_configurations:
            if best_configuration not in added_configurations:
                added_configurations.add(best_configuration)
                kbest.append((dataset_name, distance, best_configuration))
        else:
            kbest.append((dataset_name, distance, best_configuration))

        if k != -1 and len(kbest) >= k:
            break

    if k == -1:
        k = len(kbest)

    return kbest[:k]
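For reference, a toy illustration (made-up values) of the shape this returns and of the deduplication: one (dataset_name, distance, best_configuration) triple per neighbor, nearest first, with repeated configurations skipped.

neighbors = [
    ("dataset_a", 102.4, "config_1"),
    ("dataset_b", 250.0, "config_1"),   # same best configuration -> skipped
    ("dataset_c", 4080.4, "config_7"),
]

kbest, seen = [], set()
for dataset_name, distance, config in neighbors:
    if config in seen:
        continue
    seen.add(config)
    kbest.append((dataset_name, distance, config))

print(kbest)  # [('dataset_a', 102.4, 'config_1'), ('dataset_c', 4080.4, 'config_7')]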
Hi @Louquinze
I think I made a mistake: I should have read the original Auto-sklearn paper, "Efficient and Robust Automated Machine Learning" (NeurIPS 2015).
In this paper, the authors mention: "We exploit this complementarity by selecting k configurations based on meta-learning and use their result to seed Bayesian optimization."
This procedure matches what we found: find the dataset that is most similar to the target dataset D, then directly use the best configuration of that nearest dataset to seed BO.
Any comments are appreciated.
Hi @jmren168,
I think that is basically what the meta-learning is doing. I will ask someone else to confirm this.
Hi @jmren168,
The meta-learning you originally referred to is specific to Auto-sklearn 2, as far as I know, and you're correct that the meta-learning described in the original Auto-sklearn paper is what is used in general.
These seeded runs are essentially the first configurations tried on a given new dataset, so that the search starts from somewhere "reasonable".
Best, Eddie
Hi @eddiebergman and @Louquinze,
Thank you for the reply; I have no more questions now, so I'll close this issue. Thanks again :)
Hi,
After reading "Initializing Bayesian Hyperparameter Optimization via Meta-Learning", I have a question about how initial_configurations_via_metalearning works in auto-sklearn, and I hope someone could give me some hints. Many thanks.
When I enable initial_configurations_via_metalearning to train a dataset D_n+1, and auto-sklearn finds that the most similar dataset is D_J with a specific configuration theta_D_J, how is this result used to drive auto-sklearn's selection of an initial configuration for dataset D_n+1?

case 1. theta_D_J is directly used as an initial configuration for dataset D_n+1

case 2. dataset D_J is used to search for another configuration (say theta_D_J_new) without limiting models, preprocessors, ..., and then theta_D_J_new is used as the initial configuration for dataset D_n+1
If the answer is case 1, something does not make sense to me: the above paper mentions that only 3 classifiers (a linear SVM, an RBF SVM, and RF) are used in meta-learning, yet I found that the model type selected for dataset D_n+1 is mlp (see the runs below). Any comments are highly appreciated.
10 | 48 | 0.00 | mlp | 0.236979 | 7.935055 | 9 | 0.218424 | 0 | 1.659342e+09 | 1.659342e+09 | 0.0 | StatusType.SUCCESS | [] | [feature_agglomeration] | none | Initial design
6 | 79 | 0.00 | mlp | 0.244792 | 53.374027 | 5 | 0.175456 | 0 | 1.659342e+09 | 1.659342e+09 | 0.0 | StatusType.SUCCESS | [] | [feature_agglomeration] | weighting | Initial design