This repository contains all data and code required to produce the results in the submission titled "A Meta-Level Learning Algorithm for Sequential Hyper-Parameter Space Reduction in AutoML".
Note on datasets and analyses: The algorithm in the paper takes as input the performance and execution times of past runs (i.e., ML_results_{classification,regression}.csv). Providing all datasets and code to analyze them is out of scope.
All data required to produce the results in the paper are in data/data.zip. A list of the files, each with a description, follows.
ML_results_{classification,regression}.csv: performance and execution time results of machine learning configurations on classification/regression datasets. These were obtained by running JADBio on all datasets.

metadata_{classification,regression}.csv: meta-features used to represent classification/regression datasets.

datasets_{classification,regression}.csv: list of classification/regression datasets, along with some of their characteristics. Used for convenience in plots.py.

dataset_sources.csv: list of all classification/regression datasets and their sources. The file contains the following information: _v${VERSION}_did${DATASET_ID} is appended to the dataset name, where VERSION is the version of the dataset and DATASET_ID is its OpenML identifier.

For the sake of convenience, all intermediate results produced by the scripts in this project are also provided in results/results.zip.
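The naming scheme described above for dataset_sources.csv can be undone with a small helper. The following is a minimal sketch, not code from this repository: the function name split_openml_suffix, the assumption that VERSION and DATASET_ID are plain integers, and the example dataset name are all hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed suffix form: "_v<VERSION>_did<DATASET_ID>" at the very end of the
# name, with both values written as integers (an assumption, not confirmed
# by the repository).
_SUFFIX = re.compile(r"^(?P<name>.+)_v(?P<version>\d+)_did(?P<dataset_id>\d+)$")


class OpenMLName(NamedTuple):
    name: str
    version: int
    dataset_id: int


def split_openml_suffix(dataset_name: str) -> Optional[OpenMLName]:
    """Return (base name, version, OpenML ID), or None if there is no suffix."""
    match = _SUFFIX.match(dataset_name)
    if match is None:
        return None
    return OpenMLName(
        match["name"], int(match["version"]), int(match["dataset_id"])
    )


# "example_dataset" is a made-up name used only for illustration.
print(split_openml_suffix("example_dataset_v1_did42"))
```

Names without the suffix return None, so the helper can be applied to every row of the file without first checking where a dataset came from.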
To increase the number of regression problems, classification problems were obtained from BioDataome and turned into regression problems as follows:
were used as predictors. These datasets can be obtained by selecting all regression datasets in dataset_sources.csv whose source is BioDataome.
Note on requirements.txt: The code has been tested on the package versions in requirements.txt and might not run with other versions. We recommend using virtual environments to install dependencies.
First, unzip data/data.zip and add the extracted files to the data folder. Next, run the following scripts to produce all results required for the plots:
{classification,regression}_threshold.py: Produces all results for Figure 2 (SHSR with different thresholds).

{classification,regression}_configuration_subsampling.py: Produces all results for Figure 3 (SHSR on partial results).

{classification,regression}_random_elimination.py: Produces all results for Figure 4 (SHSR vs random elimination).

All results are stored in the results folder. Alternatively, this step can be skipped by unzipping results/results.zip and adding the extracted files to the results folder.
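The steps above can be automated with a short driver. This is a sketch under stated assumptions rather than part of the repository: the helper names (experiment_scripts, extract_archive, run_experiments) are hypothetical, the scripts are assumed to sit in the repository root and to be run from there, and the archive is assumed to unpack directly into the data folder without an extra top-level directory.

```python
import subprocess
import sys
import zipfile
from pathlib import Path
from typing import List

TASKS = ("classification", "regression")
EXPERIMENTS = ("threshold", "configuration_subsampling", "random_elimination")


def experiment_scripts() -> List[str]:
    """Expand {classification,regression}_<experiment>.py into six file names."""
    return [f"{task}_{exp}.py" for exp in EXPERIMENTS for task in TASKS]


def extract_archive(archive: Path, target_dir: Path) -> List[str]:
    """Extract every member of `archive` into `target_dir`; return member names."""
    target_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(archive) as zf:
        zf.extractall(target_dir)
        return zf.namelist()


def run_experiments(repo_root: Path) -> None:
    """Unzip the input data, then run each experiment script in turn."""
    # Assumption: data.zip unpacks straight into the data folder.
    extract_archive(repo_root / "data" / "data.zip", repo_root / "data")
    for script in experiment_scripts():
        # check=True stops at the first script that exits with an error.
        subprocess.run([sys.executable, script], cwd=repo_root, check=True)
```

Calling run_experiments(Path(".")) from the repository root would run the whole step. To skip the experiments, extract_archive(Path("results/results.zip"), Path("results")) would unpack the precomputed results under the same layout assumption.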
Run the plots.py script to produce all plots in the paper. The plots are stored in the plots folder.