Currently has 28 configurable parameters (many of which are only used conditionally, depending on others)
To Do:
[x] Use ColumnTransformer to decide on preprocessing per-column (feature)
Each feature is marked as 'none', 'standardscaler', 'minmaxscaler' (plus 2x continuous bounds (?)), 'normalizer', or 'PCA'
Note: Normalizer and PCA should each use a single instance! I.e., in the ColumnTransformer, each should receive all column indices marked with PCA or Normalizer, rather than a new instance being created per feature.
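The grouping described above could be sketched roughly as follows. The `per_feature` mapping is a hypothetical stand-in for whatever the configuration space produces; the point is that indices are grouped per choice, so PCA and Normalizer each get one instance spanning all their columns:

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler, Normalizer, StandardScaler

# Hypothetical per-feature choices, e.g. sampled from the configuration space.
per_feature = {0: "standardscaler", 1: "pca", 2: "pca", 3: "none", 4: "minmaxscaler"}

# Group column indices by choice, so shared transformers (PCA, Normalizer)
# are instantiated once with all their columns instead of once per feature.
groups = {}
for idx, choice in per_feature.items():
    groups.setdefault(choice, []).append(idx)

factories = {
    "standardscaler": lambda cols: StandardScaler(),
    "minmaxscaler": lambda cols: MinMaxScaler(),
    "normalizer": lambda cols: Normalizer(),
    "pca": lambda cols: PCA(n_components=len(cols)),
}
transformers = [
    (choice, "passthrough" if choice == "none" else factories[choice](cols), cols)
    for choice, cols in groups.items()
]

ct = ColumnTransformer(transformers)
X = np.random.RandomState(0).rand(10, 5)
Xt = ct.fit_transform(X)
```

Here `n_components=len(cols)` keeps the output width equal to the input width; an actual configuration would likely expose that as another parameter.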
[x] Set a time limit on how long an evaluation is allowed to take.
Some configurations take more time to train and test, often in exchange for a better score; a time limit makes that trade-off less trivial.
[x] Figure out a good limit (e.g., 2x the default configuration's runtime)
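Deriving the limit from the default configuration, as suggested above, could look like the sketch below. `evaluation_budget` is a hypothetical helper, not part of the existing code; enforcing the resulting budget would still need a separate mechanism (e.g. running the evaluation in a subprocess with a timeout):

```python
import time

def evaluation_budget(evaluate_default, factor=2.0):
    """Time one evaluation of the default configuration and return
    factor * its runtime as the per-evaluation budget (in seconds)."""
    start = time.perf_counter()
    evaluate_default()
    return factor * (time.perf_counter() - start)
```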
[x] Check whether the bounds for the variables are reasonable
[x] Specifically: quite a few variables have a range of [0, ∞) but are currently bounded to [0, 10]. Should they use a log-scaled distribution (e.g., lognormal) instead?
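For comparison, a log-uniform prior spreads samples evenly across orders of magnitude, which tends to suit [0, ∞) hyperparameters better than a uniform [0, 10]. The bounds below are illustrative only, not taken from the actual configuration:

```python
from scipy.stats import loguniform

# Illustrative bounds: samples are spread evenly in log-space over
# six orders of magnitude rather than clustered near the upper bound.
dist = loguniform(1e-4, 1e2)
samples = dist.rvs(size=1000, random_state=0)
```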
[x] Add to run_experiment.py
[x] Test whether it runs properly without exceptions
[x] Make sure XGBoost can properly decide on the class when using a binary classification metric
Does it automatically apply one-vs-rest or one-vs-one? Or do we need to make use of the wrappers in sklearn.multiclass?
Note: the latter would again add more parameters / give existing parameters more values.
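If the wrappers turn out to be needed, usage is minimal; the sketch below uses LogisticRegression as a dependency-free stand-in for XGBClassifier. OneVsRestClassifier fits one binary problem per class, so any binary classification metric then applies per class:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

X, y = load_iris(return_X_y=True)

# One binary estimator is fitted per class; swap in XGBClassifier here
# if the wrapper is required for binary metrics on multiclass data.
ovr = OneVsRestClassifier(LogisticRegression(max_iter=1000))
ovr.fit(X, y)
```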