PanyiDong / InsurAutoML

AutoML in Insurance project.
MIT License

HyperOpt hyperparameter space conflicts with ray.tune #1

Closed PanyiDong closed 2 years ago

PanyiDong commented 2 years ago

Problem

In general, different methods may share a hyperparameter name (for example, k is a critical hyperparameter for kNN-style imputation methods). With ray.tune, hyperparameters that have the same name but belong to different methods are automatically recognized and distinguished. HyperOpt, however, identifies each hyperparameter by its dictionary key and also by a label that must be unique across the whole space. So, when defining the default hyperparameter space in HyperOpt style, imbalance_threshold in SimpleRandomOverSampling and imbalance_threshold in SimpleRandomUnderSampling can be distinguished by giving them different labels, as follows:

{
        "balancing": "SimpleRandomOverSampling",
        "imbalance_threshold": hp.uniform(
            "SimpleRandomOverSampling_imbalance_threshold", 0.8, 1
        ),
},
{
        "balancing": "SimpleRandomUnderSampling",
        "imbalance_threshold": hp.uniform(
            "SimpleRandomUnderSampling_imbalance_threshold", 0.8, 1
        ),
},

However, for generality, I designed the hyperparameter space in ray.tune style, which has no place for such a label, so the space is defined as follows:

{
        "balancing": "SimpleRandomOverSampling",
        "imbalance_threshold": tune.uniform(0.8, 1),
},
{
        "balancing": "SimpleRandomUnderSampling",
        "imbalance_threshold": tune.uniform(0.8, 1),
},

So, with Grid Search/Random Search, no error is raised, since ray.tune supports this space natively. When the HyperOpt search algorithm is used, however, a duplicate label error occurs. In the case above, both imbalance_threshold entries are labeled balancing/imbalance_threshold, so HyperOpt cannot read the hyperparameter space properly.
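To make the collision concrete, here is a minimal sketch in plain Python. It is only a simplified model of deriving a label from the key path of each sampled parameter, not Ray's actual conversion code. The helpers path_labels and find_duplicate_labels are hypothetical, and the string "uniform(0.8, 1)" stands in for tune.uniform(0.8, 1) so the sketch runs without ray installed.

```python
from collections import Counter

# Stand-in for the ray.tune-style space: plain strings replace the
# tune.uniform(0.8, 1) objects (an assumption made for illustration).
space = [
    {"balancing": "SimpleRandomOverSampling", "imbalance_threshold": "uniform(0.8, 1)"},
    {"balancing": "SimpleRandomUnderSampling", "imbalance_threshold": "uniform(0.8, 1)"},
]


def path_labels(parent_key, options, is_sampled):
    """Yield a 'parent/key' label for every sampled parameter in every option."""
    for option in options:
        for key, value in option.items():
            if is_sampled(value):
                yield f"{parent_key}/{key}"


def find_duplicate_labels(labels):
    """Return the labels that occur more than once, sorted."""
    counts = Counter(labels)
    return sorted(label for label, n in counts.items() if n > 1)


labels = list(path_labels("balancing", space, lambda v: v.startswith("uniform")))
duplicates = find_duplicate_labels(labels)
print(duplicates)
```

Both options contribute the same path-based label, which is the situation that a label-keyed search space cannot represent.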

Reproduction of the problem

Here, I provide a simple example to demonstrate how the problem can occur:

from ray import tune
from ray.tune.suggest.basic_variant import BasicVariantGenerator
from ray.tune.suggest.hyperopt import HyperOptSearch

space = [
    {
        "balancing": "SimpleRandomOverSampling",
        "imbalance_threshold": tune.uniform(0.8, 1),
    },
    {
        "balancing": "SimpleRandomUnderSampling",
        "imbalance_threshold": tune.uniform(0.8, 1),
    },
]

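def eval(config):
    # tune.choice(space) places one of the option dicts under "balancing"
    _config = config["balancing"]
    # use the sampled threshold itself as a dummy loss
    loss = _config["imbalance_threshold"]

    tune.report(loss=loss)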

analysis1 = tune.run(
    eval,
    config={"balancing": tune.choice(space)},
    num_samples=5,
    mode="min",
    metric="loss",
    search_alg=BasicVariantGenerator(),
)

analysis2 = tune.run(
    eval,
    config={"balancing": tune.choice(space)},
    num_samples=5,
    mode="min",
    metric="loss",
    search_alg=HyperOptSearch(),
)

The search in analysis1 runs smoothly, while analysis2 raises a DuplicateLabel error for balancing/imbalance_threshold.

Current Idea on Solution

Since the problem occurs when converting the ray.tune space to a HyperOpt space, I think the method name can be added as a prefix to each hyperparameter name when defining the default hyperparameter space. When the methods are called, these prefixes can be removed to recover the actual hyperparameter names, so the arguments are passed properly.
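The prefixing step of this idea could look roughly like the following sketch. It assumes each option dict stores its method name under the "balancing" key; add_method_prefix is a hypothetical helper, not a function from the project, and strings again stand in for the tune.uniform objects.

```python
def add_method_prefix(option, method_key="balancing"):
    """Return a copy of `option` whose hyperparameter keys carry the method name."""
    method = option[method_key]
    prefixed = {method_key: method}
    for key, value in option.items():
        if key != method_key:
            # e.g. "imbalance_threshold" -> "SimpleRandomOverSampling_imbalance_threshold"
            prefixed[f"{method}_{key}"] = value
    return prefixed


# Placeholder strings replace tune.uniform(0.8, 1) for illustration only.
space = [
    {"balancing": "SimpleRandomOverSampling", "imbalance_threshold": "uniform(0.8, 1)"},
    {"balancing": "SimpleRandomUnderSampling", "imbalance_threshold": "uniform(0.8, 1)"},
]

prefixed_space = [add_method_prefix(option) for option in space]

# Collect every hyperparameter key to check that none is repeated.
hyperparameter_keys = [
    key for option in prefixed_space for key in option if key != "balancing"
]
print(hyperparameter_keys)
```

With the method name in front, no two hyperparameter keys in the space are equal any more.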

I'm still working on the problem. For now, the GridSearch/RandomSearch options for the search algorithm should work fine.

PanyiDong commented 2 years ago

actual commit should be 4f67da98c95b974ee9264b3bc3ed44b00fbba1e0

PanyiDong commented 2 years ago

Solution

For the case above, the hyperparameter space is defined in the current version as:

{
        "balancing_1": "SimpleRandomOverSampling",
        "SimpleRandomOverSampling_imbalance_threshold": tune.uniform(0.8, 1),
},
{
        "balancing_2": "SimpleRandomUnderSampling",
        "SimpleRandomUnderSampling_imbalance_threshold": tune.uniform(0.8, 1),
},

When configuring the hyperparameter search space, all keys are unique, so HyperOpt can distinguish them. In the training phase, the redundant prefix ("SimpleRandomUnderSampling_", etc.) and suffix ("_1", etc.) are removed so that the dict keys match the method arguments.
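The removal step in the training phase might be sketched as follows. This is an illustration under stated assumptions, not the project's actual code: strip_decorations is a hypothetical helper, the method key is assumed to have the form "balancing_<number>", and 0.93 is a made-up sampled value.

```python
import re


def strip_decorations(sampled, method_key_base="balancing"):
    """Recover the method name and its plain keyword arguments from a sampled option."""
    # Locate the suffixed method key, e.g. "balancing_2".
    pattern = rf"{re.escape(method_key_base)}_\d+"
    method_key = next(key for key in sampled if re.fullmatch(pattern, key))
    method = sampled[method_key]

    # Drop the "<method>_" prefix from every remaining key.
    prefix = f"{method}_"
    kwargs = {
        key[len(prefix):]: value
        for key, value in sampled.items()
        if key != method_key and key.startswith(prefix)
    }
    return method, kwargs


# Hypothetical sampled configuration; 0.93 is an illustrative value.
sampled = {
    "balancing_2": "SimpleRandomUnderSampling",
    "SimpleRandomUnderSampling_imbalance_threshold": 0.93,
}

method, kwargs = strip_decorations(sampled)
print(method, kwargs)
```

The returned kwargs use the plain argument names, so they could be passed to the selected method as keyword arguments.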