
SMAC3: A Versatile Bayesian Optimization Package for Hyperparameter Optimization
https://automl.github.io/SMAC3/v2.1.0/

Optimizing in a discrete configspace #1091

Open · mallanos opened this issue 8 months ago

mallanos commented 8 months ago

Description

I want to optimize a function that takes three float parameters. However, not all combinations of the three parameters are valid. Is there a way to define the configspace as a pool of possible solutions, so that SMAC samples configs as three-dimensional points from that pool?
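Concretely, the kind of pool I have in mind looks like the following sketch (the pool values here are made up for illustration):

```python
import random

# Hypothetical pool of valid (x0, x1, x2) combinations -- in my case only
# these exact points exist, so sampling should be restricted to them.
pool = [
    (-3.0, -4.0, 5.0),
    (-2.5, -3.8, 4.9),
    (-1.0, 0.0, 2.0),
]

rng = random.Random(0)

def sample_config():
    """Draw a config as one three-dimensional point from the pool."""
    return rng.choice(pool)

config = sample_config()
```
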

Steps/Code to Reproduce

What I'm doing now is defining the config space in the regular way:

    def configspace(self) -> ConfigurationSpace:
        # `seed` and `embedding` are defined elsewhere in my class;
        # the bounds of each float come from the embedding's value range.
        cs = ConfigurationSpace(name="myspace", seed=seed)
        x0 = Float("x0", (np.min(embedding), np.max(embedding)), default=-3)
        x1 = Float("x1", (np.min(embedding), np.max(embedding)), default=-4)
        x2 = Float("x2", (np.min(embedding), np.max(embedding)), default=5)
        cs.add_hyperparameters([x0, x1, x2])
        return cs

Then, I use the Ask-and-Tell interface to:

1. ask for a config, i.e. a point in the three-dimensional space;
2. find the closest existing point to the suggested point;
3. get the score or value associated with that point;
4. tell SMAC the resulting TrialInfo and TrialValue.

for _ in range(search_iterations):
    info = smac.ask()
    assert info.seed is not None
    # Evaluate at the closest existing point to the suggested config.
    score, point = model.sample(info.config, ec=ec, seed=info.seed)
    value = TrialValue(cost=score, time=0.5)
    # Report the point that was actually evaluated,
    # not the config SMAC asked for.
    true_info = TrialInfo(
        config=Configuration(
            configuration_space=model.configspace,
            values={
                'x0': float(point[0]),
                'x1': float(point[1]),
                'x2': float(point[2]),
            },
        ),
        seed=info.seed,
    )
    smac.tell(true_info, value)
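Step 2 above (finding the closest existing point) is essentially a nearest-neighbour lookup over the pool of valid points; a minimal sketch with made-up pool values:

```python
import numpy as np

# Hypothetical pool of valid points, shape (n_points, 3).
pool = np.array([
    [-3.0, -4.0, 5.0],
    [-2.5, -3.8, 4.9],
    [-1.0, 0.0, 2.0],
])

def closest_point(suggestion: np.ndarray) -> np.ndarray:
    """Return the pool point with minimal Euclidean distance to the suggestion."""
    dists = np.linalg.norm(pool - suggestion, axis=1)
    return pool[np.argmin(dists)]

# A suggested point that is not itself in the pool gets snapped to its neighbour.
point = closest_point(np.array([-2.6, -3.9, 5.0]))
```
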

Expected Results

Running

    all_scores = [smac.runhistory.average_cost(config) for config in smac.runhistory.get_configs()]

I would expect the length of all_scores to equal the number of search_iterations, with no nan values.

Actual Results

When I inspect the results by running

    all_scores = [smac.runhistory.average_cost(config) for config in smac.runhistory.get_configs()]

I get several nan scores, and the number of sampled configurations is greater than the maximum number of evaluations (search_iterations).

Versions

smac version 2.0.2

Thanks!

alexandertornede commented 8 months ago

Hi @mallanos,

thanks for posting this!

The approach you describe has a conceptual problem from my perspective: there is no guarantee that the closest point (under whatever distance metric you use) actually has an acquisition function value comparable to that of the point SMAC suggested.

Moreover, without having looked into this in detail, I assume the nan values arise because you never provide a result for the configuration returned by the ask call, so SMAC automatically fills it in with nan. I would need to look into this to confirm that assumption, though.

Depending on the concrete constraints you want to apply to your search space, you can try to work with conditions (https://automl.github.io/ConfigSpace/main/api/conditions.html) and forbidden clauses (https://automl.github.io/ConfigSpace/main/api/forbidden_clauses.html). Just be aware that these are resolved internally by rejection sampling, meaning that a large number of conditions or forbidden clauses can make sampling configurations slow.
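To illustrate why many forbidden clauses slow sampling down: conceptually, rejection sampling redraws candidates until none of the clauses is violated, roughly like this sketch in plain Python (the constraint here is made up, not from your problem):

```python
import random

rng = random.Random(42)

def violates_constraint(x0: float, x1: float) -> bool:
    """Hypothetical forbidden region: x0 and x1 may not both be negative."""
    return x0 < 0 and x1 < 0

def sample_valid():
    """Redraw uniformly until the candidate satisfies the constraint.

    The larger the forbidden region (or the more clauses there are),
    the more redraws are needed on average.
    """
    attempts = 0
    while True:
        attempts += 1
        x0 = rng.uniform(-5, 5)
        x1 = rng.uniform(-5, 5)
        if not violates_constraint(x0, x1):
            return (x0, x1), attempts

(x0, x1), attempts = sample_valid()
```
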

Does that help?