automl / ConfigSpace

Domain specific language for configuration spaces in Python. Useful for hyperparameter optimization and algorithm configuration.
https://automl.github.io/ConfigSpace/

Nested Conditions result in incorrect deactivation #253

Open dengdifan opened 2 years ago

dengdifan commented 2 years ago
from ConfigSpace.configuration_space import ConfigurationSpace, Configuration
from ConfigSpace.hyperparameters import CategoricalHyperparameter
from ConfigSpace.conditions import EqualsCondition, AndConjunction, OrConjunction

from ConfigSpace.util import get_one_exchange_neighbourhood
import numpy as np

rng = np.random.RandomState(1)
cs = ConfigurationSpace()
x_top = CategoricalHyperparameter("x_top", [0, 1, 2, 3])

x_m0 = CategoricalHyperparameter("x_m0", [0, 1])
x_m1 = CategoricalHyperparameter("x_m1", [0, 1])
x_m2 = CategoricalHyperparameter("x_m2", [0, 1])

y = CategoricalHyperparameter("y", [0, 1])
x_b = CategoricalHyperparameter("x_b", [0, 1])

cm0 = EqualsCondition(x_m0, x_top, 0)
cm1 = EqualsCondition(x_m1, x_top, 1)
cm2 = EqualsCondition(x_m2, x_top, 2)

cb0 = EqualsCondition(x_b, x_top, 0)
cb1 = EqualsCondition(x_b, x_m1, 0)
cb2 = EqualsCondition(x_b, x_m2, 0)

cor = OrConjunction(cb0, cb1, cb2)
cand = AndConjunction(
    cor,
    EqualsCondition(x_b, y, 0)
)

cs.add_hyperparameters([x_top, x_m0, x_m1, x_b, x_m2, y])

cs.add_conditions([cm0, cm1, cm2])

cs.add_condition(cand)

cfg = {"y": 0,
       "x_top": 3,
       "x_b": 0,
       }
cfg = Configuration(cs, values=cfg)

When a variable is governed by a nested condition (an AndConjunction combined with an OrConjunction) and all of its parents in the AndConjunction are inactive, the variable is incorrectly deactivated even if the OrConjunction is satisfied.
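
The intended behaviour can be sketched without ConfigSpace. This is a minimal illustration, assuming that a condition whose parent is inactive simply evaluates to false instead of invalidating the whole conjunction. The helpers `equals`, `any_of`, `all_of`, and `resolve_active` are hypothetical and not part of the library, and the configuration uses x_top = 0 (an illustrative choice, not the x_top = 3 of the reproduction) so that one branch of the OrConjunction holds while x_m1 and x_m2 are inactive.

```python
# Library-independent sketch; none of these helpers are ConfigSpace APIs.

def equals(parent, value):
    """Condition that holds only if `parent` is active and equals `value`."""
    return lambda active: parent in active and active[parent] == value

def any_of(*conds):
    """OR over conditions; branches with inactive parents count as false."""
    return lambda active: any(c(active) for c in conds)

def all_of(*conds):
    """AND over conditions."""
    return lambda active: all(c(active) for c in conds)

# Conditions mirroring the reproduction above, keyed by child hyperparameter.
conditions = {
    "x_m0": equals("x_top", 0),
    "x_m1": equals("x_top", 1),
    "x_m2": equals("x_top", 2),
    "x_b": all_of(
        any_of(equals("x_top", 0), equals("x_m1", 0), equals("x_m2", 0)),
        equals("y", 0),
    ),
}

def resolve_active(values):
    """Return the active subset of `values`, resolving parents before children."""
    # Unconditional hyperparameters are always active.
    active = {k: v for k, v in values.items() if k not in conditions}
    # Dict order lists x_m0..x_m2 before x_b, so parents are settled first.
    for child, cond in conditions.items():
        if child in values and cond(active):
            active[child] = values[child]
    return active

# Hypothetical configuration chosen for illustration (x_top = 0).
values = {"x_top": 0, "y": 0, "x_m0": 1, "x_m1": 0, "x_m2": 0, "x_b": 1}
active = resolve_active(values)
print(sorted(active))
```

Under this assumption x_b stays active, because the x_top == 0 branch satisfies the OrConjunction even though x_m1 and x_m2 are inactive.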

jbussemaker commented 1 year ago

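I'm running into the same issue, but with a somewhat simpler (I think) configuration space: three categorical hyperparameters a, b, and c, where b is active if a == A, and c is active if b == C (and b is active).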

The second condition (for activation of c) can be implemented in two ways:

  1. Using an EqualsCondition on b == C
  2. Using an AndConjunction combining the above with the condition a == A

To me, the first seems more intuitive. However:

  1. The first approach does not work with get_active_hyperparameters: c is incorrectly reported as active if b is inactive
  2. The second approach does not work with generate_grid: it fails to evaluate the AndConjunction when b is not active

Code to reproduce (v0.6.1):

import itertools
import numpy as np
from ConfigSpace import ConfigurationSpace, Configuration, Categorical, AndConjunction, EqualsCondition
from ConfigSpace.util import generate_grid

def check_sampling(cs):
    seen = set()
    for config in cs.sample_configuration(100):
        config.is_valid_configuration()
        x_seen = config.get_array().copy()
        x_seen[np.isnan(x_seen)] = -1  # Otherwise the set doesn't work correctly
        seen.add(tuple(x_seen))
    assert len(seen) == 4

def check_active_params(cs):
    assert cs.get_hyperparameter_names() == ['a', 'b', 'c']
    for x in itertools.product([0, 1], [0, 1], [0, 1]):
        x_active = cs.get_active_hyperparameters(Configuration(cs, vector=np.array(x), allow_inactive_with_values=True))
        x_active_should_be = {'a'} if x[0] == 1 else ({'a', 'b'} if x[1] == 1 else {'a', 'b', 'c'})
        try:
            assert x_active == x_active_should_be
        except AssertionError:
            print(f'{x} ({cs.name}): x_active = {x_active}, whereas it should be {x_active_should_be}')

def check_generate_grid(cs):
    assert cs.get_hyperparameter_names() == ['a', 'b', 'c']
    try:
        configs = generate_grid(cs)
    except ValueError as e:
        print(f'Encountered ValueError when generating grid for {cs.name}: {e!s}')
        return
    assert len(configs) == 4
    for config in configs:
        config.is_valid_configuration()

# First way of specifying nested conditions (preferred way):
# Child conditions only include their immediate parents (assuming that if the
# parent is not active the condition fails, and therefore the child is also not active)
cs1 = ConfigurationSpace(name='cs1', space={
    'a': Categorical('a', ['A', 'B']),
    'b': Categorical('b', ['C', 'D']),
    'c': Categorical('c', ['E', 'F']),
})
cs1.add_conditions([
    EqualsCondition(cs1['b'], cs1['a'], 'A'),  # b is active if a == A
    EqualsCondition(cs1['c'], cs1['b'], 'C'),  # c is active if b == C (and b is active)
])
check_sampling(cs1)
check_active_params(cs1)  # Fails!
check_generate_grid(cs1)

# Second way of specifying nested conditions:
# Child conditions include all ancestors in their condition
cs2 = ConfigurationSpace(name='cs2', space={
    'a': Categorical('a', ['A', 'B']),
    'b': Categorical('b', ['C', 'D']),
    'c': Categorical('c', ['E', 'F']),
})
cs2.add_conditions([
    EqualsCondition(cs2['b'], cs2['a'], 'A'),  # b is active if a == A
    # c is active if b == C (and b is active)
    AndConjunction(EqualsCondition(cs2['c'], cs2['a'], 'A'), EqualsCondition(cs2['c'], cs2['b'], 'C')),
])
check_sampling(cs2)
check_active_params(cs2)
check_generate_grid(cs2)  # Fails!

Output:

(1, 0, 0) (cs1): x_active = {'a', 'c'}, whereas it should be {'a'}
(1, 0, 1) (cs1): x_active = {'a', 'c'}, whereas it should be {'a'}
Encountered ValueError when generating grid for cs2: Evaluate must be called with all instanstatiated parent hyperparameters in the conjunction; you are (at least) missing 'b'

PS: there's also a typo in "instanstatiated" in conditions.pyx
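
For reference, the active sets that the checks above expect can be written down without ConfigSpace. This is a minimal sketch, assuming a child is active only when its parent is active and the condition on it holds; `expected_active` is a hypothetical helper, not a library function.

```python
import itertools

def expected_active(a, b, c):
    """Names of the hyperparameters that should be active for the given values."""
    active = {"a"}              # a is unconditional
    if a == "A":
        active.add("b")         # b is active if a == A
        if b == "C":
            active.add("c")     # c is active if b == C and b itself is active
    return active

# Project every raw combination onto its active hyperparameters.
distinct = {
    tuple((name, value) for name, value in zip("abc", combo)
          if name in expected_active(*combo))
    for combo in itertools.product("AB", "CD", "EF")
}

for config in sorted(distinct):
    print(dict(config))
```

Projecting the eight raw combinations onto their active hyperparameters leaves four distinct configurations, which is the count asserted in check_sampling and check_generate_grid.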

mfeurer commented 1 year ago

Thank you very much for reporting. @filipbartek do you still plan to pick up your work on #197 again?

filipbartek commented 1 year ago

I currently estimate that I will not continue my work on #197. If I continue, I will probably do that before the end of October 2023.

eddiebergman commented 6 months ago

This problem still exists in #346. I will add it as a TODO there.