automl / amltk

A build-it-yourself AutoML Framework
https://automl.github.io/amltk/
BSD 3-Clause "New" or "Revised" License

[Bug] Hyperparameters with colons not passing to model #287

Open amirbalef opened 1 month ago

amirbalef commented 1 month ago

**Describe the bug**
Hi, I am not sure whether this is a bug or whether I am using AMLTK incorrectly. When a hyperparameter's name contains a colon (:) and I create a component from it, the configuration is not passed through the pipeline to the model. The cause appears to be at line 431 of node.py in the repository (node.py#L431): if a hyperparameter's name contains a colon, the hyperparameter is filtered out.

For example, the configuration space below shows the hyperparameter model:name which contains a colon:

from amltk.pipeline import Component
import ConfigSpace

delimiter = ":"  # Changing delimiter solves the problem!

class Model:
    def __init__(self,**config):
        print("Passed config:", config)
        self.config = config
    def fit(self, X, y):
        pass
    def predict(self, X):
        return int(self.config["model" + delimiter + "name"][1:])

hyp = ConfigSpace.api.types.categorical.Categorical(
    name="model" + delimiter + "name",
    items=["M1", "M2"],
)
space = ConfigSpace.configuration_space.ConfigurationSpace(name="main_space")
space.add(hyp)
pipeline = Component(Model, space=space)
print(pipeline.search_space("configspace"))
config = pipeline.search_space("configspace").get_default_configuration()
print("Suggested config:", config)
configured_pipeline = pipeline.configure(config)
sklearn_pipeline = configured_pipeline.build("sklearn")
sklearn_pipeline.fit(None, None)
print("Prediction of default_configuration", sklearn_pipeline.predict(None))

**To Reproduce**
Steps to reproduce the behavior:

  1. Install the packages:
    pip install "amltk[sklearn,configspace]"
  2. Execute the script.
  3. See the error. The result will be:

    Configuration space object:
      Hyperparameters:
        Model:model:name, Type: Categorical, Choices: {M1, M2}, Default: M1

    Suggested config: Configuration(values={
      'Model:model:name': 'M1',
    })
    Passed config: {}

**Expected behavior**
It is expected that a warning or an assertion error occurs when hyperparameters contain a colon (:) in their name.

**Environment and Installation:**
 - OS: Ubuntu
 - Environment: Conda
 - Python version: 3.11
 - AMLTK version: 1.12.0

eddiebergman commented 1 month ago

Hi @amirbalef,

Yup, that's intended behavior. Because components can be nested arbitrarily, I need some delimiter internally to know which config values to pass down where. Typically, you shouldn't need a delimiter in the hyperparameter name yourself since, as you can see, we already prefix with Model:, i.e. the name of the Component.
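To make the routing idea concrete, here is a minimal sketch of prefix-based config routing. It is not AMLTK's actual code: `DELIMITER` and `select_own_params` are hypothetical names, and the filtering rule is an assumption chosen to match the behavior reported in this issue.

```python
DELIMITER = ":"  # assumed delimiter separating nested component names


def select_own_params(config: dict[str, object], node_name: str) -> dict[str, object]:
    """Return the entries addressed directly to `node_name`, with the prefix stripped."""
    prefix = node_name + DELIMITER
    own: dict[str, object] = {}
    for key, value in config.items():
        if not key.startswith(prefix):
            continue  # addressed to some other component
        remainder = key[len(prefix):]
        if DELIMITER in remainder:
            # A remaining delimiter is read as "belongs to a nested child",
            # so the entry is not handed to this node.
            continue
        own[remainder] = value
    return own


# A colon inside the hyperparameter name looks like one more nesting level:
print(select_own_params({"Model:model:name": "M1"}, "Model"))  # {}
# Without it, the value reaches the component:
print(select_own_params({"model:name": "M1"}, "model"))  # {'name': 'M1'}
```

Under this assumed rule, a key such as Model:model:name is indistinguishable from a parameter called name on a child component called model, which is why it never arrives at the Model itself.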

If you want a different name, i.e. different prefix in the ConfigurationSpace, you could do the following:

hyp = ConfigSpace.api.types.categorical.Categorical(
    name="name",  # Remove delimiter
    items=["M1", "M2"],
)
space = ConfigSpace.configuration_space.ConfigurationSpace(name="main_space")
space.add(hyp)
pipeline = Component(Model, space=space, name="model")  # Add a name to the component

Which should give you this search space:

Configuration space object:
  Hyperparameters:
    model:name, Type: Categorical, Choices: {M1, M2}, Default: M1

Suggested config: Configuration(values={
  'model:name': 'M1',
})
Passed config: {}

Otherwise, if that doesn't fit your needs, I would recommend, as you have rightly spotted, using a different delimiter. Using . is also a pretty good alternative.


Given there are a ton of work-arounds and it's intended behaviour, I'm going to close it as wontfix, unless there's some other reason why you really need your own : delimiter in there that is not derived from the Component.name.

eddiebergman commented 1 month ago

Actually, I've removed the wontfix label: the real solution to this issue is to explicitly raise an error when this happens, explaining why and listing the alternatives. I'll keep this open, then!
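A rough sketch of what such a check could look like follows. It is only an illustration, not the fix that landed in AMLTK: `validate_hyperparameter_names` and `DELIMITER` are hypothetical names, and where the check would be called from is left open.

```python
DELIMITER = ":"  # assumed to match the delimiter reserved by the pipeline


def validate_hyperparameter_names(component_name: str, names: list[str]) -> None:
    """Raise a ValueError if any hyperparameter name contains the reserved delimiter."""
    offending = [name for name in names if DELIMITER in name]
    if offending:
        raise ValueError(
            f"Hyperparameter(s) {offending} of component {component_name!r} contain "
            f"the reserved delimiter {DELIMITER!r}, so their values would never reach "
            "the model. Rename them (for example with '.' instead) or set the "
            "component's name to get the prefix you want."
        )


# A plain name passes silently.
validate_hyperparameter_names("Model", ["name"])

# A name containing the delimiter fails loudly instead of being dropped.
try:
    validate_hyperparameter_names("Model", ["model:name"])
except ValueError as err:
    print(err)
```

Failing at component construction, rather than silently dropping the value at configure time, would surface the problem before any model is built.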