automl / ConfigSpace

Domain specific language for configuration spaces in Python. Useful for hyperparameter optimization and algorithm configuration.
https://automl.github.io/ConfigSpace/

[Feature Request] Support json read for Categorical with tuple entries. #357

Closed charlesjhill closed 2 months ago

charlesjhill commented 5 months ago
Version & copy-pastable script

> Version:
> ```python
> from importlib.metadata import version
> import ConfigSpace
> version("ConfigSpace")
> # '0.7.1'
> ConfigSpace.__version__
> # '0.6.1'
> ```
> Copy-pastable script:
> ```python
> from ConfigSpace import ConfigurationSpace, Categorical
> from ConfigSpace.read_and_write.json import read, write
>
> cs = ConfigurationSpace()
> cs.add_hyperparameter(Categorical("dims", [(1e-1, 1e-2), 1e-2, (1e-2, 1e-4), 1e-4]))
> json_repr = write(cs)
> cs_2 = read(json_repr) # Crashes
> ```

This works:

from ConfigSpace import ConfigurationSpace, Categorical

cs = ConfigurationSpace()
cs.add_hyperparameter(Categorical("dims", [(1e-1, 1e-2), 1e-2, (1e-2, 1e-4), 1e-4]))
cs.sample_configuration()
# Configuration(values={
#  'dims': 0.01,
# })

But a round-trip to JSON doesn't.

from ConfigSpace.read_and_write.json import read, write
json_repr = write(cs)
# {
#   "hyperparameters": [
#     {
#       "name": "dims",
#       "type": "categorical",
#       "choices": [
#         [
#           0.1,
#           0.01
#         ],
#         0.01,
#         [
#           0.01,
#           0.0001
#         ],
#         0.0001
#       ],
#       "default": [
#         0.1,
#         0.01
#       ],
#       "weights": null
#     }
#   ],
#   "conditions": [],
#   "forbiddens": [],
#   "python_module_version": "0.6.1",
#   "json_format_version": 0.4
# }
read(json_repr)  # <-- Crashes

The stacktrace is:

File ~/envs/test_env/lib/python3.11/site-packages/ConfigSpace/read_and_write/json.py:552, in _construct_hyperparameter(hyperparameter)
    551 if hp_type == "categorical":
--> 552     return CategoricalHyperparameter(
    553         name=name,
    554         choices=hyperparameter["choices"],
    565         default_value=hyperparameter["default"],
    566         weights=hyperparameter.get("weights"),
    567     )

File ~/envs/test_env/lib/python3.11/site-packages/ConfigSpace/hyperparameters/categorical.pyx:72, in ConfigSpace.hyperparameters.categorical.CategoricalHyperparameter.__init__()

File ~/envs/test_env/lib/python3.11/collections/__init__.py:599, in Counter.__init__(self, iterable, **kwds)
    588 '''Create a new, empty Counter object.  And if given, count elements
    589 from an input iterable.  Or, initialize the count from another mapping
    590 of elements to their counts.
   (...)
    596
    597 '''
    598 super().__init__()
--> 599 self.update(iterable, **kwds)

File ~/envs/test_env/lib/python3.11/collections/__init__.py:690, in Counter.update(self, iterable, **kwds)
    688             super().update(iterable)
    689     else:
--> 690         _count_elements(self, iterable)
    691 if kwds:
    692     self.update(kwds)

TypeError: unhashable type: 'list'
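
The same failure can be reproduced with the standard library alone, which may help isolate the cause. The sketch below does not use ConfigSpace; it only assumes that the choices pass through `json.dumps`/`json.loads` and are then counted with `collections.Counter`, as the traceback suggests.

```python
import json
from collections import Counter

# The same choices as in the report: two tuples and two floats.
choices = [(1e-1, 1e-2), 1e-2, (1e-2, 1e-4), 1e-4]

# JSON has no tuple type, so tuples are written as arrays
# and come back as Python lists.
restored = json.loads(json.dumps(choices))
print(restored)

# Counting the original choices works because tuples are hashable.
Counter(choices)

# Counting the restored choices fails because lists are not hashable.
try:
    Counter(restored)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```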

In short, the JSON write converts the tuples to lists, but the read does not restore them. The CategoricalHyperparameter constructor requires that choices be hashable, and lists are not. I can think of a few possible solutions; none would be very intensive.

  1. Tell users to convert their "exotic" types to strings and to parse in their code.
  2. Expose the cls kwarg from json.{dumps,loads} in read_and_write.json.{read,write} so users can pass a custom JSONEncoder or JSONDecoder, respectively. This would also allow for serialization/deserialization of other types of interest.
  3. Modify ConfigSpace.read_and_write.json._construct_hyperparameter to convert the list-typed elements of hyperparameter[{"choices", "sequence"}] and hyperparameter["default"] to a tuple, if needed, for categorical and ordinal hyperparameters.

The third option is the hack I'm doing for my use-case, but the second seems more robust and forward-looking. I'm filing the issue because I don't like option 1, of course :)
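
For reference, here is a minimal sketch of what the third option could look like when applied to the decoded dictionary before it reaches the constructor. The helper names `lists_to_tuples` and `restore_tuples` are hypothetical and not part of ConfigSpace; the keys (`type`, `choices`, `sequence`, `default`) are the ones shown in the JSON output and in the list above.

```python
import json
from collections import Counter


def lists_to_tuples(value):
    """Recursively convert lists to tuples; leave other values untouched."""
    if isinstance(value, list):
        return tuple(lists_to_tuples(item) for item in value)
    return value


def restore_tuples(hyperparameter):
    """Return a copy of a decoded hyperparameter dict with hashable entries.

    Hypothetical helper: only categorical and ordinal entries are changed.
    """
    restored = dict(hyperparameter)
    hp_type = restored.get("type")
    if hp_type == "categorical":
        restored["choices"] = [lists_to_tuples(c) for c in restored["choices"]]
    elif hp_type == "ordinal":
        restored["sequence"] = [lists_to_tuples(s) for s in restored["sequence"]]
    else:
        return restored
    restored["default"] = lists_to_tuples(restored["default"])
    return restored


decoded = json.loads(
    '{"name": "dims", "type": "categorical",'
    ' "choices": [[0.1, 0.01], 0.01, [0.01, 0.0001], 0.0001],'
    ' "default": [0.1, 0.01], "weights": null}'
)
fixed = restore_tuples(decoded)

# The converted choices are hashable again, so counting them no longer raises.
Counter(fixed["choices"])
print(fixed["choices"], fixed["default"])
```

Since a list can never be a valid choice (it is unhashable), converting every list-typed choice to a tuple should not clobber a legitimate value.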

I can make a PR if there's interest. Cheers~

eddiebergman commented 5 months ago

Hi there,

Yeah, unfortunately JSON doesn't preserve tuple types during serialization. I'm kind of surprised that you could use tuples in a Categorical in the first place.

Some updates from #346 which might be relevant:

We hope to release this sometime next week :)

charlesjhill commented 5 months ago

Awesome. The ability to pass a decoder that targets a particular key will especially make the process less prone to side effects than a JSONDecoder, which must decide which transformations to apply based on the type of the input alone.
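
To illustrate the difference, here is a rough sketch using only the standard library's `object_hook` parameter of `json.loads`. It is not the decoder API from #346, and the hook names are made up for this example. The type-based hook converts every list it sees, including the top-level `hyperparameters` and `conditions` arrays, while the key-targeted hook only touches `choices` and `default` on categorical entries.

```python
import json

doc = (
    '{"hyperparameters": [{"name": "dims", "type": "categorical",'
    ' "choices": [[0.1, 0.01], 0.01], "default": [0.1, 0.01]}],'
    ' "conditions": []}'
)


def to_tuple(value):
    """Recursively convert lists to tuples."""
    if isinstance(value, list):
        return tuple(to_tuple(item) for item in value)
    return value


def type_based_hook(obj):
    # Decides by type alone: every list value in every object becomes a tuple.
    return {key: to_tuple(value) for key, value in obj.items()}


def key_targeted_hook(obj):
    # Only touches the keys known to need hashable entries.
    if obj.get("type") == "categorical":
        obj["choices"] = [to_tuple(choice) for choice in obj["choices"]]
        obj["default"] = to_tuple(obj["default"])
    return obj


by_type = json.loads(doc, object_hook=type_based_hook)
by_key = json.loads(doc, object_hook=key_targeted_hook)

# Side effect of the type-based hook: structural arrays became tuples too.
print(type(by_type["hyperparameters"]), type(by_type["conditions"]))

# The key-targeted hook leaves the document structure as lists.
print(type(by_key["hyperparameters"]), by_key["hyperparameters"][0]["choices"])
```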