Closed ghost closed 3 years ago
That doesn't sound very nice, thanks for reporting. Would you mind sharing the code you used to reproduce this issue?
from emukit.core import CategoricalParameter, OneHotEncoding, OrdinalEncoding, ParameterSpace
from emukit.core.loop import RandomSampling, LoopState
cat_one = ["up", "down", "left", "right"]
cat_two = ["low", "medium", "high", "superhigh"]
space = [CategoricalParameter(name='one', encoding=OneHotEncoding(cat_one)),
         CategoricalParameter(name='two', encoding=OrdinalEncoding(cat_two))]
space = ParameterSpace(space)
loop = LoopState([])
points = RandomSampling(space)
points.compute_next_points(loop)
# array([[1., 0., 0., 0., 4.]])
The next points are passed to the evaluate method of the target function, so I assume they should not be the encoded values, unless I am using it wrong? (Ordinal values are not converted back either, so this is a problem with encodings in general.)
btw,
# one-hot with NumPy
import numpy as np
n = len(categories)
encodings = np.zeros((n, n))  # fill_diagonal needs a 2-D array
np.fill_diagonal(encodings, 1)
# ordinal: just np.arange(len(categories)) (?)
Yes, the output isn't decoded. That's an intentional decision: Emukit does not do modelling and expects a model as an input, so we have no control over the way X is passed into the model. To be as unopinionated as possible, we decided to avoid doing encoding/decoding as part of Emukit's pipeline. That's essentially a trade-off between convenience and confusing behavior (which we could easily fall into by trying to cater for all possible use cases).
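If you need the category labels inside your target function, you can decode the points yourself before evaluating. Below is a minimal sketch in plain NumPy. It assumes the column layout seen in the sample output above (four one-hot columns for 'one', then a single ordinal column for 'two') and that ordinal codes start at 1, which is inferred from the value 4. for four categories. `decode_row` is a hypothetical helper, not part of Emukit.

```python
import numpy as np

cat_one = ["up", "down", "left", "right"]         # one-hot encoded in the example above
cat_two = ["low", "medium", "high", "superhigh"]  # ordinal encoded in the example above


def decode_row(row):
    """Hypothetical helper: map one encoded row back to its category labels.

    Assumes the layout [one-hot columns for cat_one..., ordinal column for cat_two]
    and 1-based ordinal codes, as suggested by the sample output in this thread.
    """
    row = np.asarray(row, dtype=float)
    one_hot = row[:len(cat_one)]
    ordinal = row[len(cat_one)]
    first = cat_one[int(np.argmax(one_hot))]    # position of the 1 picks the label
    second = cat_two[int(round(ordinal)) - 1]   # shift the 1-based code to a 0-based index
    return first, second


encoded = np.array([[1., 0., 0., 0., 4.]])
decoded = [decode_row(r) for r in encoded]
print(decoded)  # [('up', 'superhigh')]
```

You would call something like this at the top of the target function, so the loop keeps working with encoded values while your own code sees labels.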
Calling sample_uniform on a one-hot encoded category returns the encoded 1-D array and not the category value, which breaks the space's sample_uniform.