benjamc opened this issue 4 months ago (status: Open)
Short answer: yes, it is intended to give you the keys. Use np.array(list(config.values())) if you want the unnormalized values in a NumPy array.
Sorry for typos, phone typing...
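A quick way to see the difference, using a plain dict as a stand-in for a Configuration. This is an assumption for illustration: both are Mappings, and the key names below are made up. Note that a values view is not a sequence, so NumPy does not unpack it unless it is converted to a list first.

```python
import numpy as np

# A plain dict stands in for a Configuration (assumption: both are Mappings,
# so they iterate the same way). Key names are hypothetical.
config = {"learning_rate": 0.01, "n_layers": 3}

# Iterating a Mapping yields its keys
names = np.array(list(config))

# Materialize the values view into a list before handing it to NumPy
values = np.array(list(config.values()))

# A bare values view is neither a sequence nor array-like, so NumPy wraps it
# as a single 0-d object array instead of unpacking it
wrapped = np.array(config.values())

print(names)
print(values)
print(wrapped.shape)
```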
np.array relies on the __array__ protocol, implemented by things like pandas DataFrames, torch tensors and others, to convert objects to arrays efficiently.
However, things like a Python list don't implement this protocol. I'm sure calling np.array([1, 2, 3]) does something smart to pull out the values 1, 2, 3, but users can also implement their own list-like (Sequence/MutableSequence), for which the only generic thing you can do is index or iterate it. The dispatch might look something like this:
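from collections.abc import Iterable, Sequence
from typing import Any

import numpy as np

def array(x: Any) -> np.ndarray:
    # Conceptual sketch of the dispatch, not NumPy's actual implementation
    if hasattr(x, "__array__"):
        # Follow the __array__ protocol
        return np.asarray(x.__array__())
    elif isinstance(x, (list, tuple)):
        # Built-in containers: NumPy handles these with low-level CPython access
        return np.asarray(x)
    elif isinstance(x, Sequence):
        # User-implemented list-like: can't really do better than indexing it
        return array([x[i] for i in range(len(x))])
    elif isinstance(x, Iterable):
        # Any other iterable (including a Mapping) falls back to __iter__
        return array([e for e in x])
    else:
        # Scalars and other objects
        return np.asarray(x)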
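To make the first and third branches concrete, here is a small sketch with two hypothetical classes: one that implements the __array__ hook, and one that is only a collections.abc.Sequence with __getitem__ and __len__. Both convert to the same array, but the first controls the conversion itself while the second can only be indexed or iterated by NumPy.

```python
from collections.abc import Sequence

import numpy as np


class ArrayBacked:
    """Hypothetical class that opts into NumPy's __array__ protocol."""

    def __init__(self, data):
        self._data = list(data)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook directly; a fresh array is always returned here
        return np.asarray(self._data, dtype=dtype)


class ListLike(Sequence):
    """Hypothetical user-defined list-like with only __getitem__ and __len__."""

    def __init__(self, data):
        self._data = list(data)

    def __getitem__(self, index):
        return self._data[index]

    def __len__(self):
        return len(self._data)


from_protocol = np.array(ArrayBacked([1, 2, 3]))
from_sequence = np.array(ListLike([1, 2, 3]))
```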
Now the main point: Configuration is a Mapping (dict-like), so in this setup it would match the Iterable branch. np.array can't do anything smart with a Mapping, so it falls back to using __iter__ on it. The behaviour matches that of calling list() on a dict, which iterates through the keys.
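A minimal sketch reproducing this with a hypothetical ToyConfiguration class (not the real ConfigSpace class), which only implements the three abstract methods of collections.abc.Mapping. Since NumPy ends up iterating it, it picks up the keys, just like list() does.

```python
from collections.abc import Mapping

import numpy as np


class ToyConfiguration(Mapping):
    """Hypothetical stand-in for a Configuration: a read-only Mapping."""

    def __init__(self, values):
        self._values = dict(values)

    def __getitem__(self, key):
        return self._values[key]

    def __iter__(self):
        # Mapping iteration yields keys, exactly like a dict
        return iter(self._values)

    def __len__(self):
        return len(self._values)


config = ToyConfiguration({"learning_rate": 0.01, "n_layers": 3})

as_list = list(config)       # iterates the keys
as_array = np.array(config)  # NumPy iterates the keys too
```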
I would argue the main use case of a Configuration is that it behaves more like a dict than a vector, so making it act like a Sequence doesn't make sense. Further, the unnormalized values can be strings, floats, ints and, soon, arbitrary values, which doesn't make much sense for a NumPy array. One could argue for putting the normalized values in there, but that's really far from the common use case of a Configuration.
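To illustrate the point about heterogeneous values, a short sketch with made-up values: NumPy has to choose a single dtype, so mixing a string with numbers either coerces everything to strings or forces dtype=object.

```python
import numpy as np

# Hypothetical unnormalized values of a configuration with mixed types
raw_values = ["adam", 0.01, 3]

# NumPy picks one common dtype, so the numbers are coerced to strings
coerced = np.array(raw_values)

# Keeping the original Python objects requires dtype=object, which gives up
# most of the benefits of a NumPy array
boxed = np.array(raw_values, dtype=object)
```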
I had some time and did this on my phone, but could you check something for me?
pd.Series acts like a dict (kinda), i.e. heterogeneous key-value pairs, but pandas also implements the __array__ protocol. What happens when you do np.array(pd.Series({"a": 1, "b": 2}))?
If it gives you an array of [1, 2], I could be persuaded to look into the array protocol so that what you posted works. Otherwise, if it gives ["a", "b"] or an error, I would stick to keeping the behaviour as it normally is for a Mapping, even when some vectorized format is available.
Thank you for the explanation, makes sense! Feel free to close the issue.
Running np.array(pd.Series({"a": 1, "b": 2})) yields array([1, 2]).
We've decided this will do as you expected at the top of this issue! Will get to it when we have time :)
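One way the agreed behaviour could look, sketched with a hypothetical ArrayConfiguration class rather than ConfigSpace's real code: keep the Mapping interface, so iteration still yields keys, and add an __array__ method, as pd.Series does, so that np.array returns the values.

```python
from collections.abc import Mapping

import numpy as np


class ArrayConfiguration(Mapping):
    """Hypothetical Mapping that also implements the __array__ protocol;
    not ConfigSpace's actual implementation."""

    def __init__(self, values):
        self._values = dict(values)

    def __getitem__(self, key):
        return self._values[key]

    def __iter__(self):
        return iter(self._values)

    def __len__(self):
        return len(self._values)

    def __array__(self, dtype=None, copy=None):
        # np.array() checks this hook before treating the object as a
        # sequence, so it receives the values rather than the keys
        return np.asarray(list(self._values.values()), dtype=dtype)


config = ArrayConfiguration({"x": 1, "y": 2})
keys = list(config)         # still the keys, like any Mapping
values = np.array(config)   # now the values, via __array__
```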
Awesome!
When calling np.array on a Configuration, it returns the names of the HPs instead of their values. Is that intended? ConfigSpace 0.7.1
MWE:
Output: