automl / fanova

Functional ANOVA

RuntimeError - ordinal or categorical features cannot be processed #106

Open deslay1 opened 3 years ago

deslay1 commented 3 years ago

Hi!

Great library that also looks very friendly to use. I tried using ordinal data and passed in the X features as a pandas DataFrame. I got an error that led me to dig into the source code of both this library and its dependency pyrfr, but it's hard to understand the problem. The error I get is:

  File "/home/os5222el/anaconda3/envs/fanova/lib/python3.7/site-packages/pyrfr/regression.py", line 1035, in add_data_point
    return _regression.default_data_container_add_data_point(self, *args)
RuntimeError: Feature 0 is categorical with values in {0,...,18}, but datapoint has value 4096 which is inconsistent!

where Feature 0 (I used a print statement in source code to validate this) is:

[4096.0, 0.0, 0.0, 4.0, 20.0, 36.0, -1.0, -1.0, 10.0, 2.0, 1.0, 67108864.0]

This corresponds to the first row of my data, of course. Here is a snapshot of the first 10 rows of my DataFrame:

[Image: data]

The error raised by add_data_point can be traced back to the lower-level code in the pyrfr library here: https://github.com/automl/random_forest_run/blob/master/include/rfr/data_containers/default_data_container.hpp#L84

It seems that it uses get_type_of_feature to return an index, but it seems strange that this would be compared to an actual value (in my case 4096). The first feature has 19 distinct values (0 to 18).
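Going only by the wording of the error message, the check appears to require that a feature declared categorical with n levels holds values in {0,...,n-1}. The sketch below is a pure-Python illustration of that rule, not the pyrfr implementation; the function name `find_inconsistent_features` and the `num_categories` list (with 0 standing for "continuous") are hypothetical and chosen for this example only.

```python
def find_inconsistent_features(row, num_categories):
    """Return (feature_index, value) pairs that break the range rule.

    Assumption of this sketch: num_categories[i] == 0 marks feature i as
    continuous, and a positive n marks it as categorical with allowed
    values 0..n-1 (the rule suggested by the error message).
    """
    problems = []
    for i, (value, n) in enumerate(zip(row, num_categories)):
        if n > 0 and not (0 <= value < n):
            problems.append((i, value))
    return problems


# First row of data as reported in this issue.
row = [4096.0, 0.0, 0.0, 4.0, 20.0, 36.0, -1.0, -1.0, 10.0, 2.0, 1.0, 67108864.0]

# Hypothetical type vector: only feature 0 is declared categorical (19 levels).
num_categories = [19] + [0] * 11

print(find_inconsistent_features(row, num_categories))  # [(0, 4096.0)]
```

Under this reading, 4096 is rejected because the raw value, rather than its position among the 19 levels, was passed in.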

Any ideas for why this is happening? Appreciate any responses!

BugsBuggy commented 2 years ago

After validating that the ConfigSpace is correctly mapped to the corresponding DataFrame columns and making sure that all categorical variables are encoded correctly, I got the same error as you for a more complex use case that includes conditional hyperparameters.

In my case, after getting the same error with OrdinalHyperparameter, I defined the model dimension as a categorical variable and mapped the values to unique IDs: {256: 9, 512: 10, 128: 11}. 9 should therefore be a valid choice, but somehow only values from {0...3} are accepted.
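If the accepted range really is a contiguous block starting at 0, as the error message suggests, then IDs such as 9, 10 and 11 would fall outside it. A minimal sketch of re-encoding the raw values as 0-based indices follows; the helper `encode_as_indices` is hypothetical, and whether fanova expects exactly this encoding is an assumption, not something confirmed in this thread.

```python
def encode_as_indices(values, order=None):
    """Map raw values to contiguous 0-based indices.

    If `order` is given, it defines the level order (useful for ordinal
    data); otherwise the distinct values are sorted.
    """
    levels = sorted(set(values)) if order is None else list(order)
    index_of = {level: i for i, level in enumerate(levels)}
    return [index_of[v] for v in values], index_of


# Example column of model dimensions, using the values from the comment above.
model_dims = [256, 512, 128, 256]
codes, mapping = encode_as_indices(model_dims)

print(mapping)  # {128: 0, 256: 1, 512: 2}
print(codes)    # [1, 2, 0, 1]
```

The same helper could be applied to an ordinal column such as the one containing 4096, passing the intended level order explicitly via `order`.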

[Screenshot 2021-12-02 at 13:21:55]