AxeldeRomblay / MLBox

MLBox is a powerful Automated Machine Learning python library.
https://mlbox.readthedocs.io/en/latest/

Overriding string numerical conversion #54

Closed jithurjacob closed 6 years ago

jithurjacob commented 6 years ago

Hi,

For a data set I'm using, the target and the categorical variables are already label-encoded as integers. MLBox incorrectly identifies the categorical features and the target as continuous, and so it turns a classification task into a regression task.

I tried setting the columns to string type, but MLBox still converts them to integers. Can I override this behavior?

dataset

AxeldeRomblay commented 6 years ago

Hello!

You're right: MLBox tries to cast "fake" categorical features with levels like "1", ... to numbers. In the next release the target won't be cast, but the features still will be. To avoid this, unfortunately, you will need to prepend a string like "level" to each level: .apply(lambda x: "level"+str(x))
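As a minimal sketch of that workaround, assuming the data is loaded in a pandas DataFrame: the column names ("color", "price"), the values, and the categorical_cols list below are hypothetical and only there for illustration. The idea is to prefix every level of each label-encoded column so that its values can no longer be parsed as numbers.

```python
import pandas as pd

# Hypothetical data: "color" is a categorical feature that was
# label-encoded to integers; "price" is a genuinely numeric feature.
df = pd.DataFrame({
    "color": [0, 2, 1, 2],
    "price": [10.5, 3.2, 7.8, 4.1],
})

# Assumed list of the columns that should stay categorical.
categorical_cols = ["color"]

# Prefix each level with "level" so the values are no longer numeric strings.
for col in categorical_cols:
    df[col] = df[col].apply(lambda x: "level" + str(x))

print(df["color"].tolist())  # ['level0', 'level2', 'level1', 'level2']
```

Note that str() is applied to every value as-is, so a missing value (NaN) in such a column would become the level "levelnan"; handle missing values first if that is not what you want.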

Hope this helps!

jithurjacob commented 6 years ago

Thanks, that helps.