Closed 93lorenzo closed 1 month ago
Sounds reasonable to me.
I will check if it's doable and get back to you.
I am not sure I understand the suggestion or the problem. At the moment, we offer options about what to do with unseen categories. What's wrong with that functionality? What else would the transformer need to do?
The init parameters need to be immutable to be compatible with the sklearn API, so we can't really add functionality there; we need to accept them as the user enters them.
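To illustrate the convention mentioned above, here is a minimal sketch in plain Python. The class `SketchEncoder` and its accepted values are hypothetical, made up for this example; it is not feature_engine or scikit-learn code. The idea is that the constructor stores the argument untouched and any validation waits until `fit`.

```python
class SketchEncoder:
    """Hypothetical estimator, not part of feature_engine or scikit-learn."""

    def __init__(self, unseen="raise"):
        # Store the argument exactly as given: no validation, no conversion.
        self.unseen = unseen

    def get_params(self):
        # Cloning an estimator relies on getting back what the constructor received.
        return {"unseen": self.unseen}

    def fit(self, values):
        # Validation is deferred to fit, when the estimator is actually used.
        if self.unseen not in ("raise", "ignore"):
            raise ValueError(
                f"unseen must be 'raise' or 'ignore', got {self.unseen!r}"
            )
        # Learned state gets a trailing underscore and is set only in fit.
        self.categories_ = sorted(set(values))
        return self


enc = SketchEncoder(unseen="ignore")
clone = SketchEncoder(**enc.get_params())  # round-trips the init parameter
```

Under this convention, adding a new keyword argument such as `unseen` to the constructor is compatible with the API, as long as it is stored as entered.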
Anyhow, I am sure I am missing something here, so would be great if you guys could enlighten me :)
DecisionTreeEncoder does not expose the "unseen" argument to the user and instead uses the constant value "raise". The proposal is to expose it to the user.
First of all, thanks a lot for the comments! 🙏 I will try to explain more about why I raised this issue.
I was using the transformer in a training pipeline. When one of the splits contained an unseen category, it threw an error because that category had not been seen before.
DecisionTreeEncoder has by default 'raise'. Sorry, I get it now. I forgot that we do not expose that in the TreeEncoder. By all means, we should certainly change that! If any of you wants to volunteer, that'd be a major help! Thank you!
Thanks!! I volunteer, I will do a PR this week.
I started to use the DecisionTreeEncoder for categorical variables. Lately I ran into an exception and found out that it is raised when there is an unseen category. I understand this as the default behaviour, but I was thinking that having the flexibility to decide that in the constructor could be helpful.
To give more context: from the code I saw that the OrdinalEncoder has the unseen parameter set to "raise" by default.
Suggestion
I would like to suggest making that parameter more flexible by exposing it in the constructor.
Of course there are alternatives; in my project I am currently overriding the function to take that behaviour into account, but I think this flexibility might help others. If this is considered a reasonable change, I would be happy to help make it.
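A rough sketch of what the suggestion amounts to, in plain Python. The classes `SketchOrdinalEncoder` and `SketchTreeEncoder`, and the "raise"/"ignore" values, are hypothetical stand-ins written for this example; they are not the library's actual code. The outer encoder forwards the user's choice to the inner one instead of fixing it to "raise".

```python
import math


class SketchOrdinalEncoder:
    """Hypothetical stand-in for an inner ordinal encoder."""

    def __init__(self, unseen="raise"):
        self.unseen = unseen

    def fit(self, values):
        if self.unseen not in ("raise", "ignore"):
            raise ValueError(f"unseen must be 'raise' or 'ignore', got {self.unseen!r}")
        # Map each category seen during fit to an integer code.
        self.mapping_ = {cat: i for i, cat in enumerate(sorted(set(values)))}
        return self

    def transform(self, values):
        out = []
        for v in values:
            if v in self.mapping_:
                out.append(float(self.mapping_[v]))
            elif self.unseen == "raise":
                raise ValueError(f"unseen category: {v!r}")
            else:
                # "ignore": emit NaN instead of failing.
                out.append(math.nan)
        return out


class SketchTreeEncoder:
    """Hypothetical outer encoder that exposes `unseen` in its constructor."""

    def __init__(self, unseen="raise"):
        self.unseen = unseen

    def fit(self, values):
        # Forward the user's choice rather than hard-coding "raise".
        self.encoder_ = SketchOrdinalEncoder(unseen=self.unseen).fit(values)
        return self

    def transform(self, values):
        # A real encoder would pass these codes on to a decision tree.
        return self.encoder_.transform(values)


lenient = SketchTreeEncoder(unseen="ignore").fit(["a", "b"])
result = lenient.transform(["a", "c"])  # "c" was not seen during fit
```

With the default `unseen="raise"`, the same `transform` call would raise a `ValueError`, which matches the behaviour described in this issue.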