Morgan-Sell closed this issue 5 months ago.
Hey @Morgan-Sell
Thanks for the suggestion.
The file you linked is massive. Which lines of code are the relevant ones, i.e., the ones that actually do the encoding? That way I can understand what this is about.
What is the idea? Do you use the numerical variables to predict the categories of the categorical ones? Get the residuals between what and what? I don't understand lol.
Does the book include a reference? Where is this coming from? What's the logic of this encoding? When is it suitable? I'd probably need to read the book.
The code is from lines 230 to 264.
The example involves predicting housing prices in Seattle, and the author encodes zip codes. I think the encoder is meant for categorical variables with high cardinality.
I don't know how much the book will help; unfortunately, it describes the transformation only briefly. Still, I thought transforming a variable based on residuals was a clever idea, because residuals can carry significant insight.
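To make the "residuals between what and what" question concrete, here is one possible reading of the idea, assuming (as in the Seattle example) that a regression is fit on the numeric predictors only, leaving the categorical variable out. Summarizing each category by its median residual and the number of groups $K$ are assumptions here:

$$
e_i = y_i - \hat{y}_i, \qquad
\tilde{e}_c = \operatorname{median}\{\, e_i : z_i = c \,\}, \qquad
g(c) = \operatorname{bin}_K(\tilde{e}_c)
$$

Here $y_i$ is the sale price of house $i$, $\hat{y}_i$ its prediction from the numeric features, $z_i$ its zip code, and $\operatorname{bin}_K$ assigns each per-zip median residual to one of $K$ intervals, so zip codes with a similar unexplained price level end up in the same group.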
Hey @Morgan-Sell
If the book you mention doesn't explain the encoding clearly and doesn't cite an additional reference, I am inclined to close this issue.
In short, before making it part of feature-engine, it would be good to know how widely accepted and how well grounded the encoding is. Based on your previous reply, it sounds like that is not super clear.
If you agree, then pls close it :)
Is your feature request related to a problem? Please describe. On page 168 of "Practical Statistics for Data Scientists", the authors discuss grouping categorical variables using the residuals from a regression.
The code can be found here starting on line 230.
Describe the solution you'd like: Proposed steps:
pd.cut
Describe alternatives you've considered: n/a
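The proposed steps could look roughly like the sketch below. It is a minimal illustration under stated assumptions, not the book's code or a feature-engine API: `fit_residual_groups` is a hypothetical helper, ordinary least squares via `numpy.linalg.lstsq` stands in for whatever regression the book fits, the median is assumed as the per-category summary, and `pd.cut` makes equal-width bins over those medians. The toy data are made up.

```python
import numpy as np
import pandas as pd


def fit_residual_groups(X, y, cat_col, num_cols, n_groups=5):
    """Hypothetical helper: map each category of `cat_col` to a group code.

    Fits an OLS regression of y on the numeric columns only (the categorical
    column is left out), then groups categories by their median residual.
    """
    # Design matrix: an intercept column plus the numeric predictors.
    design = np.column_stack(
        [np.ones(len(X)), X[num_cols].to_numpy(dtype=float)]
    )
    coef, *_ = np.linalg.lstsq(design, y.to_numpy(dtype=float), rcond=None)
    # Residual = observed minus predicted: what the numeric features miss.
    residuals = y - design @ coef
    # Median residual per category (assumed summary statistic).
    median_resid = residuals.groupby(X[cat_col]).median()
    # Equal-width bins over the medians; labels=False gives codes 0..n_groups-1.
    groups = pd.cut(median_resid, bins=n_groups, labels=False)
    return groups.to_dict()


# Toy data (made up): price depends on sqft plus a zip-specific premium
# that the numeric feature alone cannot explain.
rng = np.random.default_rng(0)
zips = np.repeat(["98101", "98102", "98103", "98104"], 50)
premium = {"98101": -100.0, "98102": -30.0, "98103": 30.0, "98104": 100.0}
sqft = rng.uniform(500, 3000, size=zips.size)
price = (
    0.2 * sqft
    + np.array([premium[z] for z in zips])
    + rng.normal(0, 5, size=zips.size)
)
df = pd.DataFrame({"zip": zips, "sqft": sqft, "price": price})

mapping = fit_residual_groups(df, df["price"], "zip", ["sqft"], n_groups=2)
# Replace the high-cardinality column with its residual group.
df["zip_group"] = df["zip"].map(mapping)
print(mapping)
```

With `Series.map`, categories not seen when the mapping was built become NaN, so a real encoder would need a rule for unseen categories.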
Additional context: I will look for additional research.