TremaMiguel opened 3 years ago
I've heard of this method before. How widespread is its use (other than in Kaggle)?
Things to consider:
If not, should we make this transformer part of Feature-engine? So far the aim has been to create features that are semi-interpretable.
Another thing to consider: I don't necessarily want the package lightgbm as a dependency. We would need to see if we can achieve this with sklearn somehow.
Thoughts welcome!
1. The idea I get from this method is to find relations or interactions between the features: the samples in each leaf are characterised by different combinations of the variables. And each sample may land in a different leaf in each tree, so this helps find interactions between groups of features.
2.
would users be able to understand what the importance of these features tells them based on the model?
This is a good point. For an individual decision tree we can get an idea of why each sample was assigned to a leaf, but tracing every tree in an ensemble would be hard to interpret.
I had no idea Feature-engine aimed at semi-interpretable methods; I see this more as an experimental feature that users can choose to try or not.
3.
I don't necessarily want the package lightgbm as a dependency
As far as I know, only the lightgbm implementation can return the predicted leaf. So lightgbm could be an optional dependency, installed for example with pip install feature-engine[extras] or something like that.
Thank you!
In its inception, Feature-engine was meant to include methods that you would actually use when creating models for real-life applications. My experience, from finance and insurance, is that you need to be able to explain what the model is outputting, and the users of the models, for example fraud investigators, would like to understand what each feature is telling them. That is why encoding methods like feature hashing or binary encoding (as in category-encoders) were off the table.
Having said this, I get the impression that users are asking for more alternative techniques, so we could consider whether to include them, but I would say at a later stage, after we give it some thought; maybe we do a user survey or something. I will add more on this in the roadmap.
I would keep this issue on hold for now, and focus on other issues that are more of a priority.
Also, I would like to spend some time looking into whether something similar could be done with random forests or GBMs from sklearn instead of lightgbm.
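On that point, sklearn's ensemble estimators do expose an apply() method that returns the leaf index each sample reaches in every tree, so a sklearn-only route may be feasible. A quick sketch with made-up data:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# RandomForestClassifier.apply: one leaf index per tree.
rf = RandomForestClassifier(n_estimators=10, random_state=0).fit(X, y)
rf_leaves = rf.apply(X)    # shape (n_samples, n_estimators)

# GradientBoostingClassifier.apply: an extra trailing axis for the
# boosting stages' per-class trees (size 1 for binary classification).
gbm = GradientBoostingClassifier(n_estimators=10, random_state=0).fit(X, y)
gbm_leaves = gbm.apply(X)  # shape (n_samples, n_estimators, 1) here

print(rf_leaves.shape, gbm_leaves.shape)
```

So the leaf-index features themselves do not strictly require lightgbm; the question would be whether the sklearn models are competitive for this use case.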
@solegalli, let's do a survey on LinkedIn to gauge the interest in these methods.
Is your feature request related to a problem? Please describe.
LightGBM has the option to return the predicted decision tree leaf for every tree in the model. From the documentation:
Reference
Describe the solution you'd like
A DecisionTreeLeafEncoder transformer that returns the results from the predict method of lightgbm as new features.
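A hypothetical sketch of what such a transformer could look like, built on sklearn's GradientBoostingClassifier.apply() rather than lightgbm to sidestep the dependency question. The class name matches the request, but the parameters and design here are purely illustrative, not an agreed API:

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.ensemble import GradientBoostingClassifier


class DecisionTreeLeafEncoder(BaseEstimator, TransformerMixin):
    """Append one leaf-index feature per tree of an internally fitted GBM.

    Illustrative sketch only; not part of Feature-engine.
    """

    def __init__(self, n_estimators=10, random_state=None):
        self.n_estimators = n_estimators
        self.random_state = random_state

    def fit(self, X, y):
        self.model_ = GradientBoostingClassifier(
            n_estimators=self.n_estimators, random_state=self.random_state
        ).fit(X, y)
        return self

    def transform(self, X):
        # apply() returns (n_samples, n_estimators, n_classes);
        # flatten the per-tree leaf indices into one column per tree.
        leaves = self.model_.apply(X).reshape(len(X), -1)
        return np.hstack([X, leaves])


# usage
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=100, n_features=4, random_state=0)
Xt = DecisionTreeLeafEncoder(n_estimators=5, random_state=0).fit(X, y).transform(X)
print(Xt.shape)  # (100, 9): 4 original features + 5 leaf-index features
```

A real implementation would need to handle DataFrames, feature names, and unseen data, but this shows the shape of the idea.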