Rambatino / CHAID

A Python implementation of the common CHAID algorithm
Apache License 2.0

how to get feature_importance #122

Closed: yokoshin closed this issue 2 years ago

yokoshin commented 3 years ago

Hi, I have a question: how can I get the importance of each independent variable? I mean what other ML libraries call "feature_importance".

Rambatino commented 3 years ago

Ideally this library should implement all of these: https://towardsdatascience.com/the-mathematics-of-decision-trees-random-forest-and-feature-importance-in-scikit-learn-and-spark-f2861df67e3

They're also not hard to do; you just need the time, which I don't really have at the moment. Feel free to submit a PR; there's a decent example here: https://github.com/Rambatino/CHAID/blob/master/CHAID/tree.py#L284
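
For reference, the impurity-decrease calculation from that article has roughly this shape. This is only a sketch against a hypothetical node structure (`node.column`, `node.counts`, `node.children`), not this library's actual Tree/Node API:

```python
# Rough sketch of scikit-learn-style "mean decrease in impurity" feature
# importance. The node attributes used here (column, counts, children) are
# placeholders, not the CHAID package's real objects.

def gini(counts):
    """Gini impurity for a dict of class -> count."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return 1.0 - sum((n / total) ** 2 for n in counts.values())

def impurity_importance(nodes, n_total):
    """Sum each split's weighted impurity decrease, grouped by split column."""
    importances = {}
    for node in nodes:
        if not node.children:  # leaf: no split, no contribution
            continue
        n_node = sum(node.counts.values())
        decrease = (n_node / n_total) * gini(node.counts)
        for child in node.children:
            n_child = sum(child.counts.values())
            decrease -= (n_child / n_total) * gini(child.counts)
        importances[node.column] = importances.get(node.column, 0.0) + decrease
    total = sum(importances.values()) or 1.0
    return {col: val / total for col, val in importances.items()}  # normalise to sum to 1
```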

yokoshin commented 3 years ago

I don't think it's difficult to implement feature importance. My understanding is that feature importance is calculated based on some index like Gini or entropy. Do you know which index CHAID in SPSS uses?

Rambatino commented 3 years ago

I don't, unfortunately; I haven't looked at this stuff in a while since the repo hit maturity. There's a PDF somewhere that breaks down the calculations. I couldn't find it just now, but it shouldn't be too difficult to track down.

yokoshin commented 3 years ago

chefboost uses chi-square values. https://github.com/serengil/chefboost/blob/master/chefboost/training/Training.py#L164
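
Applied to a fitted CHAID tree, that idea would look roughly like the sketch below: sum the chi-squared statistic of every split, grouped by the column it split on, then normalise. The attribute names (`node.split.column`, `node.split.score`, `node.is_terminal`) are guesses at the API, not confirmed:

```python
# Sketch of a chi-squared-based feature importance for a fitted CHAID tree.
# Attribute names below are assumptions, not this library's confirmed API.
from collections import defaultdict

def chi_squared_importance(tree):
    totals = defaultdict(float)
    for node in tree:  # assumes the tree is iterable over its nodes
        if node.is_terminal or node.split is None:
            continue
        # add this split's chi-squared statistic to its splitting column
        totals[node.split.column] += node.split.score
    grand_total = sum(totals.values()) or 1.0
    return {col: score / grand_total for col, score in totals.items()}
```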

Rambatino commented 3 years ago

Yeah, seems simple enough. Feel free to submit a PR.