Closed · yokoshin closed this issue 2 years ago
Ideally this library should implement all of these: https://towardsdatascience.com/the-mathematics-of-decision-trees-random-forest-and-feature-importance-in-scikit-learn-and-spark-f2861df67e3
They're also not hard to do; you just need the time, which I don't really have at the moment. Feel free to submit a PR, there's a decent example here: https://github.com/Rambatino/CHAID/blob/master/CHAID/tree.py#L284
I don't think it's difficult to implement feature importance. My understanding is that feature importance is calculated from an impurity index such as Gini or entropy. Do you know which index CHAID in SPSS uses?
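For reference, the impurity-based approach described above can be sketched in plain Python. `gini` and `impurity_decrease` are hypothetical helper names, not part of this library's API; this follows the scikit-learn-style "mean decrease in impurity" idea, where a feature's importance is the (normalized) sum of `impurity_decrease` over all nodes that split on it:

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum(p_k^2)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def impurity_decrease(parent, children, n_total):
    """Weighted impurity decrease contributed by one split.

    parent: labels reaching the node; children: list of label lists after
    the split; n_total: number of samples in the whole training set.
    The node's weight is its share of the training set, n/n_total.
    """
    n = len(parent)
    child_term = sum(len(c) / n * gini(c) for c in children)
    return (n / n_total) * (gini(parent) - child_term)

# A perfectly separating split on a balanced binary node removes
# all impurity, so the decrease equals the parent's Gini of 0.5.
dec = impurity_decrease(["a", "a", "b", "b"], [["a", "a"], ["b", "b"]], 4)
```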
I don't, unfortunately; I haven't looked at this stuff in a while since the repo reached maturity. There's a PDF somewhere that breaks down the calculations. I couldn't find it just now, but it shouldn't be too difficult to track down.
chefboost uses chi-square values. https://github.com/serengil/chefboost/blob/master/chefboost/training/Training.py#L164
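For anyone following along, the chi-square statistic that scoring like chefboost's relies on can be computed from a contingency table of split branches vs. target classes. This is a minimal sketch, not chefboost's actual code, and `chi_square` is a hypothetical helper name:

```python
def chi_square(table):
    """Pearson chi-square statistic for a 2D contingency table.

    Rows are the branches produced by a candidate split, columns are
    the target classes; a larger statistic means the split separates
    the classes more strongly.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of branch and class.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    return stat

# Two branches, two classes, with a visible class shift between them.
stat = chi_square([[10, 20], [20, 10]])
```

Summing (or averaging) these statistics over the nodes that split on each feature would give one plausible chi-square-based importance score.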
Yeah, that seems simple enough. Feel free to submit a PR.
Hi, I have a question: how can I get the importance of each independent variable? I mean the "feature_importance" exposed by other ML libraries.