Closed tararae7 closed 3 years ago
No, there isn't yet. If you use the Python version, however, there is some undocumented functionality (_get_model_obj) that converts the C++ objects into Python equivalents, from which you can check which features are used. You'd have to get familiar with the C++ object structures to understand how to use them, though.
Ok. Does the Python version indicate feature importance? Last question.
No, and there won't be any such functionality either. Unless you use the averaged or pooled gain criteria, the features are selected at random, so it doesn't make sense to calculate feature importances. If you want to get an idea of the impact of a given feature on the final predictions, you can try something like Shapley values, for which you can find many packages, but the results might not be very reliable due to all the randomness involved. You can also look at kurtosis if you want to gauge the chances of finding outliers in a given column.
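To illustrate the kurtosis suggestion, here is a minimal sketch in plain Python. It is not part of the library; the helper names (excess_kurtosis, rank_columns_by_kurtosis) and the toy columns are made up for the example. It computes the population excess kurtosis of each column and ranks the columns, on the assumption that heavier tails hint at a better chance of finding outliers there.

```python
import math
import random
from statistics import fmean


def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (about 0 for normal data)."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    mean = fmean(values)
    m2 = fmean([(v - mean) ** 2 for v in values])  # second central moment
    m4 = fmean([(v - mean) ** 4 for v in values])  # fourth central moment
    if m2 == 0:
        return float("nan")  # constant column: kurtosis is undefined
    return m4 / m2 ** 2 - 3.0


def rank_columns_by_kurtosis(columns):
    """Return (name, kurtosis) pairs, heaviest tails first; skips constant columns."""
    scores = {name: excess_kurtosis(vals) for name, vals in columns.items()}
    usable = [(name, k) for name, k in scores.items() if not math.isnan(k)]
    return sorted(usable, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: one flat column, one with a handful of extreme values.
rng = random.Random(0)
columns = {
    "uniform_col": [rng.uniform(-1, 1) for _ in range(1000)],
    "outlier_col": [rng.gauss(0, 1) for _ in range(995)] + [25.0] * 5,
}

ranking = rank_columns_by_kurtosis(columns)
for name, kurt in ranking:
    print(f"{name}: excess kurtosis = {kurt:.2f}")
```

A column with a few extreme values ends up far ahead of a flat one, but this only describes each column's marginal distribution. It says nothing about which features the trees actually split on.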
Closing as the latest version (coming to CRAN soon) now has this in the form of an SQL generator.
Hi David,
Is there a way to tell which features are being chosen for each node?
Thanks, Tara