david-cortes / isotree

(Python, R, C/C++) Isolation Forest and variations such as SCiForest and EIF, with some additions (outlier detection + similarity + NA imputation)
https://isotree.readthedocs.io
BSD 2-Clause "Simplified" License
186 stars, 38 forks

Model features #16

Closed (tararae7 closed this issue 3 years ago)

tararae7 commented 3 years ago

Hi David,

Is there a way to tell which features are being chosen for each node?

Thanks, Tara

david-cortes commented 3 years ago

No, there isn't yet. If you use the Python version, however, there is an undocumented method (_get_model_obj) that converts the C++ objects into Python equivalents, from which you can check which features each node uses. You'd have to get familiar with the C++ object structures to make sense of the output, though.

tararae7 commented 3 years ago

Ok. Does the Python version indicate feature importance? Last question.

david-cortes commented 3 years ago

No, and there won't be any such functionality either. Unless you use the averaged or pooled gain criteria, the features are selected at random, so it doesn't make sense to calculate feature importances. If you want to get an idea of the impact of a given feature on the final predictions, you can try something like Shapley values, for which you can find many packages, but the results might not be very reliable due to all the randomness involved. If what you're after is how likely a given column is to contain outliers, you can also look at its kurtosis.
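As a rough illustration of the kurtosis suggestion, here is a minimal sketch in plain Python that ranks columns by excess kurtosis. It is not part of isotree: the helper names (excess_kurtosis, rank_columns_by_kurtosis) and the toy columns are made up for this example, and it assumes the data is available as a dict of numeric lists.

```python
import math
import random


def excess_kurtosis(values):
    """Population excess kurtosis: m4 / m2**2 - 3 (about 0 for normal data)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    if m2 == 0:
        return float("nan")  # constant column: kurtosis is undefined
    return m4 / (m2 * m2) - 3.0


def rank_columns_by_kurtosis(columns):
    """Return (name, kurtosis) pairs, heaviest tails first.

    Constant columns (undefined kurtosis) are left out of the ranking.
    """
    scores = {name: excess_kurtosis(vals) for name, vals in columns.items()}
    usable = [(name, k) for name, k in scores.items() if not math.isnan(k)]
    return sorted(usable, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data: one well-behaved column, one with a few extreme values.
rng = random.Random(0)
clean = [rng.gauss(0.0, 1.0) for _ in range(1000)]
spiky = [rng.gauss(0.0, 1.0) for _ in range(995)] + [25.0, -30.0, 40.0, 35.0, -28.0]

ranking = rank_columns_by_kurtosis({"clean": clean, "spiky": spiky})
for name, k in ranking:
    print(f"{name}: {k:.2f}")
```

A column with a much higher value than the rest has heavier tails and is a more likely place to find extreme values. This only looks at each column on its own, so it says nothing about outliers that show up only in combinations of features.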

david-cortes commented 3 years ago

Closing as the latest version (coming to CRAN soon) now has this in the form of an SQL generator.