Closed (ExpandingMan closed this 3 months ago)
The first step is to get a list of all scikit-learn models that expose this. After that, exposing it to MLJ users should not be too bad.
I looked today and did not notice any comprehensive listing of which models expose this. It might only be trees (or tree ensembles); I think everything else calls it something slightly different.
Perhaps, for now, you can make use of https://github.com/nredell/ShapML.jl - a model-agnostic approach to ranking features. It even has an MLJ example in the docs.
There is also https://github.com/slundberg/ShapleyValues.jl .
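For anyone wondering what a Shapley-style, model-agnostic ranking looks like in principle, here is a minimal sketch in plain Python. It is not the algorithm or API of either package linked above: `shapley_values`, `toy_model` and the data are hypothetical names made up for illustration, and it enumerates every feature ordering exactly, which is only feasible for a handful of features. It assumes that "missing" features are filled in from a set of background rows.

```python
from itertools import permutations


def shapley_values(predict, instance, background):
    """Exact Shapley values for one prediction, by enumerating every
    feature ordering. Features not yet revealed take their values from
    the background rows, and predictions are averaged over those rows."""
    n = len(instance)

    def value(revealed):
        preds = [
            predict([instance[j] if j in revealed else row[j] for j in range(n)])
            for row in background
        ]
        return sum(preds) / len(preds)

    phi = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        revealed = set()
        before = value(revealed)
        for j in order:
            revealed.add(j)
            after = value(revealed)
            phi[j] += after - before  # marginal contribution of feature j
            before = after
    return [p / len(orderings) for p in phi]


def toy_model(x):
    # Hypothetical fitted model; feature 2 is irrelevant by construction.
    return 2.0 * x[0] + 1.0 * x[1] + 0.0 * x[2]


background = [[0.0, 0.0, 0.0], [2.0, 4.0, 6.0]]
phi = shapley_values(toy_model, [3.0, 1.0, 5.0], background)

# Rank features by the magnitude of their contribution to this prediction.
ranking = sorted(range(len(phi)), key=lambda j: abs(phi[j]), reverse=True)
```

The contributions add up to the difference between the prediction for the instance and the average prediction over the background rows. A global ranking would typically average the absolute contributions over many instances.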
I had feature importances for tree methods in my own tree library https://github.com/ablaom/KoalaTrees.jl but this is old and no longer reliable. One day...
Oh man, thanks so much for letting me know about ShapML. I really need feature importances today and getting them was being a huge pain. Will definitely check that out.
I'm looking into computing importance in DecisionTree.jl right now. Unfortunately it doesn't seem that there is any completely universal standard for exactly how the importance values should be computed for decision trees (in particular the scikit-learn implementation differs somewhat from the examples in Elements of Statistical Learning, and these seemed to me like the most important references).
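To make one of the conventions concrete, here is a rough sketch of the impurity-decrease style that scikit-learn documents for its trees: each split contributes its sample-weighted impurity decrease to the feature it splits on, and the totals are normalized to sum to 1. This is not the DecisionTree.jl or scikit-learn code. The `Node` class, `impurity_importances` and the toy tree are hypothetical, and other references scale or aggregate the per-split improvements differently.

```python
class Node:
    """Hypothetical tree node; a leaf has feature=None and no children."""

    def __init__(self, n_samples, impurity, feature=None, left=None, right=None):
        self.n_samples = n_samples
        self.impurity = impurity
        self.feature = feature
        self.left = left
        self.right = right


def impurity_importances(root, n_features):
    """Sum the sample-weighted impurity decrease of every split, per
    feature, then normalize so that the importances add up to 1."""
    totals = [0.0] * n_features
    stack = [root]
    while stack:
        node = stack.pop()
        if node.feature is None:  # leaf: no split, so no contribution
            continue
        decrease = (
            node.n_samples * node.impurity
            - node.left.n_samples * node.left.impurity
            - node.right.n_samples * node.right.impurity
        )
        totals[node.feature] += decrease / root.n_samples
        stack.extend([node.left, node.right])
    total = sum(totals)
    return [t / total for t in totals] if total > 0 else totals


# Toy tree: the root splits on feature 0, its left child on feature 1,
# and feature 2 is never used.
toy_tree = Node(
    100, 0.5, feature=0,
    left=Node(60, 0.4, feature=1, left=Node(30, 0.0), right=Node(30, 0.2)),
    right=Node(40, 0.1),
)
importances = impurity_importances(toy_tree, n_features=3)
```

A feature that is never split on gets an importance of zero, which is why this kind of ranking is mostly useful for spotting irrelevant features.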
Yeah, I'm no expert, but I only ever assumed the tree importances were a rough guideline to help you identify truly irrelevant features, and that they would fail in cases where the features are highly correlated. Everything about it seems highly dependent on the details of the model.
No expert either, but I seem to remember that one of the "standard" rankings for trees corresponds to the Shapley model-agnostic ranking. I can't remember just now which one!
@OkonSamuel
Added with #61
Many `scikitlearn` models implement `feature_importances_`, which gives feature importance rankings. It would be great to add this to `report` somehow. `report` is advertised as showing this, but as far as I've been able to find, there aren't too many models that actually implement this (it's not implemented at all in DecisionTree.jl yet).