JuliaAI / MLJScikitLearnInterface.jl

MLJ Interface for ScikitLearn.jl

implement feature importance rankings #26

Closed ExpandingMan closed 3 months ago

ExpandingMan commented 3 years ago

Many scikit-learn models implement feature_importances_, which gives feature importance rankings. It would be great to add this to report somehow. report is advertised as showing this, but as far as I've been able to find, few models actually implement it (it's not implemented at all in DecisionTree.jl yet).

ablaom commented 3 years ago

The first step is to get a list of all scikit-learn models that expose this. After that, exposing it to the MLJ user should not be too bad.

ExpandingMan commented 3 years ago

I looked today and I did not notice any comprehensive listing of which models export this. It might only be trees (or ensemble trees); I think everything else calls it something slightly different.
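In the absence of a published listing, one way to build it is to scan the estimator classes for the attribute. The following is a minimal sketch, not a definitive method: the helper names classes_exposing and sklearn_models_exposing are made up for illustration, and the class-level hasattr check only finds attributes defined on the class (for example as a property), so anything set purely inside fit would be missed.

```python
from typing import Iterable, List, Tuple, Type


def classes_exposing(attribute: str,
                     named_classes: Iterable[Tuple[str, Type]]) -> List[str]:
    """Return the sorted names of classes that define `attribute` at class level.

    Assumption: the attribute is visible on the class itself (e.g. a property).
    Attributes that only appear on fitted instances are not detected.
    """
    return sorted(name for name, cls in named_classes if hasattr(cls, attribute))


def sklearn_models_exposing(attribute: str = "feature_importances_") -> List[str]:
    """Apply the check to every estimator scikit-learn can enumerate."""
    # Imported lazily so classes_exposing works without scikit-learn installed.
    from sklearn.utils import all_estimators

    # all_estimators() yields (name, class) pairs.
    return classes_exposing(attribute, all_estimators())
```

Calling sklearn_models_exposing() in an environment with scikit-learn installed should give a candidate list to review by hand; the same call with a different attribute name (for example "coef_") would show what the non-tree models call their equivalent.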

ablaom commented 3 years ago

Perhaps, for now, you can make use of https://github.com/nredell/ShapML.jl - a model agnostic approach to ranking features. It even has an MLJ example in the docs.

There is also https://github.com/slundberg/ShapleyValues.jl .

I had feature importances for tree methods in my own tree library https://github.com/ablaom/KoalaTrees.jl but this is old and no longer reliable. One day...

ExpandingMan commented 3 years ago

Oh man, thanks so much for letting me know about ShapML, I really need feature importance today and it was really being a huge pain. Will definitely check that out.

I'm looking into computing importance in DecisionTree.jl right now. Unfortunately it doesn't seem that there is any completely universal standard for exactly how the importance values should be computed for decision trees (in particular the scikit-learn implementation differs somewhat from the examples in Elements of Statistical Learning, and these seemed to me like the most important references).
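For reference, the impurity-based definition that scikit-learn appears to use for a single tree can be sketched as follows. The notation is an assumption introduced here, not taken from the thread: N is the total number of training samples, N_t the number reaching node t, i(t) the node impurity, and t_L, t_R the left and right children.

```latex
% Weighted impurity decrease at an internal node t
\Delta(t) = \frac{N_t}{N}\left[\, i(t) - \frac{N_{t_L}}{N_t}\, i(t_L) - \frac{N_{t_R}}{N_t}\, i(t_R) \right]

% Unnormalized importance of feature j: sum over the nodes that split on j
\mathrm{Imp}(j) = \sum_{t \,:\, v(t) = j} \Delta(t)

% Normalized so that the importances sum to one
\widehat{\mathrm{Imp}}(j) = \frac{\mathrm{Imp}(j)}{\sum_{k} \mathrm{Imp}(k)}
```

For ensembles the per-tree values are then averaged over the trees. Other references differ in details such as the weighting, the normalization, and whether the squared improvement is used, which is one reason the numbers are not directly comparable across implementations.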

ablaom commented 3 years ago

https://explained.ai/rf-importance/index.html

ExpandingMan commented 3 years ago

Yeah, I'm no expert, but I only ever assumed the tree importances were a rough guideline to help you identify truly irrelevant features and that they would fail in cases where the features are highly correlated. Everything about it seems highly dependent on details of the model.
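One model-agnostic way to check for truly irrelevant features is permutation importance: shuffle one column at a time and measure how much the prediction error grows. This is a different technique from the Shapley-based packages mentioned earlier in the thread, and the sketch below is only illustrative. The function name, the squared-error metric, and the list-of-rows data layout are all assumptions made for the example.

```python
import random
from typing import Callable, List, Sequence

Row = List[float]


def mean_squared_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict: Callable[[List[Row]], List[float]],
                           X: List[Row],
                           y: Sequence[float],
                           n_repeats: int = 10,
                           seed: int = 0) -> List[float]:
    """Mean increase in squared error when each feature column is shuffled.

    `predict` is any already-fitted model wrapped as a function from rows to
    predictions; nothing here depends on the model being a tree.
    """
    rng = random.Random(seed)
    baseline = mean_squared_error(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, leaving every other column untouched.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mean_squared_error(y, predict(X_perm)) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances
```

A feature the model never uses gets an importance of zero, which matches the "identify truly irrelevant features" use case. Highly correlated features remain a problem here as well, since shuffling one of a correlated pair produces unrealistic rows and the model may lean on the other one.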

ablaom commented 3 years ago

No expert either, but I seem to remember that one of the "standard" rankings for trees corresponds to the Shapley model-agnostic ranking. I just can't remember right now which one!

ablaom commented 2 years ago

@OkonSamuel

tylerjthomas9 commented 3 months ago

Added with #61