Open ablaom opened 3 years ago
cc @boliu-christine
Here's an update on my suggestion for the format of feature importances, as returned by the proposed method `feature_importances(model, report)`.
I think allowing models to expose multiple types of feature importance is overkill / excessively complicated. Of course multiple scores can still be declared in the report itself.
So I suggest a vector of `name => float` pairs, where `name` is a symbol:

`v = [:gender => 0.23, :height => 0.15, :weight => 0.1]`
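For illustration, here is a quick sketch of how a consumer could work with this format (the feature names and scores are made up):

```julia
# The proposed format: a plain vector of feature_name => importance
# pairs, with Symbol keys and Float64 values.
importances = [:gender => 0.23, :height => 0.15, :weight => 0.10]

# Being an ordinary vector of pairs, it sorts naturally by score:
ranked = sort(importances, by=last, rev=true)
first(ranked)   # :gender => 0.23

# ...and converts to a Dict for lookup by feature name:
lookup = Dict(importances)
lookup[:height] # 0.15
```

One attraction of this format over a `Dict` is that it preserves an ordering, while still converting cheaply to a `Dict` when lookup by name is wanted.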
What is the current state of this? I need feature importance support!
I'm still working on this. It will be done soon.
What is the current state of this? @OkonSamuel
The MLJ model API only says that models reporting feature importances should expose them in the `report` output of `fit`. But it says nothing about the actual format of this output, and I can see inconsistencies in the implementations. Feature importances are used by some meta-algorithms, such as RecursiveFeatureElimination (#426), so this might be worth sorting out.

I propose adding a new method `feature_importances(model::Model, report)` to the model API to report the scores, according to some fixed convention. ~~Some models (e.g., LightGBM models) report multiple types of importance scores. So I propose this method return a named tuple keyed on the type, whose values are `Float64` vectors.~~ **edit** See suggestion for format below.
**edit** The proposal follows the same interface pattern that we already have for `training_losses`.

Thoughts anyone?
TODO:

- Add a `reports_feature_importances` trait to StatisticalTraits, defaulting to `false`.
- Add a `feature_importances(model, report)` stub to MLJModelInterface (in model_api.jl); fallback to return `nothing`.
- Add `MMI.feature_importances(mach::Machine)`, following this pattern for `report`, and implement the above method and trait. See https://github.com/JuliaAI/MLJScikitLearnInterface.jl/issues/30 and https://github.com/JuliaAI/MLJScikitLearnInterface.jl/issues/26