collinearity issue in permutation based feature importance (technical question)

ModelOriented / DALEX

moDel Agnostic Language for Exploration and eXplanation

https://dalex.drwhy.ai

GNU General Public License v3.0


collinearity issue in permutation based feature importance (technical question) #533

Closed asheetal closed 1 year ago

asheetal commented 1 year ago

I use DALEX in most of my ML research projects. Reviewers keep criticizing a drawback of the permutation-based method in the presence of multiple correlated predictors. They argue that when a group of highly correlated but important predictors exists, those predictors may not show up at the top of the feature importance ranking. Can someone comment on this criticism? How is this addressed in DALEX?

hbaniecki commented 1 year ago

Hi, what comes to mind is grouping correlated features and attributing importance to each group as a single unit; see the triplot extension to DALEX, which focuses on correlated features: "Triplot: model agnostic measures and visualisations for variable importance in predictive models that take into account the hierarchical correlation structure". There are also importance measures designed for dependent features, e.g. "Model-agnostic Feature Importance and Effects with Dependent Features -- A Conditional Subgroup Approach".
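To make the grouping idea concrete, here is a minimal sketch in plain Python (not DALEX code). It assumes a hypothetical, hand-written "fitted" model that splits its weight across two nearly identical features, `x1` and `x2`, and compares permuting them one at a time with permuting them jointly. The data, the model, and all names are illustrative assumptions.

```python
import random

rng = random.Random(0)
n = 4000

# Synthetic data (assumption): x1 and x2 are noisy copies of one latent
# signal z, so they are highly correlated; x3 is independent of both.
z = [rng.gauss(0, 1) for _ in range(n)]
X = [[zi + rng.gauss(0, 0.1), zi + rng.gauss(0, 0.1), rng.gauss(0, 1)] for zi in z]
y = [zi + 0.8 * row[2] for zi, row in zip(z, X)]


def predict(row):
    # Hypothetical fitted model: the effect of z is split evenly between x1 and x2.
    return 0.5 * row[0] + 0.5 * row[1] + 0.8 * row[2]


def mse(rows):
    return sum((yi - predict(r)) ** 2 for yi, r in zip(y, rows)) / n


def permutation_importance(columns, repeats=5):
    """Mean increase in MSE when `columns` are permuted together (one shared row shuffle)."""
    base = mse(X)
    total = 0.0
    for _ in range(repeats):
        idx = list(range(n))
        rng.shuffle(idx)
        shuffled = [list(r) for r in X]
        for i, j in enumerate(idx):
            for c in columns:
                shuffled[i][c] = X[j][c]
        total += mse(shuffled) - base
    return total / repeats


imp_x1 = permutation_importance([0])
imp_x2 = permutation_importance([1])
imp_x3 = permutation_importance([2])
imp_group = permutation_importance([0, 1])  # x1 and x2 permuted jointly

print(f"x1 alone:    {imp_x1:.2f}")
print(f"x2 alone:    {imp_x2:.2f}")
print(f"x3 alone:    {imp_x3:.2f}")
print(f"x1+x2 group: {imp_group:.2f}")
```

In this setup each correlated feature on its own ranks below `x3`, while the group as a whole ranks above it, which is the effect the reviewers describe. If I recall correctly, `model_parts` in DALEX accepts a `variable_groups` argument for this kind of joint permutation; please check the documentation for the exact form.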

asheetal commented 1 year ago

Thanks for responding. Yes, grouped feature importance is one way to assess all correlated features at once. But I do not know which features are correlated. I am only looking at individual item level predictors. Would that be a problem in DALEX? If so, how can I minimize the effect of correlation among predictors when the correlation structure is unknown?

hbaniecki commented 1 year ago

> But I do not know which features are correlated. I am only looking at individual item level predictors.

I believe this is what the triplot tool aims to overcome. See the example plot, which visualizes: (left) local feature importance, (right) feature correlation structure, and (middle) combined importance for groups of correlated features.

triplot in Python https://dalex.drwhy.ai/python-dalex-aspect.html
triplot in R https://github.com/ModelOriented/triplot
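For intuition on how groups can be found when the correlation structure is not known in advance, below is a minimal sketch in plain Python. It is not triplot's algorithm; it simply merges features whose absolute Pearson correlation exceeds a threshold (single linkage via union-find). The function names, the 0.8 threshold, and the toy data are assumptions for illustration.

```python
import random
from math import sqrt


def pearson(a, b):
    """Pearson correlation of two equal-length lists (assumes non-constant columns)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / sqrt(va * vb)


def correlated_groups(columns, threshold=0.8):
    """Group column names that are linked by |r| >= threshold (single linkage)."""
    names = list(columns)
    parent = {name: name for name in names}

    def find(name):
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(g) for g in groups.values())


# Toy data (assumption): x1, x2 and x4 all track the same latent signal z
# (x4 with a flipped sign), while x3 is independent noise.
rng = random.Random(1)
z = [rng.gauss(0, 1) for _ in range(500)]
data = {
    "x1": [v + rng.gauss(0, 0.1) for v in z],
    "x2": [v + rng.gauss(0, 0.1) for v in z],
    "x3": [rng.gauss(0, 1) for _ in z],
    "x4": [-v + rng.gauss(0, 0.1) for v in z],
}

groups = correlated_groups(data)
print(groups)
```

The resulting groups could then be evaluated jointly rather than feature by feature. With around 1000 features the pairwise loop covers roughly half a million pairs, so in practice a vectorized correlation matrix and a hierarchical clustering routine would likely be a better fit than this pure-Python version.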

asheetal commented 1 year ago

It seems this might be useful for me. So, in a nutshell, I need to look at correlated variables before determining which predictor is the most important in the model. Would this be a correct statement?

pbiecek commented 1 year ago

Correct, the importance of correlated features is a tricky question. For PDP/ALE profiles, there is an interesting discussion of this in https://rss.onlinelibrary.wiley.com/doi/10.1111/rssb.12377 and https://ema.drwhy.ai/accumulatedLocalProfiles.html

For variable importance, the triplot helps to compare the importance of individual variables as well as of groups of correlated variables.

asheetal commented 1 year ago

Thanks @pbiecek. I tried to follow the ALEplot paper (Apley and Zhu, 2019). I get that PDP plots are not authoritative in the presence of collinearity and that we should prefer ALE plots. What is still not clear is how ALE links with model variable importance, and what effect collinearity will have on model variable importance. As an applied data scientist, I am just interested in the top 5 most important features out of approximately 1000, so that I can use those 5 in a subsequent field experiment. I cannot vary more than 5 in a field experiment.

If Molnar's approach of "conditional subgroup" is the way forward, is there a code snippet that I can follow? I could use that to potentially generate those top-5 features for the field experiment.

hbaniecki commented 1 year ago

@asheetal

> What is still not clear is how ALE links with model variable importance

ALE/PDP profiles link to variable importance only indirectly. For example, see our work on variable importance via oscillations (https://github.com/modeloriented/vivo) and the variance-based variable importance of Greenwell et al. (2018) and Scholbeck et al. (2019), with code at https://github.com/koalaverse/vip.
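As a rough illustration of the variance-based idea, the sketch below scores a numeric feature by the standard deviation of its partial dependence profile: a flat profile gives zero importance. This is a simplified reading of that approach in plain Python, not the `vip` or `vivo` implementation; the toy model, the grid construction, and all function names are assumptions.

```python
import random
from statistics import mean, stdev


def partial_dependence(predict, rows, col, grid):
    """Average prediction with column `col` forced to each value in `grid`."""
    profile = []
    for value in grid:
        preds = []
        for row in rows:
            modified = list(row)
            modified[col] = value
            preds.append(predict(modified))
        profile.append(mean(preds))
    return profile


def pd_importance(predict, rows, col, grid_size=20):
    """Standard deviation of the partial dependence profile of column `col`."""
    values = sorted(r[col] for r in rows)
    # Grid of observed values at evenly spaced ranks, from minimum to maximum.
    step = (len(values) - 1) / (grid_size - 1)
    grid = [values[round(k * step)] for k in range(grid_size)]
    return stdev(partial_dependence(predict, rows, col, grid))


def predict(row):
    # Hypothetical model: linear in x0, quadratic in x1, ignores x2 entirely.
    return 2.0 * row[0] + 0.5 * row[1] ** 2


rng = random.Random(2)
rows = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(300)]

importance = [pd_importance(predict, rows, col) for col in range(3)]
for col, value in enumerate(importance):
    print(f"x{col}: {value:.3f}")
```

Note that partial dependence forces a feature to grid values regardless of the other features, so with correlated predictors it averages over unrealistic combinations. The same concern about collinearity therefore carries over to an importance score built on PDP, which is part of why ALE-based profiles are discussed above.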

> If Molnar's approach of "conditional subgroup" is the way forward, is there a code snippet that I can follow?

I can link to the code from the article: https://github.com/christophM/paper_conditional_subgroups

I hope all the resources given in this issue can guide you in your experiments.

hbaniecki commented 1 year ago

seems answered, reopen if needed :-)