Closed asheetal closed 1 year ago
Hi, what comes to mind is grouping correlated features and attributing importance to each group as a unit. See the triplot extension to DALEX, which focuses on correlated features: "Triplot: model agnostic measures and visualisations for variable importance in predictive models that take into account the hierarchical correlation structure". There are also importance measures for dependent features, e.g. "Model-agnostic Feature Importance and Effects with Dependent Features -- A Conditional Subgroup Approach".
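To make the grouping idea concrete, here is a minimal sketch of individual versus grouped permutation importance. It uses only the Python standard library, not DALEX or triplot. The synthetic data, the hand-written `model` function standing in for a fitted model, and the helper names are all assumptions made for illustration.

```python
import math
import random


def make_data(n=2000, seed=0):
    """Synthetic data: x1 and x2 are near-duplicates of a latent z; x3 is independent."""
    rng = random.Random(seed)
    rows, target = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)
        x1 = z + rng.gauss(0, 0.05)
        x2 = z + rng.gauss(0, 0.05)
        x3 = rng.gauss(0, 1)
        rows.append([x1, x2, x3])
        target.append(z + 0.8 * x3 + rng.gauss(0, 0.1))
    return rows, target


def model(row):
    """Hypothetical fitted model that splits the shared signal between x1 and x2."""
    return 0.5 * row[0] + 0.5 * row[1] + 0.8 * row[2]


def rmse(rows, target):
    """Root mean squared error of `model` on the given rows."""
    return math.sqrt(sum((model(r) - y) ** 2 for r, y in zip(rows, target)) / len(rows))


def permutation_importance(rows, target, columns, seed=0):
    """RMSE increase after permuting `columns` jointly (same row shuffle for all)."""
    order = list(range(len(rows)))
    random.Random(seed).shuffle(order)
    permuted = [list(r) for r in rows]
    for i, j in enumerate(order):
        for c in columns:
            permuted[i][c] = rows[j][c]
    return rmse(permuted, target) - rmse(rows, target)


rows, target = make_data()
individual = {name: permutation_importance(rows, target, [c])
              for c, name in enumerate(["x1", "x2", "x3"])}
grouped = permutation_importance(rows, target, [0, 1])

print("individual:", individual)
print("group {x1, x2}:", grouped)
```

In this setup x1 and x2 should each rank below x3 when permuted one at a time, while the pair permuted together should rank above it. DALEX's `model_parts()` exposes a `variable_groups` argument for this kind of joint permutation; it is worth checking the documentation for your version.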
Thanks for responding. Yes, grouped feature importance is one way to check all correlated features at once. But I do not know which features are correlated. I am only looking at individual item-level predictors. Would that be a problem in DALEX? If so, how can I minimize the effect of correlation between predictors when I do not know which ones are correlated?
But I do not know which features are correlated. I am only looking at individual item level predictors.
I believe this is what the triplot tool aims to overcome. See the example plot, which shows (left) local feature importance, (right) the feature correlation structure, and (middle) the combined importance of groups of correlated features.
triplot in Python: https://dalex.drwhy.ai/python-dalex-aspect.html
triplot in R: https://github.com/ModelOriented/triplot
So it seems this might be useful for me. In a nutshell, I need to look at correlated variables before determining which predictor is the most important in the model. Is that correct?
Correct. The importance of correlated features is a tricky question. For PDP/ALE profiles, there is an interesting discussion of this in https://rss.onlinelibrary.wiley.com/doi/10.1111/rssb.12377 and https://ema.drwhy.ai/accumulatedLocalProfiles.html
For variable importance, triplot helps to compare the importance of individual variables as well as of groups of correlated variables.
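When the correlated groups are not known in advance, they can be derived from the data before computing any grouped importance. The sketch below is a deliberately simple stand-in and not triplot's method. Triplot works with a hierarchical correlation structure, whereas this example uses a flat threshold on the absolute Pearson correlation and merges features with a union-find. The feature names, the 0.8 threshold, and the synthetic data are assumptions.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def correlated_groups(columns, threshold=0.8):
    """Group feature names linked by |Pearson r| of at least `threshold`."""
    names = list(columns)
    parent = {name: name for name in names}

    def find(name):
        # Follow parent links to the group representative (with path halving).
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if abs(pearson(columns[a], columns[b])) >= threshold:
                parent[find(a)] = find(b)

    groups = {}
    for name in names:
        groups.setdefault(find(name), []).append(name)
    return sorted(groups.values())


# Synthetic example: x1 ~ x2 (positive), x3 ~ x4 (negative), pairs unrelated.
rng = random.Random(0)
columns = {"x1": [], "x2": [], "x3": [], "x4": []}
for _ in range(500):
    z, w = rng.gauss(0, 1), rng.gauss(0, 1)
    columns["x1"].append(z + rng.gauss(0, 0.1))
    columns["x2"].append(z + rng.gauss(0, 0.1))
    columns["x3"].append(w)
    columns["x4"].append(-w + rng.gauss(0, 0.1))

groups = correlated_groups(columns, threshold=0.8)
print(groups)
```

The resulting groups can then be passed to a grouped importance computation. With around 1000 features the pairwise loop in pure Python will be slow, so a vectorised correlation matrix would be the practical choice.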
Thanks @pbiecek. I tried to follow the ALEplot paper (Apley and Zhu, 2019). I understand that PDPs are not reliable in the presence of collinearity and that ALE plots should be preferred. What is still not clear is how ALE links with model variable importance, and what effect collinearity has on model variable importance. As an applied data scientist, I am only interested in the top 5 most important features out of approximately 1000, which I will then use in a subsequent field experiment. I cannot vary more than 5 features in a field experiment.
If Molnar's "conditional subgroup" approach is the way forward, is there a code snippet that I can follow? I could use that to potentially generate those top-5 features for the field experiment.
@asheetal
What is still not clear is how ALE links with model variable importance
ALE and PDP profiles link to variable importance only indirectly. For example, see our work on variable importance via oscillations (https://github.com/modeloriented/vivo) and the variance-based variable importance of Greenwell et al. (2018) and Scholbeck et al. (2019), with code at https://github.com/koalaverse/vip.
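As an illustration of the variance-based idea, Greenwell et al. (2018) score a continuous feature by the standard deviation of its partial dependence profile: a flat profile means low importance. The sketch below is a standard-library approximation of that measure and not the vip package itself. The hand-written `model`, the quantile grid, and the helper names are assumptions. Because it is built on partial dependence, it inherits the PDP caveats about collinearity discussed earlier in this thread.

```python
import random
import statistics


def model(row):
    """Hypothetical fitted model (assumption): linear in x1, quadratic in x2, ignores x3."""
    return 2.0 * row[0] + 0.5 * row[1] ** 2


def partial_dependence(predict, rows, column, grid):
    """Average prediction with `column` fixed to each grid value in turn."""
    profile = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)
            modified[column] = value
            total += predict(modified)
        profile.append(total / len(rows))
    return profile


def pd_importance(predict, rows, column, grid_size=20):
    """Standard deviation of the partial dependence profile over a quantile grid."""
    values = sorted(row[column] for row in rows)
    last = len(values) - 1
    grid = [values[int(k * last / (grid_size - 1))] for k in range(grid_size)]
    return statistics.stdev(partial_dependence(predict, rows, column, grid))


rng = random.Random(0)
rows = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(300)]
importance = {name: pd_importance(model, rows, c)
              for c, name in enumerate(["x1", "x2", "x3"])}

# Rank features from most to least important.
for name, score in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(name, round(score, 3))
```

The ignored feature x3 gets a flat profile and therefore a score of zero, while x1 ranks above x2 because its profile varies more across the grid.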
If Molnar's "conditional subgroup" approach is the way forward, is there a code snippet that I can follow?
I can link to the code from the article https://github.com/christophM/paper_conditional_subgroups
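For a rough feel of what conditional permutation does, here is a simplified sketch. It is not the code from the linked repository and it swaps in a simpler technique. The paper learns subgroups with trees, while this example just cuts one conditioning feature into quantile bins and shuffles the feature of interest within each bin. The hand-written `model`, the synthetic data, and the choice of 10 bins are assumptions.

```python
import math
import random


def model(row):
    """Hypothetical fitted model (assumption) that uses both correlated features."""
    return row[0] + row[1]


def rmse(rows, target):
    """Root mean squared error of `model` on the given rows."""
    return math.sqrt(sum((model(r) - y) ** 2 for r, y in zip(rows, target)) / len(rows))


def permute_within_bins(rows, column, given, bins, seed=0):
    """Shuffle `column` only among rows in the same quantile bin of `given`."""
    rng = random.Random(seed)
    order = sorted(range(len(rows)), key=lambda i: rows[i][given])
    size = math.ceil(len(rows) / bins)
    permuted = [list(r) for r in rows]
    for start in range(0, len(order), size):
        members = order[start:start + size]
        donors = members[:]
        rng.shuffle(donors)
        for i, j in zip(members, donors):
            permuted[i][column] = rows[j][column]
    return permuted


# Two correlated features built from a shared latent variable z.
rng = random.Random(1)
rows, target = [], []
for _ in range(3000):
    z = rng.gauss(0, 1)
    x1, x2 = z + rng.gauss(0, 0.3), z + rng.gauss(0, 0.3)
    rows.append([x1, x2])
    target.append(x1 + x2 + rng.gauss(0, 0.1))

baseline = rmse(rows, target)
# bins=1 shuffles across all rows, i.e. ordinary (marginal) permutation.
marginal = rmse(permute_within_bins(rows, 0, 1, bins=1), target) - baseline
conditional = rmse(permute_within_bins(rows, 0, 1, bins=10), target) - baseline

print("marginal importance of x1:", round(marginal, 3))
print("conditional importance of x1 given x2:", round(conditional, 3))
```

The conditional score should come out smaller than the marginal one, because within a bin of x2 there is little left for x1 to vary over. It answers "what does x1 add beyond x2", which is a different question from "how much does the model rely on x1", so the two rankings need not agree.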
I hope all the resources given in this issue can guide you in your experiments.
seems answered, reopen if needed :-)
I use DALEX in most of my ML research projects. I keep getting criticism from reviewers about a drawback of permutation-based methods in the presence of multiple correlated predictors. They argue that if there is a group of highly correlated but important predictors, those predictors may not show up at the top of the feature importance ranking. Can someone comment on this criticism? How is this addressed in DALEX?