bdwilliamson / vimpy

Perform inference on algorithm-agnostic variable importance in Python
https://pypi.org/project/vimpy/
MIT License

extend code to feature groups #5

Open: shaayaansayed opened this issue 2 years ago

shaayaansayed commented 2 years ago

Correct me if I'm wrong, but I don't believe the current code is set up to calculate variable importance values for feature groups.

Can you confirm I'm understanding this correctly? To extend the code to groups, we would select subsets of feature groups rather than of individual features. Then, when measuring predictiveness, we include every feature that belongs to one of the selected groups. For example, if we have the groups:

vitals = [blood_pressure, heart_rate]
labs = [sodium, potassium, sugar]
diagnoses = [kidney, heart, liver]

If S = [0, 1] (the vitals and labs groups), then we train a model with blood_pressure, heart_rate, sodium, potassium, and sugar.
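A minimal sketch of the subset-to-columns step described above, in plain Python. The feature order, the `groups` dictionary, and the helper `columns_for_groups` are hypothetical illustrations, not part of vimpy:

```python
# Hypothetical feature layout: the columns of X, in this order.
features = ["blood_pressure", "heart_rate", "sodium", "potassium",
            "sugar", "kidney", "heart", "liver"]

# Each group is a list of column indices into X (hypothetical grouping).
groups = {
    "vitals": [0, 1],
    "labs": [2, 3, 4],
    "diagnoses": [5, 6, 7],
}
group_names = list(groups)  # ["vitals", "labs", "diagnoses"]


def columns_for_groups(S, groups, group_names):
    """Return the sorted column indices covered by the groups indexed by S."""
    cols = []
    for g in S:
        cols.extend(groups[group_names[g]])
    return sorted(cols)


cols = columns_for_groups([0, 1], groups, group_names)
print([features[j] for j in cols])
# ['blood_pressure', 'heart_rate', 'sodium', 'potassium', 'sugar']
```

A model for subset S would then be trained on `X[:, cols]` only.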

Would we need to normalize anything?

bdwilliamson commented 2 years ago

Which function are you using?

For both vim and cv_vim, you should be able to pass a vector of indices as the argument s. In your example, if your predictors are [blood pressure, heart rate, sodium, potassium, sugar], you could input s = [0, 1] to consider the importance of vitals as a group.
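To illustrate what the importance of a group of columns `s` means, here is a plug-in sketch using only NumPy: it compares the R-squared of an ordinary least squares fit on all predictors with that of a fit that drops the columns in `s`. This is a simplified stand-in, not vimpy's `vim` estimator, which is algorithm-agnostic and also provides inference; the helper names and the simulated data are hypothetical.

```python
import numpy as np


def r_squared(y, y_hat):
    """R-squared of predictions y_hat for outcome y."""
    return 1.0 - np.mean((y - y_hat) ** 2) / np.var(y)


def ols_predict(X, y):
    """Fit ordinary least squares with an intercept; return fitted values."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return design @ coef


def group_importance(X, y, s):
    """In-sample drop in R-squared from removing all columns listed in s."""
    dropped = set(s)
    keep = [j for j in range(X.shape[1]) if j not in dropped]
    full = r_squared(y, ols_predict(X, y))
    reduced = r_squared(y, ols_predict(X[:, keep], y))
    return full - reduced


# Simulated data; columns: [blood_pressure, heart_rate, sodium, potassium, sugar]
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 5))
y = 2.0 * X[:, 0] + 1.0 * X[:, 1] + 0.5 * X[:, 2] + rng.normal(size=n)

print(group_importance(X, y, s=[0, 1]))  # vitals as a group: large
print(group_importance(X, y, s=[3, 4]))  # potassium + sugar: close to zero
```

Passing `s=[0, 1]` removes both vitals at once, so the result measures their joint contribution rather than either one alone.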

Groups aren't currently set up in spvim. To extend to groups, we would:
(a) partition the features into groups (in your example, vitals, labs, and diagnoses);
(b) measure predictiveness using each combination of the feature groups (in your example: all variables, no variables, vitals alone, labs alone, diagnoses alone, vitals + labs, vitals + diagnoses, labs + diagnoses);
(c) combine these using the Shapley formula.
The normalization constant would differ from the one for individual-variable Shapley values, since it depends on the number of groups rather than the number of features.
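Steps (b) and (c) can be sketched as exact Shapley values over groups, assuming the predictiveness of every subset of groups has already been estimated. The weight for a subset S is |S|!(K - |S| - 1)!/K!, where K is the number of groups, which is where the normalization departs from the individual-feature case. The function `group_shapley` and the predictiveness table are hypothetical illustrations, not how spvim is implemented.

```python
from itertools import combinations
from math import factorial


def group_shapley(v, K):
    """Exact Shapley values over K feature groups.

    v maps a frozenset of group indices to a predictiveness value
    (e.g. R-squared of a model trained on the features in those groups).
    The weights use K, the number of groups, not the number of features.
    """
    phi = [0.0] * K
    for k in range(K):
        others = [j for j in range(K) if j != k]
        for size in range(K):  # subsets of the other K - 1 groups
            weight = factorial(size) * factorial(K - size - 1) / factorial(K)
            for S in combinations(others, size):
                S = frozenset(S)
                phi[k] += weight * (v[S | {k}] - v[S])
    return phi


# Hypothetical predictiveness of all 2^3 subsets of
# groups 0 = vitals, 1 = labs, 2 = diagnoses.
v = {
    frozenset(): 0.0,
    frozenset({0}): 0.30,
    frozenset({1}): 0.20,
    frozenset({2}): 0.10,
    frozenset({0, 1}): 0.45,
    frozenset({0, 2}): 0.35,
    frozenset({1, 2}): 0.25,
    frozenset({0, 1, 2}): 0.50,
}

phi = group_shapley(v, K=3)
print([round(p, 4) for p in phi])  # [0.2667, 0.1667, 0.0667]
```

The three values sum to v(all groups) - v(no groups) = 0.5, the Shapley efficiency property. Exact enumeration grows as 2^K, which stays manageable when there are only a few groups.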

I don't have time for this at the moment (and I think @jjfeng probably doesn't either -- though she may have thought about it a bit), so if you want to create a PR that would be fantastic!