ModelOriented / survex

Explainable Machine Learning in Survival Analysis
https://modeloriented.github.io/survex
GNU General Public License v3.0

Calculation of SHAP values for continuous and categorical variables #93

Open lzxcvn opened 3 months ago

lzxcvn commented 3 months ago

Hello,

While using model_survshap() for global explanation with SHAP values, I found that the function does not seem to distinguish between continuous and categorical variables, which concerns me. Even though I passed a list of categorical variables, are the SHAP values calculated the same way for both kinds of variable? This seems to change the order of variable importance.

[image]

My code runs, but the results seem unreliable: previously unimportant variables, such as the categorical variable sex, now appear more important.

hbaniecki commented 2 weeks ago

Hi @lzxcvn, where did you find the categorical_variables parameter?

In most cases, SHAP does not distinguish between continuous and categorical variables. The distinction can matter when conditional imputation is used for feature marginalization (instead of the default marginal feature distribution). For details, refer to the shapr R package https://github.com/NorskRegnesentral/shapr and related research, e.g. https://doi.org/10.1007/s10618-024-01016-z.
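To illustrate the point about marginal imputation, here is a minimal Python sketch, not survex's implementation. The toy `model`, the `background` rows, and the helper names `value` and `shapley` are all hypothetical. It computes exact Shapley values by averaging over all feature orderings, filling in "absent" features from background rows. Nothing in the procedure needs to know that `sex` is categorical and `age` is continuous.

```python
from itertools import permutations
from statistics import mean

def model(row):
    # Hypothetical toy risk score: "age" is continuous, "sex" is a 0/1 category.
    return 0.05 * row["age"] + 0.8 * row["sex"]

# Hypothetical background data used to marginalize absent features.
background = [
    {"age": 40.0, "sex": 0},
    {"age": 55.0, "sex": 1},
    {"age": 62.0, "sex": 0},
    {"age": 71.0, "sex": 1},
]

def value(x, coalition):
    # Marginal imputation: features outside the coalition take the values
    # of each background row; the predictions are then averaged.
    return mean(
        model({f: (x[f] if f in coalition else b[f]) for f in x})
        for b in background
    )

def shapley(x):
    # Exact Shapley values: average marginal contribution over all orderings.
    features = list(x)
    orders = list(permutations(features))
    phi = {f: 0.0 for f in features}
    for order in orders:
        coalition = set()
        for f in order:
            before = value(x, coalition)
            coalition.add(f)
            phi[f] += (value(x, coalition) - before) / len(orders)
    return phi

x = {"age": 71.0, "sex": 1}
phi = shapley(x)
print(phi)
```

Both features go through the same code path: each one is simply swapped between the explained value and a background value. A conditional approach would instead sample the absent features given the present ones, and that is where the variable type could start to matter.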

Moreover, KernelSHAP is an approximation algorithm that includes randomness, which can lead to changes in the order of importance of the variables.
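The effect of randomness on the ranking can be sketched with a simpler stand-in for KernelSHAP: a Monte Carlo estimator that samples a random feature ordering and a random background row per draw. This is not KernelSHAP itself (which fits a weighted regression over sampled coalitions) and not survex's code. The toy `model`, the `background` rows, and the function name `sampled_shap` are hypothetical. The coefficients are chosen so that the two features have nearly equal true attributions (0.42 and 0.40 for the row explained here).

```python
import random

def model(row):
    # Hypothetical toy score; the two features have similar true importance.
    return 0.03 * row["age"] + 0.8 * row["sex"]

# Hypothetical background data used to marginalize absent features.
background = [
    {"age": 40.0, "sex": 0},
    {"age": 55.0, "sex": 1},
    {"age": 62.0, "sex": 0},
    {"age": 71.0, "sex": 1},
]

def sampled_shap(x, n_draws, seed):
    # Sampling estimator: each draw picks a random ordering and a random
    # background row, then switches features to the explained values one
    # at a time and records each feature's marginal contribution.
    rng = random.Random(seed)
    features = list(x)
    phi = {f: 0.0 for f in features}
    for _ in range(n_draws):
        order = features[:]
        rng.shuffle(order)
        current = dict(rng.choice(background))
        prev = model(current)
        for f in order:
            current[f] = x[f]
            new = model(current)
            phi[f] += (new - prev) / n_draws
            prev = new
    return phi

x = {"age": 71.0, "sex": 1}

# With few draws, the estimated ranking may differ from seed to seed.
for seed in range(5):
    est = sampled_shap(x, n_draws=10, seed=seed)
    ranking = sorted(est, key=lambda f: abs(est[f]), reverse=True)
    print(seed, ranking, est)

# With many draws, the estimates settle near the exact values.
phi_big = sampled_shap(x, n_draws=20000, seed=0)
print(phi_big)
```

When two variables have similar attributions, a small sampling budget can be enough to swap their order between runs. Fixing the seed makes a single run reproducible, and increasing the number of samples reduces the run-to-run variation.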