ModelOriented / kernelshap

Different SHAP algorithms
https://modeloriented.github.io/kernelshap/
GNU General Public License v2.0

kernelshap calculation makes R abort #119

Closed quantumlinguist closed 9 months ago

quantumlinguist commented 10 months ago

I'm trying to calculate SHAP values for random forests fitted with the ranger package. The random forests have between 1000 and 8000 trees, and the datasets have between 5000 and 37000 rows. The only model I managed to calculate SHAP values for is one with 850 trees; all the others crashed. I find it odd that none of the tutorials or vignettes mention that this could happen, so I am wondering whether something is wrong with my models or whether this is expected. I cannot post the data for privacy reasons (government data), but the variables are all numerical except for one binary variable. There are datasets for different years, in two groups: one group has 8 variables and the other has 32. The dependent variable is a 3-level factor. I'm using a 32GB MacBook Pro and R 4.1.2.

This is the code I am using:

library(ranger)
library(kernelshap)

ntree20 <- 850

set.seed(199)
model <- ranger(Satisfaction ~ ., data = df20, importance = 'permutation', num.trees = ntree20, mtry = 3, probability = TRUE, case.weights = class_weights[df20$Satisfaction])

set.seed(199)
xvars <- colnames(df20[-9])
X <- df20[sample(nrow(df20), 500), xvars]

bg_X <- df20[sample(nrow(df20), 200), ]

shap_rf <- kernelshap(model, X, bg_X = bg_X)

mayer79 commented 10 months ago

If the number of trees plays a role, then your system may be running out of memory during ranger's predict() call.

Random forest predictions are extremely expensive, which is a problem for expensive model-agnostic XAI methods. When accuracy matters, I usually prefer well-tuned boosted tree models over random forests.
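To give a feel for the cost, here is a rough back-of-the-envelope count for the 8-variable case from the original post. It rests on an assumption that is not stated in this thread: that exact Kernel SHAP with p features evaluates 2^p - 2 on-off patterns, each combined with every background row, once per explained row. Treat the numbers as an order-of-magnitude sketch only.

```r
# Rough count of rows sent to predict() for exact Kernel SHAP.
# Assumption: 2^p - 2 on-off patterns per explained row, each
# combined with every row of the background data.
p <- 8        # features in the smaller datasets
n_bg <- 200   # rows in bg_X
n_x <- 500    # rows in X

rows_per_call <- (2^p - 2) * n_bg   # rows predicted per explained row
total_rows <- rows_per_call * n_x   # rows predicted overall

rows_per_call
total_rows
```

Every one of those rows has to pass through every tree of the forest, so the work grows linearly with the number of trees.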

If increasing RAM or using the smaller (still very large) random forest is not an option, you can try {treeshap} instead of {kernelshap}. It would be interesting to see whether the problem persists with TreeSHAP.

quantumlinguist commented 10 months ago

The problem with TreeSHAP is that my dependent variable is a factor, and it doesn't seem to support this, as I get an error. The error goes away if I convert the response variable to numeric, but I need it to be a factor for interpretation purposes.

unified <- unify(model, df20)
Warning message:
In Ops.factor(get("Prediction"), n) : ‘/’ not meaningful for factors

mayer79 commented 10 months ago

Ah. Maybe you can fit the larger forest and use only the first 500 trees for interpretability. Otherwise, I fear that I can't help.

quantumlinguist commented 10 months ago

Do you know how to do that with a ranger random forest?

quantumlinguist commented 10 months ago

I figured it out. ranger has a function deforest() that lets you remove trees from the random forest.
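For anyone landing here later, a minimal sketch of that approach on a toy model. The iris data and the tree counts are placeholders, not the setup from this issue, and it assumes a ranger version that ships deforest() with a which.trees argument naming the trees to drop.

```r
library(ranger)

# Toy probability forest on iris, standing in for the real model
set.seed(1)
fit <- ranger(Species ~ ., data = iris, num.trees = 100, probability = TRUE)

# Drop trees 51 to 100 so that only the first 50 remain.
# deforest() may warn that some stored components (e.g. OOB results)
# no longer match the reduced forest.
fit_small <- deforest(fit, which.trees = 51:100)

# The reduced forest still predicts class probabilities
pred_small <- predict(fit_small, data = iris)$predictions
dim(pred_small)
```

The reduced model can then be passed to kernelshap() in place of the full one, for example kernelshap(fit_small, X, bg_X = bg_X).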

mayer79 commented 10 months ago

Or pass num.trees to predict().
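A small sketch of what that looks like with ranger's predict() method, again on a toy iris model rather than the data from this issue. The forest is left untouched and only the first trees are used at prediction time.

```r
library(ranger)

# Toy probability forest on iris, standing in for the real model
set.seed(1)
fit <- ranger(Species ~ ., data = iris, num.trees = 100, probability = TRUE)

# Predict with only the first 50 of the 100 trees
pred_50 <- predict(fit, data = iris, num.trees = 50)$predictions

# Predictions from the full forest, for comparison
pred_all <- predict(fit, data = iris)$predictions

dim(pred_50)
```

To use this with kernelshap(), one option is a custom prediction function, for example pred_fun = function(m, X, ...) predict(m, X, num.trees = 500)$predictions. This assumes that kernelshap() accepts a pred_fun argument of that shape, so check the documentation of your installed version.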