Closed by esther-meerwijk 2 years ago.
Hi @esther-meerwijk, I just ran your code and I get the same results…very strange. I’m on vacation but will try to figure out what’s going on later this week.
Hi @esther-meerwijk, a couple of small tweaks fix your script:
- Use newdata in our definition of pfun();
- X, in this case, needs to be a data frame (because GLMs can only predict on data frames);
- Use type = 'link' instead, so that predictions are on the log-odds scale.
Code and output below:
library(fastshap)  # provides explain()
x1 <- c(1,1,1,0,0,0,0,0,0,0)
x2 <- c(1,0,0,1,1,1,0,0,0,0)
x3 <- c(3,2,1,3,2,1,3,2,1,3)
x4 <- c(1,0,1,1,0,1,0,1,0,1)
y <- c(1,0,1,0,1,1,0,0,0,1)
df <- data.frame(x1, x2, x3, x4, y)
X <- subset(df, select = -y)  # features only, kept as a data frame
fit <- glm(y ~ ., data = df, family = binomial)
# Prediction wrapper: fastshap calls it with two arguments, object and newdata
pfun <- function(object, newdata) {
  predict(object, type = "link", newdata = newdata)  # log-odds scale
}
set.seed(845)  # for reproducibility
head(shap1 <- explain(fit, X = X, pred_wrapper = pfun, nsim = 1000))
# # A tibble: 6 × 4
# x1 x2 x3 x4
# <dbl> <dbl> <dbl> <dbl>
# 1 0.853 1.22 -0.639 0.723
# 2 0.848 -0.807 0.0390 -1.03
# 3 0.868 -0.854 0.748 0.696
# 4 -0.379 1.24 -0.601 0.682
# 5 -0.392 1.17 0.0620 -1.01
# 6 -0.381 1.24 0.777 0.693
head(shap2 <- explain(fit, X = X, exact = TRUE))
# # A tibble: 6 × 4
# x1 x2 x3 x4
# <dbl> <dbl> <dbl> <dbl>
# 1 0.854 1.22 -0.627 0.700
# 2 0.854 -0.815 0.0697 -1.05
# 3 0.854 -0.815 0.766 0.700
# 4 -0.366 1.22 -0.627 0.700
# 5 -0.366 1.22 0.0697 -1.05
# 6 -0.366 1.22 0.766 0.700
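A quick sanity check on why the link scale is the right one to compare against: a logistic regression without interactions is additive on the log-odds scale, so the exact Shapley value of feature j for row i is beta_j * (x_ij - mean(x_j)). The base R sketch below (no fastshap needed) computes that closed form, compares it with stats::predict(type = "terms"), and checks that the contributions plus the average link prediction add back up to each row's link prediction. That fastshap's exact = TRUE path for glm objects relies on this same per-term decomposition is our understanding, not something confirmed in this thread; the names beta, centered, phi, terms_link, and recon are illustrative only.

```r
# Toy data and logistic regression from this thread (base R only)
x1 <- c(1,1,1,0,0,0,0,0,0,0)
x2 <- c(1,0,0,1,1,1,0,0,0,0)
x3 <- c(3,2,1,3,2,1,3,2,1,3)
x4 <- c(1,0,1,1,0,1,0,1,0,1)
y  <- c(1,0,1,0,1,1,0,0,0,1)
df <- data.frame(x1, x2, x3, x4, y)
X  <- subset(df, select = -y)
fit <- glm(y ~ ., data = df, family = binomial)

# Closed form on the link (log-odds) scale for a GLM without interactions:
# phi[i, j] = beta_j * (X[i, j] - mean(X[, j]))
beta     <- coef(fit)[-1]                          # drop the intercept
centered <- sweep(as.matrix(X), 2, colMeans(X))    # subtract column means
phi      <- sweep(centered, 2, beta, "*")          # scale each column by its coefficient

# R's own per-term split of the linear predictor (also centered on column means)
terms_link <- predict(fit, type = "terms")

# Local accuracy: average link prediction + row contributions = row's link prediction
link  <- predict(fit, type = "link")
recon <- mean(link) + rowSums(phi)

print(round(phi, 3))
```

If this reading is right, phi should line up with the exact = TRUE table above, and any approximate method run on the link scale should converge toward it as nsim grows.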
Yep, that does it 👍 Thanks so much for figuring that out!
I've been perusing various sites that describe how to compute approximate Shapley values with fastshap for a binomial GLM, but so far I have been unable to make it work. Here's what I have been using:
Here's the result:
Obviously not what I expected. With the exact method, I do get values that make sense:
But I cannot use the exact method on my actual data because the model features are not independent. Any help getting this to work would be appreciated!
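To show what nsim and pred_wrapper are doing in the approximate method, here is a from-scratch base R sketch of permutation-based Monte Carlo Shapley estimation (the Štrumbelj and Kononenko scheme, which we believe fastshap's approximate method follows). mc_shapley() is a hypothetical helper written for illustration, not fastshap's implementation, and nsim = 2000 is an arbitrary choice. Because the wrapper returns link-scale predictions and the model is additive on that scale, the estimates should land close to the closed-form beta_j * (x_j - mean(x_j)) values; with type = "response" they would instead target Shapley values of the predicted probability, which is a different quantity.

```r
# Toy data and logistic regression from this thread (base R only)
x1 <- c(1,1,1,0,0,0,0,0,0,0)
x2 <- c(1,0,0,1,1,1,0,0,0,0)
x3 <- c(3,2,1,3,2,1,3,2,1,3)
x4 <- c(1,0,1,1,0,1,0,1,0,1)
y  <- c(1,0,1,0,1,1,0,0,0,1)
df <- data.frame(x1, x2, x3, x4, y)
X  <- as.matrix(subset(df, select = -y))  # numeric feature matrix
fit <- glm(y ~ ., data = df, family = binomial)

# Link-scale wrapper; predict() on a glm needs newdata as a data frame
pfun <- function(object, newdata) {
  predict(object, newdata = as.data.frame(newdata), type = "link")
}

# Hypothetical helper (not fastshap's code): Monte Carlo estimate of the
# Shapley value of feature j for the row x, using random feature orderings
# and random background rows drawn from X
mc_shapley <- function(object, X, x, j, nsim, pred_fun) {
  p <- ncol(X)
  w <- X[sample(nrow(X), nsim, replace = TRUE), , drop = FALSE]  # background rows
  b_with    <- w  # will hold x's values for j and the features ordered before j
  b_without <- w  # will hold x's values only for the features ordered before j
  for (m in seq_len(nsim)) {
    perm   <- sample(p)                           # random feature ordering
    before <- perm[seq_len(match(j, perm) - 1)]   # features ahead of j
    b_with[m, c(before, j)] <- x[c(before, j)]
    b_without[m, before]    <- x[before]
  }
  # Average marginal contribution of feature j across the sampled orderings
  mean(pred_fun(object, b_with) - pred_fun(object, b_without))
}

set.seed(845)
i <- 1  # explain the first row
phi_mc <- sapply(seq_len(ncol(X)), function(j)
  mc_shapley(fit, X, X[i, ], j, nsim = 2000, pred_fun = pfun))

# Closed-form link-scale values for comparison
phi_exact <- coef(fit)[-1] * (X[i, ] - colMeans(X))
print(rbind(monte_carlo = phi_mc, closed_form = phi_exact))
```

Like the closed-form values, this sketch fills in "absent" features with values from randomly drawn background rows, independently of the row being explained.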