Closed sealandcy closed 1 year ago
Please add a short example that reproduces the issue.
Thank you for your reply. That bug now seems to be solved, but there is a new error: "Error in check_pred(pred_fun(object, bg_X[, colnames(X), drop = FALSE], : Predictions must be a length n vector or a matrix/data.frame/array with n rows." I have looked at the code carefully and cannot see what is wrong with it. My rough code is as follows:
rf = lrn("classif.ranger", id = "mm", predict_type = "prob", num.trees = 70, mtry = 9, min.node.size = 21)
rf$train(trainTask3)
pred_fun = function(rf, testx1) rf$predict(testTask3)$prob[, 2]
bg_X <- trainx[sample(nrow(trainx), 200), -276]
s <- kernelshap(rf, testx1, bg_X = bg_X, pred_fun = pred_fun, exact = FALSE, hybrid_degree = 1)
What happens without passing a pred_fun?
What is testx1 exactly? It should contain only predictors.
Yes, this dataset does not contain the y variable, only the x variables.
It reports the error I listed in the title.
Okay. Your model is a classification model and fit$predict_newdata() does not return numbers but classes. SHAP values are only defined for numeric predictions. Thus, we need to figure out how to make the model predict probabilities...
This works (but don't expect {kernelshap} to be sufficiently fast as predict functions of random forests are very heavy, unfortunately ...):
library(mlr3)
library(mlr3learners)
library(kernelshap)
library(shapviz)
mlr_tasks$get("iris")
tsk("iris")
task_iris <- TaskClassif$new(id = "iris", backend = iris, target = "Species")
fit_rf <- lrn("classif.ranger", predict_type = "prob", num.trees = 10)
fit_rf$train(task_iris)
fit_rf$predict_types
s <- kernelshap(
  fit_rf,
  X = iris[-5],
  bg_X = iris,
  pred_fun = function(m, X) m$predict_newdata(X)$prob
)
sv <- shapviz(s)
sv_dependence(sv, "Sepal.Length")
Thank you very much! I'll try it.
I will also try to bring this logic into the package, so that probabilistic classification works out of the box for mlr3 learners. Thanks for reporting!
Hi, I still have questions about pred_fun. Does the kernelshap package support the mlr package? I want to use the following code to calculate SHAP values:
trainTaskf <- makeClassifTask(data = trainx, target = "cc")
testTaskf <- makeClassifTask(data = testx, target = "cc")
trainxf <- trainx[, -113]
testxf <- testx[, -113]
tmpf = mlr::train(m1, trainTaskf)
Wresf = predict(tmpf, testTaskf)
pred_fun <- function(tmpf, testxf) predict(tmpf, testTaskf)$data[, c(3:4)]
shp4 <- kernelshap(tmpf, testxf, bg_X = testxf[700:800, ], pred_fun = pred_fun)
But it prompts the following error message:
Error in check_pred(pred_fun(object, bg_X[, colnames(X), drop = FALSE], : Predictions must be a length n vector or a matrix/data.frame/array with n rows.
As I understand it, pred_fun should return the model's predictions, which here have two columns because the outcome is a categorical variable: one for prob.0 and one for prob.1. Running predict(tmpf, testTaskf)$data[,c(3:4)] on its own does give two columns, but passing it to kernelshap doesn't work.
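The error message talks about rows, not columns. A minimal sketch of what that requirement means, assuming that "n" is the number of rows of the data handed to pred_fun (as the call shown in the error message suggests). The helper has_n_rows() and the toy data are hypothetical and only for illustration; this is not kernelshap's actual check.

```r
# Hypothetical helper (not part of kernelshap): TRUE if `pred` holds one
# prediction, or one row of predictions, per row of the data it was made for.
has_n_rows <- function(pred, n) {
  if (is.null(dim(pred))) length(pred) == n else nrow(pred) == n
}

bg <- data.frame(x = 1:5)      # stand-in for the background data, n = 5
fixed <- data.frame(x = 1:8)   # stand-in for a fixed test set

# Ignores its data argument: always returns one row per row of `fixed`
bad_pred <- function(m, X) {
  cbind(prob.0 = rep(0.4, nrow(fixed)), prob.1 = rep(0.6, nrow(fixed)))
}

# Uses its data argument: returns one row per row of X
good_pred <- function(m, X) {
  cbind(prob.0 = rep(0.4, nrow(X)), prob.1 = rep(0.6, nrow(X)))
}

has_n_rows(bad_pred(NULL, bg), nrow(bg))   # two columns, but 8 rows for 5 inputs
has_n_rows(good_pred(NULL, bg), nrow(bg))  # two columns and 5 rows
```

Both functions return two probability columns; only the second one returns as many rows as the data it received.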
devtools::install_github("ModelOriented/kernelshap")
It might be a bit difficult to provide a full version of my current working code, as it is complex and very long. What I want to understand is how pred_fun should be set up when a model from the mlr package is used to calculate SHAP values. Here is a more complete example with a model from the mlr package:
library(mlr)
library(kernlab)
trainTask <- makeClassifTask(data = trainxf, target = "cc")
testTask <- makeClassifTask(data = testxf, target = "cc")
ksvm <- makeLearner("classif.ksvm", predict.type = "prob", par.vals = list(C = 0.004, sigma = 0.004))
pksvm <- mlr::train(ksvm, trainTask)
predksvm <- predict(pksvm, testTask)
pred_fun <- function(pksvm, testxf) predict(pksvm, testTaskf)$data[, c(3:4)]
shp4 <- kernelshap(pksvm, testxf, bg_X = testxf[700:800, ], pred_fun = pred_fun)
Your pred_fun seems wrong. It will need to act on new data, but your dataset is fixed!
Try to go through the official example in the README. No need to specify a pred_fun at all, at least with the GitHub version.
library(kernelshap)
library(mlr3)
library(mlr3learners)
# Probabilistic classification -> lrn(..., predict_type = "prob")
task_iris <- TaskClassif$new(id = "class", backend = iris, target = "Species")
fit_rf <- lrn("classif.ranger", predict_type = "prob", num.trees = 50)
fit_rf$train(task_iris)
s <- kernelshap(fit_rf, X = iris[-5], bg_X = iris)
s
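To illustrate the remark that pred_fun must act on new data, here is a minimal sketch using only base R. The glm() model and the binary target is_virginica are assumptions made for illustration and do not come from the thread; the point is only that the function uses its second argument instead of a fixed dataset.

```r
# Toy binary problem built from iris (illustrative only)
df <- transform(
  iris[c("Sepal.Length", "Sepal.Width")],
  is_virginica = as.integer(iris$Species == "virginica")
)
fit <- glm(is_virginica ~ Sepal.Length + Sepal.Width, data = df, family = binomial())

X <- iris[c("Sepal.Length", "Sepal.Width")]  # predictors only

# Predicts on whatever data is passed in as X and returns one row per row
# of X, with one probability column per class.
pred_fun <- function(m, X) {
  p1 <- unname(predict(m, newdata = X, type = "response"))
  cbind(prob.0 = 1 - p1, prob.1 = p1)
}

dim(pred_fun(fit, X[1:10, ]))  # 10 rows for 10 input rows
dim(pred_fun(fit, X))          # 150 rows for 150 input rows
```

A function of this shape could then be passed along the lines of kernelshap(fit, X, bg_X = ..., pred_fun = pred_fun), following the call pattern used earlier in this thread.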
It might be because my installed version of the package doesn't match this one. I'm mainly interested in understanding what pred_fun does. Thanks for the answer; I'll look into the documentation a bit more.
This is the automatically derived pred_fun for mlr3 models used in kernelshap.Learner():
mlr3_pred_fun <- function(object, X) {
  if ("classif" %in% object$task_type) {
    # Check if probabilities are available
    test_pred <- object$predict_newdata(utils::head(X))
    if ("prob" %in% test_pred$predict_types) {
      return(function(m, X) m$predict_newdata(X)$prob)
    } else {
      stop("Set lrn(..., predict_type = 'prob') to allow for probabilistic classification.")
    }
  }
  function(m, X) m$predict_newdata(X)$response
}
Hello, I want to run kernelshap with the example for mlr3 and the ranger package, but it reports the error given in the title. What should I do about this?