Error in check_pred(pred_fun(object, X, ...), n = n) : Predictions must be a vector, matrix, data.frame, or <=2D array

sealandcy commented 1 year ago

Hello, I tried to run kernelshap on the mlr3 example with a ranger learner, but it fails with the error in the title. What should I do?

mayer79 commented 1 year ago

Please add a short example that reproduces the issue.

sealandcy commented 1 year ago

Thank you for your reply. That bug now seems to be solved, but there is a new error:

Error in check_pred(pred_fun(object, bg_X[, colnames(X), drop = FALSE], : Predictions must be a length n vector or a matrix/data.frame/array with n rows.

I've checked the code carefully but can't find a problem. My code is roughly:

rf <- lrn("classif.ranger", id = "mm", predict_type = "prob", num.trees = 70, mtry = 9, min.node.size = 21)
rf$train(trainTask3)
pred_fun <- function(rf, testx1) rf$predict(testTask3)$prob[, 2]
bg_X <- trainx[sample(nrow(trainx), 200), -276]
s <- kernelshap(rf, testx1, bg_X = bg_X, pred_fun = pred_fun, exact = FALSE, hybrid_degree = 1)

mayer79 commented 1 year ago

What happens without passing a pred_fun?

mayer79 commented 1 year ago

What is testx1 exactly? It should contain only predictors.

sealandcy commented 1 year ago

Yes, testx1 contains only the predictors (the x variables), not the response y.

sealandcy commented 1 year ago

Without a pred_fun, it fails with the error in the title.

mayer79 commented 1 year ago

Okay. Your model is a classification model, and fit$predict_newdata() returns predicted classes rather than numbers. SHAP values are only defined for numeric predictions, so we need to figure out how to make the model predict probabilities instead.
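A minimal base-R sketch of this distinction (no mlr3 involved): class labels come back as a factor, which is not a numeric prediction, while probabilities are numeric. The logistic regression on iris is a stand-in for the user's model, and `is_valid_pred()` is a hypothetical, simplified check modeled on the wording of kernelshap's error messages, not the package's actual `check_pred()`.

```r
# Hypothetical stand-in: logistic regression on a binary iris target
X <- iris[, c("Sepal.Length", "Sepal.Width")]
y <- as.integer(iris$Species == "virginica")
fit <- glm(y ~ ., data = data.frame(X, y = y), family = binomial)

# Hypothetical, simplified version of the rule in kernelshap's error messages:
# a numeric vector of length n, or a numeric matrix/data.frame with n rows
is_valid_pred <- function(pred, n) {
  if (is.data.frame(pred)) pred <- as.matrix(pred)
  is.numeric(pred) &&
    ((is.vector(pred) && length(pred) == n) || (is.matrix(pred) && nrow(pred) == n))
}

p <- predict(fit, newdata = X, type = "response")     # probabilities of class 1
cls <- factor(ifelse(p > 0.5, "virginica", "other"))   # class labels (a factor)
prob_mat <- cbind(other = 1 - p, virginica = p)        # one probability column per class

is_valid_pred(cls, nrow(X))       # FALSE: a factor is not numeric
is_valid_pred(p, nrow(X))         # TRUE
is_valid_pred(prob_mat, nrow(X))  # TRUE
```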

mayer79 commented 1 year ago

This works, but don't expect {kernelshap} to be fast here: predict functions of random forests are heavy, and Kernel SHAP has to call them many times.

library(mlr3)
library(mlr3learners)
library(kernelshap)
library(shapviz)

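# Optional: print the built-in iris task (not used below)
mlr_tasks$get("iris")
tsk("iris")
task_iris <- TaskClassif$new(id = "iris", backend = iris, target = "Species")
# predict_type = "prob" makes the learner return class probabilities
fit_rf <- lrn("classif.ranger", predict_type = "prob", num.trees = 10)
fit_rf$train(task_iris)
fit_rf$predict_types  # predict types the learner supports; "prob" must be among them
s <- kernelshap(
  fit_rf,
  X = iris[-5],
  bg_X = iris,
  # return the matrix of class probabilities, computed on the new data X
  pred_fun = function(m, X) m$predict_newdata(X)$prob
)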
sv <- shapviz(s)
sv_dependence(sv, "Sepal.Length")

sealandcy commented 1 year ago

Thank you very much! I'll try it.

mayer79 commented 1 year ago

I will also try to bring this logic into the package, so that probabilistic classification works out of the box for mlr3 learners. Thanks for reporting!

mayer79 commented 1 year ago

Fixed in https://github.com/ModelOriented/kernelshap/pull/100

sealandcy commented 1 year ago

Hi, I still have a question about pred_fun: does kernelshap support models from the mlr package? I want to compute SHAP values with the following code:

trainTaskf <- makeClassifTask(data = trainx, target = "cc")
testTaskf <- makeClassifTask(data = testx, target = "cc")
trainxf <- trainx[, -113]
testxf <- testx[, -113]
tmpf <- mlr::train(m1, trainTaskf)
Wresf <- predict(tmpf, testTaskf)
pred_fun <- function(tmpf, testxf) predict(tmpf, testTaskf)$data[, c(3:4)]
shp4 <- kernelshap(tmpf, testxf, bg_X = testxf[700:800, ], pred_fun = pred_fun)

But it fails with the following error:

Error in check_pred(pred_fun(object, bg_X[, colnames(X), drop = FALSE], : Predictions must be a length n vector or a matrix/data.frame/array with n rows.

As I understand it, pred_fun should return the model's predictions. Since the outcome is categorical, these have two columns, prob.0 and prob.1. Running predict(tmpf, testTaskf)$data[, c(3:4)] does return these two columns, but passing the function to kernelshap fails.
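One way to test this understanding before calling kernelshap is to run pred_fun the way the error messages suggest kernelshap does (on X and on bg_X) and compare row counts. In the sketch below, `check_pred_fun()` is a hypothetical helper, not part of kernelshap, and the logistic regression on iris stands in for the mlr model. Per the error message, a two-column probability matrix is acceptable as long as it has one row per input row.

```r
# Hypothetical stand-in model: logistic regression on a binary iris target
X <- iris[, c("Sepal.Length", "Sepal.Width")]
y <- as.integer(iris$Species == "virginica")
fit <- glm(y ~ ., data = data.frame(X, y = y), family = binomial)

# pred_fun returning two probability columns, computed on the data passed in
two_col_pred_fun <- function(m, newX) {
  p <- predict(m, newdata = newX, type = "response")
  cbind(prob.0 = 1 - p, prob.1 = p)
}

# Hypothetical helper (not part of kernelshap): evaluate pred_fun on X and on
# bg_X and compare the number of prediction rows with the number of input rows
check_pred_fun <- function(pred_fun, object, X, bg_X) {
  n_pred <- function(p) if (is.null(dim(p))) length(p) else nrow(p)
  c(
    X_rows = nrow(X), X_pred_rows = n_pred(pred_fun(object, X)),
    bg_rows = nrow(bg_X), bg_pred_rows = n_pred(pred_fun(object, bg_X))
  )
}

res <- check_pred_fun(two_col_pred_fun, fit, X[1:50, ], X[51:70, ])
res  # each *_pred_rows entry should equal the matching *_rows entry
```

For a binary outcome, prob.0 is just 1 - prob.1, so returning only the prob.1 column would carry the same information.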

mayer79 commented 1 year ago

Did you test with the newest GitHub version, i.e., installed via devtools::install_github("ModelOriented/kernelshap")?
Can you please provide runnable code (apart from the part that fails)? The current code is incomplete.

sealandcy commented 1 year ago

Providing the full working code is difficult because it is long and complex. What I mainly want to understand is how pred_fun should be set up when computing SHAP values for a model fitted with the mlr package. Here is a complete example of the mlr model:

library(mlr)
library(kernlab)
trainTask <- makeClassifTask(data = trainxf, target = "cc")
testTask <- makeClassifTask(data = testxf, target = "cc")
ksvm <- makeLearner("classif.ksvm", predict.type = "prob", par.vals = list(C = 0.004, sigma = 0.004))
pksvm <- mlr::train(ksvm, trainTask)
predksvm <- predict(pksvm, testTask)
pred_fun <- function(pksvm, testxf) predict(pksvm, testTaskf)$data[, c(3:4)]
shp4 <- kernelshap(pksvm, testxf, bg_X = testxf[700:800, ], pred_fun = pred_fun)

mayer79 commented 1 year ago

Your pred_fun seems wrong. It needs to predict on the new data passed to it as its second argument, but yours always predicts on the same fixed dataset!

Try the official example in the README. With the GitHub version, there is no need to specify a pred_fun at all.

library(kernelshap)
library(mlr3)
library(mlr3learners)

# Probabilistic classification -> lrn(..., predict_type = "prob")
task_iris <- TaskClassif$new(id = "class", backend = iris, target = "Species")
fit_rf <- lrn("classif.ranger", predict_type = "prob", num.trees = 50)
fit_rf$train(task_iris)
s <- kernelshap(fit_rf, X = iris[-5], bg_X = iris)
s
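The point about pred_fun acting on new data can be reproduced in base R. The error messages in this thread show kernelshap calling pred_fun on bg_X, not only on X, so a pred_fun that ignores its data argument returns the wrong number of rows. In the sketch below, the logistic regression, `test_X`, and `bg_X` are hypothetical stand-ins for the objects in the snippets above.

```r
# Hypothetical stand-in model: logistic regression on a binary iris target
X <- iris[, c("Sepal.Length", "Sepal.Width")]
y <- as.integer(iris$Species == "virginica")
fit <- glm(y ~ ., data = data.frame(X, y = y), family = binomial)

test_X <- X[1:50, ]   # rows to explain (50 rows)
bg_X <- X[51:70, ]    # background data (20 rows)

# Wrong: ignores its argument newX and always predicts on the fixed test_X
bad_pred_fun <- function(m, newX) predict(m, newdata = test_X, type = "response")
# Right: predicts on whatever data is passed in
good_pred_fun <- function(m, newX) predict(m, newdata = newX, type = "response")

length(bad_pred_fun(fit, bg_X))   # 50, although bg_X has 20 rows
length(good_pred_fun(fit, bg_X))  # 20
```

For the mlr model above, the analogous fix would be to predict on the data passed in, for example `pred_fun <- function(m, X) getPredictionProbabilities(predict(m, newdata = X))`, which for a binary task should return the positive-class probability. This relies on the `newdata` argument of mlr's `predict()` and is an untested suggestion, not something confirmed in this thread.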

sealandcy commented 1 year ago

The difference might be the package version I'm using, so I mainly want to understand what pred_fun does. Thanks for the answer; I'll read the documentation more closely.

mayer79 commented 1 year ago

This is the automatically derived pred_fun for mlr3 models used in kernelshap.Learner():

mlr3_pred_fun <- function(object, X) {
  if ("classif" %in% object$task_type) {
    # Check if probabilities are available
    test_pred <- object$predict_newdata(utils::head(X))
    if ("prob" %in% test_pred$predict_types) {
      return(function(m, X) m$predict_newdata(X)$prob)
    } else {
      stop("Set lrn(..., predict_type = 'prob') to allow for probabilistic classification.")
    }
  }
  function(m, X) m$predict_newdata(X)$response
}
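The function above is a factory: it inspects the learner once and returns a `function(m, X)` that kernelshap then calls repeatedly. The same pattern can be sketched for base-R models. `make_pred_fun()` below is hypothetical and only handles `lm` fits and binomial `glm` fits, returning probabilities (`type = "response"`) for the latter so that classification yields numeric predictions.

```r
# Hypothetical base-R analogue of mlr3_pred_fun(): inspect the model once and
# return a function(m, X) that can be used as kernelshap's pred_fun
make_pred_fun <- function(object) {
  if (inherits(object, "glm") && object$family$family == "binomial") {
    # Classification: return probabilities instead of log-odds
    return(function(m, X) predict(m, newdata = X, type = "response"))
  }
  if (inherits(object, "lm") && !inherits(object, "glm")) {
    # Linear regression: predictions on the response scale
    return(function(m, X) predict(m, newdata = X))
  }
  stop("Unsupported model class: ", paste(class(object), collapse = ", "))
}

X <- iris[, c("Sepal.Length", "Sepal.Width")]
y <- as.integer(iris$Species == "virginica")
fit_glm <- glm(y ~ ., data = data.frame(X, y = y), family = binomial)
fit_lm <- lm(Petal.Length ~ Sepal.Length + Sepal.Width, data = iris)

probs <- make_pred_fun(fit_glm)(fit_glm, X)   # numeric, between 0 and 1
pred_lm <- make_pred_fun(fit_lm)(fit_lm, X)   # numeric regression predictions
```

The returned function could then be passed to kernelshap as `pred_fun`.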

ModelOriented / kernelshap, issue #99