bgreenwell / fastshap

Fast approximate Shapley values in R
https://bgreenwell.github.io/fastshap/

no applicable method for 'predict' applied to an object of class "ranger" #56

Open quantumlinguist opened 1 year ago

quantumlinguist commented 1 year ago

I have been trying to use the fastshap package but I get this error:

task 1 failed - "no applicable method for 'predict' applied to an object of class "ranger""

If I do methods(predict), predict.ranger does appear in the list.
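
A check like `methods(predict)` only reflects the session it is run in. The sketch below is one way to see what parallel workers have attached, assuming a socket-based cluster created with `parallel::makeCluster()`. It uses the `splines` package purely as a stand-in for `ranger` so that it runs with base R alone; the variable names are illustrative.

```r
library(parallel)
library(splines)  # stand-in for ranger: attached in the main session only

# A socket (PSOCK) cluster starts each worker as a fresh R session.
cl <- makeCluster(2)

# Is the package attached here, and is it attached on each worker?
in_main    <- "package:splines" %in% search()
in_workers <- unlist(clusterEvalQ(cl, "package:splines" %in% search()))

stopCluster(cl)

print(in_main)     # attached in the main session
print(in_workers)  # one value per worker; not attached under default settings
```

If the workers report that the package is not attached, S3 methods such as `predict.ranger` may be unavailable there even though they are listed in the main session.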

bgreenwell commented 1 year ago

Hi @quantumlinguist, if this is still troubling you, would you mind sharing a reproducible example for me to run on my end?

viola-hilbert commented 1 year ago

Hi @bgreenwell, I get the same error when running exactly the code from your vignette (fastshap.Rmd).

library(fastshap)
library(ranger)
library(AmesHousing)
library(doParallel)

ames <- as.data.frame(AmesHousing::make_ames())
X <- subset(ames, select = -Sale_Price)  # features only

# Fit a random forest
set.seed(102)
(rfo <- ranger(Sale_Price ~ ., data =  ames, write.forest = TRUE))

# Prediction wrapper
pfun <- function(object, newdata) {
  predict(object, data = newdata)$predictions
}

# With parallelism ---> error "task 1 failed - "no applicable method for 'predict' applied to an object of class "ranger""
registerDoParallel(cores = 12)  # use forking with 12 cores
set.seed(5038)
system.time({  # estimate run time
  ex.ames.par <- explain(rfo, X = X, pred_wrapper = pfun, nsim = 50, 
                         adjust = TRUE, parallel = TRUE)
})

When I run

# Without parallelism --> works
set.seed(1706)
system.time({  # estimate run time
  ex.ames.nonpar <- explain(rfo, X = X, pred_wrapper = pfun, nsim = 50,
                            adjust = TRUE)
})

it works, so I guess the issue must be related to the option parallel = TRUE?

I also used parallel::detectCores() to figure out that I only have 6 cores to use, but changing the above to registerDoParallel(cores = 6) did not solve the problem.

brandongreenwell-8451 commented 12 months ago

Hi @viola-hilbert, are you getting this error on a Windows machine?

viola-hilbert commented 12 months ago

yes!

brandongreenwell-8451 commented 12 months ago

Instead of registerDoParallel(cores = 12) can you try the following:

cl <- makeCluster(4)
registerDoParallel(cl)

But do try this out on a much smaller sample for testing! For example, pass in newdata = X[1, ] to explain a single instance.

hlboy2333 commented 10 months ago

Hi @brandongreenwell-8451, I get the same error as @viola-hilbert. I also used the code from your vignette (fastshap.Rmd), and I tried the method you mentioned (cl <- makeCluster(4)). But the error (no applicable method for 'predict' applied to an object of class...) still occurs when I set parallel = TRUE. After many attempts, I found that the parallel operation sometimes worked, but most of the time it failed with the same error. Is there any solution at present? Thank you very much; I look forward to your reply.

brandongreenwell-8451 commented 10 months ago

Hi @hlboy2333, what type of OS are you running this on?

hlboy2333 commented 10 months ago

It is Windows 11, and I have 16 cores to use.

brandongreenwell-8451 commented 10 months ago

Thanks @hlboy2333, I'm having trouble reproducing the issue on my end. You may just need to pass "ranger" via the .packages argument as shown below. Can you try running this and see what you get?

library(fastshap)
library(ranger)
library(AmesHousing)
library(doParallel)

ames <- as.data.frame(AmesHousing::make_ames())[1:200, ]  # try with a sample
X <- subset(ames, select = -Sale_Price)  # features only

# Fit a random forest
set.seed(102)
(rfo <- ranger(Sale_Price ~ ., data =  ames, write.forest = TRUE))

# Prediction wrapper
pfun <- function(object, newdata) {
  predict(object, data = newdata)$predictions
}

cl <- makeCluster(4) # use 4 workers
registerDoParallel(cl) # register the parallel backend

system.time({  # estimate run time
  ex.ames.par <- explain(rfo, X = X, pred_wrapper = pfun, nsim = 5, 
                         adjust = TRUE, parallel = TRUE, .packages = "ranger")
})

hlboy2333 commented 10 months ago

Thank you so much for your help, @brandongreenwell-8451. Your method worked, not only on the code you provided but also on my own data and models (it took a while, so please forgive the late reply). It solved a problem that had been bothering me for nearly a day. I am curious about why you use .packages: what does this parameter do?

brandongreenwell-8451 commented 10 months ago

Hi @hlboy2333, glad it works now. This is more a feature of the foreach package, which is used under the hood. You can read more about it in the associated vignette: https://cran.r-project.org/web/packages/foreach/vignettes/foreach.html

You can pass additional arguments to foreach via the ... param in the call to explain(), as described in the help page. I think whether you need to pass packages depends on which type of parallel processing you're using in R (e.g., snow-like or multicore-like). For the former, which uses a socket-based cluster of fresh R sessions and is typically what's used on Windows (where forking isn't available), you often have to pass in packages, etc. if the function you're running requires them.
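
To illustrate what `.packages` does, here is a minimal sketch that calls foreach directly rather than going through explain(). It assumes the foreach and doParallel packages are installed and uses `splines::ns()` as a stand-in for a function from a package the workers need; the loop body and variable names are illustrative only.

```r
library(foreach)
library(doParallel)

# Explicit socket cluster: each worker is a fresh R session.
cl <- makeCluster(2)
registerDoParallel(cl)

# .packages asks foreach to load the listed packages on every worker
# before running the loop body, so ns() can be found there.
res <- foreach(i = 1:2, .packages = "splines") %dopar% {
  dim(ns(1:10, df = i + 2))  # dimensions of a spline basis matrix
}

stopCluster(cl)  # release the workers when done
print(res)
```

As far as I can tell, passing `.packages` is also harmless with a forking backend, so this pattern should be usable on either kind of system.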

brandongreenwell-8451 commented 10 months ago

I'll leave this issue open until I can generalize the vignette example to be more system agnostic.