bgreenwell / fastshap

Fast approximate Shapley values in R
https://bgreenwell.github.io/fastshap/

Error when using 'autoplot' to get the contributions of a specific row #10

Closed KhalilBelghouat closed 3 years ago

KhalilBelghouat commented 4 years ago

Dear Brandon Greenwell,

I'm using the great 'fastshap' library to compute approximate Shapley values. When I run the first code example provided in the official CRAN document [1], the last line gives the following error:

ERROR while rich displaying an object: Error in farver::decode_colour(colors, alpha = TRUE, to = "lab", na_value = "transparent"): unused argument (na_value = "transparent")

The code is:

#
# A projection pursuit regression (PPR) example
#

# Load fastshap, which provides explain()
library(fastshap)

# Load the sample data; see ?datasets::mtcars for details
data(mtcars)

# Fit a projection pursuit regression model
mtcars.ppr <- ppr(mpg ~ ., data = mtcars, nterms = 1)

# Compute approximate Shapley values using 10 Monte Carlo simulations
set.seed(101)  # for reproducibility
shap <- explain(mtcars.ppr, X = subset(mtcars, select = -mpg), nsim = 10, pred_wrapper = predict)
shap

# Shapley-based plots
library(ggplot2)
autoplot(shap)  # Shapley-based importance plot
autoplot(shap, type = "dependence", feature = "wt", X = mtcars)
autoplot(shap, type = "contribution", row_num = 1)  # explain first row of X

[1] https://cran.r-project.org/web/packages/fastshap/fastshap.pdf

bgreenwell commented 4 years ago

Hi @KhalilBelghouat, glad to hear you find the package useful. Unfortunately, I'm unable to reproduce the error on my end. Could you also include your session info (e.g., the output of sessionInfo())? I'm currently out on leave but will try to troubleshoot when I find time. (As a workaround, you could always construct the plot manually from the output itself.)
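A minimal sketch of that workaround, assuming the object returned by explain() has one column per feature and one row per observation (as it does in the example from the original post). It uses base R graphics rather than ggplot2 so that it does not go through the plotting path that raised the error; the variable name contrib and the plot labels are illustrative only.

```r
library(fastshap)

# Refit the model and recompute the Shapley values from the original post
data(mtcars)
mtcars.ppr <- ppr(mpg ~ ., data = mtcars, nterms = 1)
set.seed(101)  # for reproducibility
shap <- explain(mtcars.ppr, X = subset(mtcars, select = -mpg), nsim = 10,
                pred_wrapper = predict)

# Extract the contributions for the first row as a named numeric vector
contrib <- unlist(as.data.frame(shap)[1, ])

# Sort so the bars are ordered from most negative to most positive
contrib <- sort(contrib)

# Horizontal bar chart of per-feature contributions (base graphics only)
barplot(contrib, horiz = TRUE, las = 1,
        xlab = "Shapley value",
        main = "Feature contributions for row 1")
```

Changing the row index in `[1, ]` gives the same plot for any other observation.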

KhalilBelghouat commented 4 years ago

Hi @bgreenwell, I hope you are doing well. I'm using R in a Google Colaboratory notebook. I suspect that is why the plot can't be displayed, but I'm not sure. I will go with the workaround. Thanks for your help.

Here is the session info:

R version 3.6.3 (2020-02-29)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.3 LTS

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/libopenblasp-r0.2.20.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] caret_6.0-86         lattice_0.20-41      randomForest_4.6-14 
 [4] glmnet_4.0-2         Matrix_1.2-18        grplasso_0.4-7      
 [7] h2o_3.30.0.1         bit64_4.0.2          bit_4.0.4           
[10] vip_0.2.2            kernlab_0.9-29       rpart.plot_3.0.8    
[13] rpart_4.1-15         fastshap_0.0.5       xgboost_1.1.1.1     
[16] NeuralNetTools_1.5.2 nnet_7.3-14          pdp_0.7.0           
[19] e1071_1.7-3          ggplot2_3.3.2       

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.5           lubridate_1.7.9      tidyr_1.1.0         
 [4] class_7.3-17         ipred_0.9-9          digest_0.6.25       
 [7] foreach_1.5.0        IRdisplay_0.7.0      R6_2.4.1            
[10] plyr_1.8.6           repr_1.1.0           stats4_3.6.3        
[13] evaluate_0.14        pillar_1.4.6         rlang_0.4.7         
[16] uuid_0.1-4           data.table_1.13.0    splines_3.6.3       
[19] gower_0.2.2          stringr_1.4.0        RCurl_1.98-1.2      
[22] munsell_0.5.0        compiler_3.6.3       pkgconfig_2.0.3     
[25] base64enc_0.1-3      shape_1.4.4          htmltools_0.5.0     
[28] tidyselect_1.1.0     prodlim_2019.11.13   tibble_3.0.3        
[31] gridExtra_2.3        codetools_0.2-16     crayon_1.3.4        
[34] dplyr_1.0.0          withr_2.2.0          ModelMetrics_1.2.2.2
[37] MASS_7.3-51.6        bitops_1.0-6         recipes_0.1.13      
[40] grid_3.6.3           nlme_3.1-147         jsonlite_1.7.0      
[43] gtable_0.3.0         lifecycle_0.2.0      magrittr_1.5        
[46] pROC_1.16.2          scales_1.1.1         stringi_1.4.6       
[49] reshape2_1.4.4       timeDate_3043.102    ellipsis_0.3.1      
[52] generics_0.0.2       vctrs_0.3.2          IRkernel_1.1.1      
[55] lava_1.6.7           iterators_1.0.12     tools_3.6.3         
[58] glue_1.4.1           purrr_0.3.4          survival_3.2-3      
[61] colorspace_1.4-1     pbdZMQ_0.3-3        

bgreenwell commented 4 years ago

That might be the problem! Try upgrading the R kernel's repr package and see if that works:

devtools::install_github("IRkernel/repr")

KhalilBelghouat commented 4 years ago

Hi @bgreenwell, thanks for the guidance. Yesterday I used fastshap to explain a caret model and it worked fine. Today I applied it again to the same model and got this error:

Error in UseMethod("explain"): no applicable method for 'explain' applied to an object of class "c('train', 'train.formula')"

What could be the reason for the error?

Edit 1:

I tried it on the lm model provided as an example in the official documentation, and it isn't working either, even though it worked just fine yesterday.

Edit 2:

It actually worked after I checked a post by someone who had the same issue: https://github.com/thomasp85/lime/issues/48

bgreenwell commented 4 years ago

The most likely culprit is that you loaded a package after fastshap that exports a function of the same name, in this case explain(). dplyr is the most common culprit for me. To see if that's the issue, use fastshap::explain() instead of just explain(). If it works, then type ?explain in the console. If you get multiple options for a help page, that could tell you which package is the culprit.
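A short sketch of that diagnosis in code, assuming fastshap is attached and reusing the PPR model from the original post. find() and environmentName() are standard R functions; the variable names providers and winner are illustrative only.

```r
library(fastshap)

# Every attached package that provides an object named "explain", in
# search-path order; a bare explain() call uses the first entry
providers <- find("explain")
print(providers)

# Name of the namespace that a bare explain() currently resolves to
winner <- environmentName(environment(explain))
print(winner)

# A namespace-qualified call always uses fastshap's version, regardless of
# which other packages were attached afterwards
data(mtcars)
mtcars.ppr <- ppr(mpg ~ ., data = mtcars, nterms = 1)
set.seed(101)  # for reproducibility
shap <- fastshap::explain(mtcars.ppr, X = subset(mtcars, select = -mpg),
                          nsim = 10, pred_wrapper = predict)
```

If providers lists another package ahead of "package:fastshap", that package is the one masking explain().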

KhalilBelghouat commented 4 years ago

You are absolutely right! I apologize for the bother.

bgreenwell commented 4 years ago

No apologies necessary! I wrote the package and still fall for the same error occasionally. Let this be a reminder to manage your namespace and only load packages whose functions you'll rely on a lot. I prefer to use pkgname::functionname() if I'm only using one or two functions and only a handful of times. This is one thing that's a bit cleaner in Python.