bgreenwell / fastshap

Fast approximate Shapley values in R
https://bgreenwell.github.io/fastshap/

Subscript `O` is a matrix, the data `X[O]` must have size 1. #37

Closed: jgarrigan closed this issue 2 years ago.

jgarrigan commented 2 years ago

Hi Brandon,

Thanks for the great explanation here. I'm using isotree for outlier detection and this has been super useful.

I'm trying to run your example code verbatim, but I get the following error. Has something changed under the hood?

```
> ex <- fastshap::explain(ifo, X = X, newdata = max.x, pred_wrapper = pfun, 
+                          adjust = TRUE, nsim = 1000)
Error:
! Subscript `O` is a matrix, the data `X[O]` must have size 1.
Run `rlang::last_error()` to see where the error occurred.
> rlang::last_error()
<error/tibble_error_subset_matrix_must_be_scalar>
Error:
! Subscript `O` is a matrix, the data `X[O]` must have size 1.
---
Backtrace:
  1. fastshap::explain(...)
  2. fastshap:::explain.default(...)
  3. fastshap:::explain.default(...)
  4. plyr::laply(...)
  5. plyr::llply(...)
  7. base::lapply(pieces, .fun, ...)
  8. fastshap FUN(X[[i]], ...)
  9. base::replicate(...)
 10. base::sapply(...)
 11. base::lapply(X = X, FUN = FUN, ...)
 12. fastshap FUN(X[[i]], ...)
 13. fastshap:::explain_column(...)
 15. tibble:::`[<-.tbl_df`(`*tmp*`, O, value = `<dbl>`)
 16. tibble:::tbl_subassign_matrix(x, j, value, j_arg, substitute(value))
Run `rlang::last_trace()` to see the full context.
```

The only change I made to the code was removing `random_seed = 2223` from the call that fits the isolation forest model, because it doesn't appear to be an argument of the `isolation.forest()` function: https://cran.r-project.org/web/packages/isotree/isotree.pdf

jgarrigan commented 2 years ago

I checked the class of `X` and `max.x`, wrapped both of them in `as.data.frame()`, and now I get results and can plot the Shapley values with ggplot.

Thanks again for this example; it has really saved me.
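This is a minimal, self-contained sketch of why the `as.data.frame()` workaround helps. The backtrace shows `fastshap:::explain_column()` assigning a numeric vector into the data through a matrix subscript `O`, and that call is dispatched to tibble's `[<-.tbl_df` method. The sketch assumes that `O` is a logical matrix with the same dimensions as the data. That is an assumption, because fastshap's internals are not shown in this thread. It also assumes that the installed tibble version has the matrix-subassignment check seen in the backtrace. Base `data.frame` accepts one replacement value per `TRUE` cell, whereas tibble's method only accepts a single value. The toy data, `O`, and `new_vals` are hypothetical, and the sketch does not call fastshap.

```r
library(tibble)

# Hypothetical toy data (not from the original example)
df  <- data.frame(x1 = c(1, 2, 3), x2 = c(4, 5, 6))
tbl <- as_tibble(df)  # same data, stored as a tibble

# Assumption: mimics the matrix-subscript assignment seen in the backtrace.
# Logical matrix with the same dimensions as the data; TRUE marks cells to overwrite.
# Filled column-major, so it selects x1 rows 1 and 3, then x2 row 2.
O <- matrix(c(TRUE, FALSE, TRUE, FALSE, TRUE, FALSE), nrow = 3)
new_vals <- c(10, 20, 30)  # one replacement per TRUE cell, in column-major order

# Base data.frame: a logical-matrix subscript accepts a vector of replacements
df[O] <- new_vals

# tibble: the same assignment only accepts a single value, so it signals an error
tbl_error <- tryCatch({
  tbl[O] <- new_vals
  NULL
}, error = function(e) e)

# Workaround from the thread: convert to a plain data.frame before assigning
tbl_fixed <- as.data.frame(tbl)
tbl_fixed[O] <- new_vals

print(df)
print(inherits(tbl_error, "error"))
print(tbl_fixed)
```

After running it, `df` and `tbl_fixed` should both contain `x1 = 10, 2, 20` and `x2 = 4, 30, 6`, while `tbl_error` holds the caught tibble error. Converting to a plain data frame sidesteps tibble's stricter assignment rule without changing which values are written.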