AlanInglis / vivid

This package is for visualising variable importance and variable interaction.
https://alaninglis.github.io/vivid/

Issue when running vivi function with xgboost #3

Closed ghost closed 1 year ago

ghost commented 1 year ago

Hello Mr. (Dr.?) Inglis:

I am really looking forward to working with your package.

I am trying to run VIVID with an xgboost machine learning model. I am following your excellent directions. However, I get the following error when I try to run vivi:

pFun <- function(fit, data, ...) predict(fit, as.matrix(dataset_oh_1[,2:19]))

set.seed(1701)
dataset_oh <- as_tibble(dataset_oh)
set.seed(1701)
viviGBst <- vivi(fit = gbst,
                 data = dataset_oh,
                 response = "LOS",
                 reorder = FALSE,
                 normalized = FALSE,
                 predictFun = pFun)

Agnostic variable importance method used.
Calculating interactions...
Error in [[<-:
! Assigned data stats::predict(x, data = X[, cols, drop = FALSE]) must be compatible with existing data.
✖ Existing data has 25000 rows.
✖ Assigned data has 10792 rows.
ℹ Only vectors of size 1 are recycled.
Caused by error in vectbl_recycle_rhs_rows():
! Can't recycle input of size 10792 to size 25000.
Run rlang::last_trace() to see where the error occurred.

Here is the backtrace:

  1. └─vivid::vivi(...)
  2.   ├─vivid:::vividInteraction(...)
  3.   └─vivid:::vividInteraction.default(...)
  4.     ├─flashlight::light_interaction(...)
  5.     └─flashlight:::light_interaction.flashlight(...)
  6.       └─flashlight (local) core_func(data)
  7.         └─base::lapply(v, statistic, dat = X, grid_id = grid_id)
  8.           └─flashlight (local) FUN(X[[i]], ...)
  9.             └─flashlight (local) call_pd(dat, z = z, gid = grid_id)
 10.               ├─base::[[<-(*tmp*, vn, value = <dbl>)
 11.               └─tibble:::[[<-.tbl_df(*tmp*, vn, value = <dbl>)
 12.                 └─tibble:::tbl_subassign(...)
 13.                   └─tibble:::vectbl_recycle_rhs_rows(value, fast_nrow(xo), i_arg = NULL, value_arg, call)

Someone on Stack Overflow has the same issue, but I can't find the post. Sorry.

ghost commented 1 year ago

Here is the Stack Overflow question:

https://stackoverflow.com/questions/74338823/getting-error-while-calculating-feature-importances-r

AlanInglis commented 1 year ago

Hi,

I've tracked down the cause of the issue. It is a side effect of how the internal predict function works in vivid. When creating the predict function, the function body must refer to the argument "data" instead of the actual name of the data object. Below is a small example of how to recreate and solve the issue.

#Load libraries:
library(vivid)
library(xgboost)

#Load data:
set.seed(1701)
aq <- na.omit(airquality)

#Create gbm:
gbst <- xgboost(
  data = as.matrix(aq[, 1:6]),
  label = as.matrix(aq[, 1]),
  nrounds = 100,
  verbose = 0
)

#Create predict function:
pFun <- function(fit, data, ...) predict(fit, as.matrix(data[, 1:6]))

#NOTE: In the predict function above, the body must use the function argument
#'data' (i.e., do not use the actual name of the data object!). You must also
#include the response variable.

#Run vivid:
set.seed(1701)
viviGBst <- vivi(
  fit = gbst,
  data = aq,
  response = "Ozone",
  reorder = FALSE,
  normalized = FALSE,
  predictFun = pFun
)

#Output:
viviGBst

#-------------------------------------------------------------------------

#Recreating the error:
pFun <- function(fit, data, ...) predict(fit, as.matrix(aq[, 1:6])) # used 'aq' instead of 'data'

#Run vivid:
viviGBst <- vivi(
  fit = gbst,
  data = aq,
  response = "Ozone",
  predictFun = pFun
)
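The row mismatch can also be seen without vivid or xgboost. The sketch below is only an illustration in base R: it assumes a linear model in place of the boosted model, and it stands in for the internal call by evaluating the predict function on a 10-row subset. The names pFunGood, pFunBad, subset10 and checkPredictFun are hypothetical and are not part of vivid. A predict function that ignores its 'data' argument always returns as many predictions as the hard-coded data set has rows, which matches the kind of size mismatch in the reported error (25000 versus 10792).

```r
# Minimal sketch, base R only. Assumption: lm() stands in for the xgboost model.
aq  <- na.omit(airquality)
fit <- lm(Ozone ~ ., data = aq)

# Correct: predictions are computed on whatever 'data' the caller passes in.
pFunGood <- function(fit, data, ...) predict(fit, newdata = data)

# Incorrect: 'data' is ignored, so the result always has nrow(aq) elements.
pFunBad <- function(fit, data, ...) predict(fit, newdata = aq)

# Stand-in for an internal call that evaluates the function on other rows.
subset10 <- aq[1:10, ]

length(pFunGood(fit, subset10))  # 10, one prediction per row passed in
length(pFunBad(fit, subset10))   # equals nrow(aq), not 10

# Hypothetical helper: does a predict function respect its 'data' argument?
checkPredictFun <- function(predictFun, fit, data, n = 5) {
  n <- min(n, nrow(data))
  length(predictFun(fit, data[seq_len(n), , drop = FALSE])) == n
}

checkPredictFun(pFunGood, fit, aq)  # TRUE
checkPredictFun(pFunBad, fit, aq)   # FALSE
```

Running a check like this on a few rows before calling vivi is one way to catch the problem early, since it is much cheaper than the interaction calculation.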

Hopefully that solves the problem. But let me know if you're still having any issues!

ghost commented 1 year ago

Mr. (Dr.) Inglis:

Thank you for the very quick response! I am grateful for your help.

I will run this code soon and share results.

Take care,

Brian

ghost commented 1 year ago

Thank you for your help.

I've included the plot VIVID made with your help.