bgreenwell / fastshap

Fast approximate Shapley values in R
https://bgreenwell.github.io/fastshap/

Error in if (ncol(res) == 1) { : argument is of length zero #51

Open marboe123 opened 1 year ago

marboe123 commented 1 year ago

Hello,

I get the error above when I run this command:

shap_values <- fastshap::explain(model_n, X = trainval, exact = TRUE)

model_n is a multiclass classification model fitted with xgboost, and trainval is my training data.

With the same setup for binary classification, the SHAP values are calculated correctly. For multiclass classification, the error above is raised.

Do you have any idea what could be causing this?

Thanks a lot!

marboe123 commented 1 year ago

P.S.:

If I use the SHAPforxgboost package to calculate the SHAP values, I also get an error:

Error in `colnames<-`(`*tmp*`, value = c(colnames(X_train), "BIAS")) : 
  attempt to set 'colnames' on an object with less than two dimensions

Several other users report the same error, as can be seen at the bottom of this post:

https://liuyanguu.github.io/post/2019/07/18/visualization-of-shap-for-xgboost/

And here:

https://github.com/liuyanguu/SHAPforxgboost/issues/35

Maybe this points to the root cause of the error.

Thank you.
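Both messages are what base R produces when a list reaches code that expects a matrix. The sketch below assumes, as the rest of this thread suggests, that the multiclass contributions arrive as a list with one matrix per class; `res` is a toy stand-in, not the actual internal object of fastshap or SHAPforxgboost.

```r
# Toy stand-in (an assumption): a list holding one contribution matrix per
# class, as predict(..., predcontrib = TRUE) appears to return for a
# 3-class xgboost model. The values are arbitrary.
one_class <- matrix(0, nrow = 2, ncol = 3,
                    dimnames = list(NULL, c("x", "z", "BIAS")))
res <- list(one_class, one_class, one_class)

# A list has no dim attribute, so ncol() returns NULL; NULL == 1 is
# logical(0), and if() refuses a zero-length condition.
print(ncol(res))
msg_if <- tryCatch(
  if (ncol(res) == 1) "single column" else "several columns",
  error = function(e) conditionMessage(e)
)
print(msg_if)

# colnames<- needs an object with at least two dimensions, so it fails too.
msg_colnames <- tryCatch({
  colnames(res) <- c("x", "z", "BIAS")
  "names set"
}, error = function(e) conditionMessage(e))
print(msg_colnames)
```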

bgreenwell commented 1 year ago

@marboe123 Can you show the output of predict(mymodel, newdata = data, predcontrib = TRUE)? I suspect XGBoost returns a list or an array (one element per class) in the multiclass case. If so, it should be a simple fix.
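If the multiclass output is indeed a list, one way to handle both cases uniformly is to normalise the result into a named list of per-class matrices. The helper `as_class_list()` below is hypothetical, not part of fastshap or xgboost. It assumes a single matrix for binary models and a list of matrices for multiclass models, and it stops on anything else (for example, an array from a different xgboost version) rather than guessing the layout.

```r
# Hypothetical helper (not part of fastshap or xgboost): turn the output of
# predict(model, newdata = X, predcontrib = TRUE) into a named list holding
# one contribution matrix per model output.
as_class_list <- function(contrib, out_names = NULL) {
  if (is.matrix(contrib)) {
    contrib <- list(contrib)   # binary or regression: a single matrix
  } else if (!is.list(contrib)) {
    stop("expected a matrix or a list of matrices, got: ", class(contrib)[1])
  }
  if (is.null(out_names)) {
    # For multiclass models, output k corresponds to class label k (0-based)
    out_names <- paste0("output_", seq_along(contrib) - 1L)
  }
  stopifnot(length(out_names) == length(contrib))
  names(contrib) <- out_names
  contrib
}

# Toy inputs standing in for real predict() output
m <- matrix(1:6, nrow = 2, dimnames = list(NULL, c("x", "z", "BIAS")))
binary_out <- as_class_list(m)
multi_out <- as_class_list(list(m, m, m))
print(names(binary_out))   # "output_0"
print(names(multi_out))    # "output_0" "output_1" "output_2"
```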

marboe123 commented 1 year ago

@bgreenwell Thank you for your response! The output is a list.

Looking inside the list, I see BIAS, BIAS.1, BIAS.2, and likewise x, x.1, x.2, etc. for my variables.

Are these the SHAP values I can use, if the raw SHAP values themselves are all I need?

bgreenwell commented 1 year ago

Yes! I’ll fix the package to account for multiclass models. In the meantime, what you have here is a list with one component of Shapley values for each of your three class outcomes; in the binary case you really only need one. I’ll leave this issue open until I can get a fix in, but those are exactly what you’re looking for!
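Since each list component is an ordinary matrix of Shapley values (one column per feature plus BIAS), per-class summaries need only base R. The hypothetical helper `mean_abs_shap()` below computes one common global-importance summary, the mean absolute Shapley value of each feature for each class. It assumes the list is named by class and that the intercept column is called `BIAS`; the toy matrices are made up.

```r
# Hypothetical helper: mean absolute Shapley value per feature and class.
# Input: a named list with one matrix per class (rows = observations,
# columns = features plus the "BIAS" intercept column).
mean_abs_shap <- function(class_list) {
  per_class <- lapply(class_list, function(m) {
    m <- m[, colnames(m) != "BIAS", drop = FALSE]  # drop the intercept term
    colMeans(abs(m))
  })
  do.call(cbind, per_class)  # features in rows, classes in columns
}

# Toy contributions for two classes and two features (made-up numbers)
shap0 <- matrix(c(1, -1, 2, -2, 0.5, 0.5), nrow = 2,
                dimnames = list(NULL, c("x", "z", "BIAS")))
shap1 <- matrix(c(3, -3, 0, 0, 0.1, 0.1), nrow = 2,
                dimnames = list(NULL, c("x", "z", "BIAS")))
imp <- mean_abs_shap(list(class_0 = shap0, class_1 = shap1))
print(imp)
```

For a binary model the list has a single component, so the result has a single column.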

marboe123 commented 1 year ago

That is great. I have already been testing with predict(mymodel, newdata = data, predcontrib = TRUE). Initially I did xgboost hyperparameter tuning for binary classification, computing SHAP values with fastshap on a 6 GB GPU. This sometimes resulted in out-of-memory errors. I then moved to a 24 GB GPU and had no more memory errors. Now I have switched to multiclass classification, with SHAP values computed via predict(mymodel, newdata = data, predcontrib = TRUE), and I am hitting the memory limit again. Do you think fastshap could need less memory than the predict method, or is that not possible? I will also test my multiclass script without the SHAP values to see whether it runs without memory errors. Thanks!
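One way to bound the memory needed per call is to compute the contributions in blocks of rows and stitch the per-class pieces back together. The helper `contrib_in_chunks()` below is hypothetical and untested on a GPU; in practice `contrib_fun` would be something like `function(chunk) predict(mymodel, newdata = chunk, predcontrib = TRUE)`, assuming the data is a numeric matrix. Chunking only limits the size of each individual prediction call: the combined result still holds n × (p + 1) × K doubles (8 bytes each in R) for n rows, p features and K classes, so whether it avoids the out-of-memory error here would need to be tested.

```r
# Hypothetical helper: apply contrib_fun to blocks of at most chunk_size rows
# and row-bind the results output by output. contrib_fun must return either
# a matrix (one output) or a list with one matrix per class.
contrib_in_chunks <- function(X, contrib_fun, chunk_size = 10000L) {
  stopifnot(nrow(X) > 0, chunk_size >= 1)
  starts <- seq(1L, nrow(X), by = chunk_size)
  pieces <- lapply(starts, function(s) {
    rows <- s:min(s + chunk_size - 1L, nrow(X))
    out <- contrib_fun(X[rows, , drop = FALSE])
    if (is.matrix(out)) list(out) else out  # always a list of matrices
  })
  n_out <- length(pieces[[1]])
  lapply(seq_len(n_out), function(k) {
    do.call(rbind, lapply(pieces, `[[`, k))
  })
}

# Toy stand-in for predict(..., predcontrib = TRUE) on a 3-class model:
# class k gets k times the features plus a zero BIAS column (made up).
fake_contrib <- function(chunk) {
  lapply(1:3, function(k) cbind(chunk * k, BIAS = 0))
}
X <- matrix(as.numeric(1:20), nrow = 10, ncol = 2,
            dimnames = list(NULL, c("x", "z")))
res <- contrib_in_chunks(X, fake_contrib, chunk_size = 3L)
print(length(res))      # 3 classes
print(dim(res[[2]]))    # 10 rows, 3 columns
```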