Closed: GuGuaTT closed this issue 10 months ago
Hi. It seems the prediction function does not provide the right dimension as output. Have you created a custom prediction function? To assist you further, please provide a minimal reproducible example with the failing code.
Hello! Thank you for your prompt reply! Here is my test code:
# 'all' is a (1494, 20) table, with the first 19 columns as features and the last column as output values
all <- data.table::as.data.table(cbind(as.matrix(data), as.matrix(output)))
names <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j",
"k", "l", "m", "n", "o", "p", "q", "r", "s", "t")
colnames(all) <- names
# Assign inputs and outputs
x_var <- names[1:19]
y_var <- names[20]
x_train <- as.matrix(all[ , ..x_var])
y_train <- all[ , get(y_var)]
# Fit a basic xgboost model
model <- xgboost(
data = x_train,
label = y_train,
nround = 50,
verbose = FALSE
)
# Predict on the training data
pred <- predict(model, x_train)
# Specify the expected prediction without any features and setup explainer
p0 <- mean(y_train)
explainer <- shapr(x_train, model, n_combinations=10000)
# Test with the first 10 training data
test <- x_train[1:10, ]
explain <- explain(test, explainer = explainer,
approach = "empirical",
prediction_zero = p0
)
Then I got the error I mentioned.
Hello! Since I could not solve the problem above, I tested with your new R package instead, but a new problem occurred. I think we can focus on this problem instead. Here is my code:
data2 <- data.table::as.data.table(cbind(as.matrix(data2), as.matrix(output)))
names <- c("a", "b", "c", "d", "e", "f", "g", "h", "i", "j",
"k", "l", "m", "n", "o", "p", "q", "r", "s", "t")
colnames(data2) <- names
x_var2 <- names[1:9]
y_var2 <- names[20]
x_train2 <- data2[ , ..x_var2]
y_train2 <- data2[ , get(y_var2)]
x_explain2 <- x_train2[1:5, ]
# Fitting a basic xgboost model to the training data
model2 <- xgboost(
data = as.matrix(x_train2),
label = y_train2,
nround = 100,
verbose = FALSE
)
# Specifying the expected prediction without any features
p02 <- mean(y_train2)
explanation2 <- explain(
model = model2,
x_explain = as.matrix(x_explain2),
x_train = as.matrix(x_train2),
approach = "gaussian",
prediction_zero = p02,
n_combinations = NULL
)
print(explanation2$shapley_values)
The problem is this: if I use only eight features to build the prediction model (x_var2 <- names[1:8]), the code runs without error. However, if I use nine or more features (for example, x_var2 <- names[1:9]), the code fails with this error:
Error in setnames(x, value) : Can't assign 15011 names to a 6 column data.table
This error occurs when I execute the explain command. Could you take a look? I am happy to provide you the data if you want. (BTW, I have just read the paper related to this package these days and it is really wonderful, thank you!)
Hi!
I took a look and reproduced your issue. The cause is that you are using the feature name "i". We should fix this, but for now, a simple workaround is to not use "i" as a feature name. The same applies to "w", if you ever increase the number of features in your model.
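As a minimal sketch of that workaround, assuming the 20-column table from the code above, the single-letter names could be replaced with prefixed ones so that none of them is "i" or "w". The `feat_` prefix and the name `y` for the output column are arbitrary choices for illustration.

```r
# Build 20 column names that cannot collide with "i" or "w":
# 19 prefixed feature names plus one name for the output column
names <- c(paste0("feat_", 1:19), "y")

# Select the first nine features and the output as before
x_var2 <- names[1:9]
y_var2 <- names[20]
```

The rest of the script should then run unchanged after `colnames(data2) <- names`, since it only refers to the columns through `x_var2` and `y_var2`.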
Note to self: Introduce a check for protected feature names ("i", "w", "p_hat", "id", "id_combination", etc.) and temporarily transform the feature names if any of these appear as feature names.
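One way such a check might look is sketched below. This is not the package's implementation: `protected_names` contains only the names listed in the note (the real list may be longer), and `make_safe_names` and the `_feature` suffix are hypothetical, introduced here for illustration.

```r
# Names taken from the note above; the actual set of protected
# names inside the package may differ (assumption)
protected_names <- c("i", "w", "p_hat", "id", "id_combination")

# Hypothetical helper: returns a character vector of safe names,
# named by the original feature names so the renaming can be undone
make_safe_names <- function(feature_names, protected = protected_names) {
  safe <- feature_names
  clash <- feature_names %in% protected
  # Append a suffix to every clashing name
  safe[clash] <- paste0(feature_names[clash], "_feature")
  # Guard against the suffixed name duplicating an existing feature name
  safe <- make.unique(safe)
  stats::setNames(safe, feature_names)
}

mapping <- make_safe_names(c("a", "i", "w", "x"))
safe_names <- unname(mapping)     # names to use while explaining
original_names <- names(mapping)  # names to restore afterwards
```

Keeping the original names alongside the safe ones means the temporary renaming can be reversed before the Shapley values are returned to the user.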
Thank you! It fixes the problem!
Hi! I get this error when I call "explain":
Error in prediction(dt, prediction_zero, explainer) : nrow(explainer$x_test) == dt[, max(id)] is not TRUE
What does it mean?