Closed Nehagupta90 closed 2 years ago
Hi, can you provide a reproducible example, from loading the data through creating the model to obtaining the error? E.g. you could train your mlr3 model on the example you mentioned:
titanic_imputed <- archivist::aread("pbiecek/models/27e5c")
titanic_rf <- # mlr3 model based on titanic_imputed
johnny_d <- archivist::aread("pbiecek/models/e3596")
library("randomForest")
library("DALEX")
titanic_rf_exp <- DALEX::explain(model = titanic_rf,
                                 data = titanic_imputed[, -9],
                                 y = titanic_imputed$survived == "yes",
                                 label = "Random Forest")
set.seed(1)
library("DALEXtra")
library("lime")
model_type.dalex_explainer <- DALEXtra::model_type.dalex_explainer
predict_model.dalex_explainer <- DALEXtra::predict_model.dalex_explainer
lime_johnny <- predict_surrogate(explainer = titanic_rf_exp,
                                 new_observation = johnny_d,
                                 n_features = 3,
                                 n_permutations = 1000,
                                 type = "lime")
(as.data.frame(lime_johnny))
plot(lime_johnny)
This is the code I am using; it works with the Break-down method:
data = readARFF("xalan.arff")
index = sample(1:nrow(data), 0.7*nrow(data))
train = data[index,]
test = data[-index,]
task = TaskRegr$new("data", backend = train, target = "bug")
print(task)
learner = lrn("regr.ksvm")
model = learner$train(task)
explainer2 = explain_mlr3(model, data = test[,-21], y = as.numeric(test$bug)-1, label="SVM")
new_observation = test[36,]
plot(predict_parts(explainer2, new_observation = new_observation, type = "break_down_interactions"))
model_type.dalex_explainer <- DALEXtra::model_type.dalex_explainer
predict_model.dalex_explainer <- DALEXtra::predict_model.dalex_explainer
lime_johnny <- predict_surrogate(explainer = explainer2, new_observation = new_observation, n_features = 3, n_permutations = 1000, type = "lime")
The libraries I used are the following:
library(farff)
library(mlr3learners)
library(mlr3extralearners)
library(mlr3)
library(DALEX)
library(DALEXtra)
library(lime)
library(ingredients)
library(ceterisParibus)
On Mon, Mar 14, 2022 at 5:26 PM Hubert Baniecki @.***> wrote:
— Reply to this email directly, view it on GitHub https://github.com/ModelOriented/DALEX/issues/487#issuecomment-1067029586, or unsubscribe https://github.com/notifications/unsubscribe-auth/AN2ZZ2JVQEZQFQUBOKMZLJLU75SE3ANCNFSM5QSBQ6UQ . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.
You are receiving this because you authored the thread.Message ID: @.***>
Hi, this code works for me, so unless you share data or provide a reproducible example, I might not be able to help you.
library(mlr3learners)
library(mlr3extralearners)
library(mlr3)
library(DALEX)
library(DALEXtra)
library(lime)
index= sample(1:nrow(titanic_imputed), 0.7*nrow(titanic_imputed))
train= titanic_imputed[index,]
test= titanic_imputed[-index,]
task = TaskRegr$new("data", backend = train, target = "survived")
print(task)
learner=lrn("regr.ksvm")
model= learner$train(task )
explainer2 = explain_mlr3(model,
                          data = test[,-21],
                          y = as.numeric(test$survived),
                          label="SVM")
new_observation= test[36,]
### The following works with Breakdown
plot(predict_parts(explainer2,
                   new_observation = new_observation,
                   type = "break_down_interactions"))
## The following WORKS
model_type.dalex_explainer <- DALEXtra::model_type.dalex_explainer
predict_model.dalex_explainer <- DALEXtra::predict_model.dalex_explainer
lime_johnny <- predict_surrogate(explainer = explainer2,
                                 new_observation = new_observation,
                                 n_features = 3,
                                 n_permutations = 1000,
                                 type = "lime")
plot(lime_johnny)
You can also try to update all the used libraries. My session info:
R version 4.1.1 (2021-08-10)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)
Matrix products: default
locale:
[1] LC_COLLATE=English_United Kingdom.1252 LC_CTYPE=English_United Kingdom.1252
[3] LC_MONETARY=English_United Kingdom.1252 LC_NUMERIC=C
[5] LC_TIME=English_United Kingdom.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] lime_0.5.2 DALEXtra_2.1.1 DALEX_2.4.0 mlr3extralearners_0.5.18
[5] mlr3learners_0.5.1 mlr3_0.13.0
loaded via a namespace (and not attached):
[1] Rcpp_1.0.7 paradox_0.7.1 lubridate_1.8.0 lattice_0.20-44
[5] listenv_0.8.0 png_0.1-7 palmerpenguins_0.1.0 assertthat_0.2.1
[9] glmnet_4.1-3 digest_0.6.29 foreach_1.5.1 utf8_1.2.2
[13] parallelly_1.28.1 R6_2.5.1 backports_1.4.1 RSQLite_2.2.9
[17] httr_1.4.2 ggplot2_3.3.5 pillar_1.7.0 flock_0.7
[21] rlang_1.0.1 uuid_0.1-4 rstudioapi_0.13 data.table_1.14.2
[25] kernlab_0.9-29 blob_1.2.2 Matrix_1.3-4 checkmate_2.0.0
[29] reticulate_1.22 labeling_0.4.2 splines_4.1.1 gower_0.2.2
[33] RCurl_1.98-1.5 bit_4.0.4 munsell_0.5.0 compiler_4.1.1
[37] pkgconfig_2.0.3 shape_1.4.6 globals_0.14.0 tidyselect_1.1.1
[41] tibble_3.1.6 lgr_0.4.3 mlr3misc_0.9.5 codetools_0.2-18
[45] fansi_1.0.2 future_1.23.0 crayon_1.5.0 dplyr_1.0.7
[49] bitops_1.0-7 rappdirs_0.3.3 grid_4.1.1 jsonlite_1.8.0
[53] gtable_0.3.0 lifecycle_1.0.1 DBI_1.1.2 magrittr_2.0.2
[57] scales_1.1.1 archivist_2.3.6 stringi_1.7.6 cli_3.2.0
[61] cachem_1.0.6 farver_2.1.0 iBreakDown_2.0.1 ellipsis_0.3.2
[65] generics_0.1.1 vctrs_0.3.8 iterators_1.0.13 tools_4.1.1
[69] bit64_4.0.5 glue_1.6.1 purrr_0.3.4 survival_3.2-13
[73] parallel_4.1.1 fastmap_1.1.0 colorspace_2.0-3 memoise_2.0.0
Another thing: how does the above code work for you, given that you used the titanic dataset? The target variable of the titanic dataset is categorical (a classification problem), while I used a regression learner. Doesn't it matter?
I attached the dataset
Doesn't it matter?
You can do a regression into a 0-1 variable; it is just a technicality.
Ok, I ran your example and the error can be fixed by using:
new_observation = test[36, -21] # target variable breaks the code
The target variable in new_observation breaks the following line in predict_surrogate():
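The error reported in this thread, "undefined columns selected" raised from `[.data.frame`(explainer$data, , colnames(new_observation)), can be reproduced with base R alone. Below is a minimal sketch, assuming toy data frames: `explainer_data` and the column names are made up for illustration and only stand in for `explainer$data` and the real dataset.

```r
# Toy stand-in for explainer$data: predictors only, target already dropped
explainer_data <- data.frame(age = c(20, 30, 40), fare = c(7.5, 80, 15))

# Observation that still carries the target column (here called "survived")
new_observation <- data.frame(age = 25, fare = 10, survived = 1)

# Selecting columns by name fails when a name is missing from the data frame
msg <- tryCatch(
  {
    explainer_data[, colnames(new_observation)]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)

# Dropping the target from the observation makes the column names match
predictors <- setdiff(colnames(new_observation), "survived")
new_observation_fixed <- new_observation[, predictors, drop = FALSE]
selected <- explainer_data[, colnames(new_observation_fixed)]
print(colnames(selected))
```

Dropping the target by name rather than by position (such as `-21`) is one way to avoid depending on where the target column sits in the data frame.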
Thanks a lot, Hubert.
It works now, with just one warning message:
Warning message: noc does not contain enough variance to use quantile binning. Using standard binning instead.
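The warning text suggests that quantile-based bins could not be formed for `noc`. Below is a minimal sketch of how that can happen. It assumes that the bins are built from quartile break points and that "standard binning" means equal-width bins; both are assumptions about the mechanism, not a description of lime's actual code. The vector `x` is made up and nearly constant.

```r
# Made-up variable that is almost constant: 95 zeros and five larger values
x <- c(rep(0, 95), 1:5)

# Quartile break points (0%, 25%, 50%, 75%, 100%)
breaks <- quantile(x, probs = seq(0, 1, by = 0.25))
print(breaks)

# Most break points coincide, so they cannot delimit four distinct bins
unique_breaks <- unique(unname(breaks))
print(length(unique_breaks))

# Equal-width bins over the range still work for such a variable
equal_width <- cut(x, breaks = 4, include.lowest = TRUE)
print(table(equal_width))
```

Under these assumptions the warning is informational: the variable is still binned, only with a different rule.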
Hello Hubert
When we use the Break-down method, we get something like the following figure. The values next to the input metrics are the values of this particular instance. However, what do the values next to the metrics in the LIME plot represent? They are shown with < and > signs.
[image: image.png]
Hi, quoting the EMA book (https://ema.drwhy.ai/LIME.html, Section 9.4 Example: Titanic data):
In this example, however, we have got a relatively small number of variables, so we will use a simpler data representation in the form of a binary vector. Toward this aim, each variable is dichotomized into two levels. For example, age is transformed into a binary variable with categories “≤15.36” and “>15.36”, class is transformed into a binary variable with categories “1st/2nd/deck crew” and “other”, and so on.
Hope this helps.
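The dichotomization described in the quote can be sketched in base R. This is only an illustration: the ages are made up, the 15.36 threshold is taken from the quoted example, and the labels mimic the style of the plot rather than reproducing lime's output.

```r
# Made-up ages; the threshold 15.36 comes from the quoted EMA example
age <- c(8, 15.36, 22, 47)
threshold <- 15.36

# Each value is replaced by the label of the interval it falls into
age_binary <- ifelse(age <= threshold, "age <= 15.36", "age > 15.36")
print(data.frame(age = age, age_binary = age_binary))
```

Read this way, a label such as `age > 15.36` next to a bar names the interval that contains the observation's value, not the value itself.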
I will read the details in the book to further clarify things. Thank you again for your support and helpful information.
How can I share my data? Can I attach the dataset?
What do you mean? This issue is fixed.
I have used the Break-down method for instance-level explanation and it works fine. I have never used the LIME method before, and now that I am using it, it gives me the following error:
Error in `[.data.frame`(explainer$data, , colnames(new_observation)) : undefined columns selected
My code is:
explainer5 = explain_mlr3(model5, data = test[,-21], y = as.numeric(test$report)-1, label="SVM")
new_observation = test[6,]
plot(predict_parts(explainer5, new_observation = new_observation, type = "break_down_interactions")) # This works fine
# Problem is in the following code
model_type.dalex_explainer <- DALEXtra::model_type.dalex_explainer
predict_model.dalex_explainer <- DALEXtra::predict_model.dalex_explainer
lime_tool <- predict_surrogate(explainer = explainer5, new_observation = new_observation, n_features = 3, n_permutations = 1000, type = "lime")
Error in `[.data.frame`(explainer$data, , colnames(new_observation)) : undefined columns selected
What could be the problem? I am following the example in https://ema.drwhy.ai/LIME.html