ModelOriented / DALEXtra

Extensions for the DALEX package
https://ModelOriented.github.io/DALEXtra/

plot() gives result for ALL target classes, we need only 1 #51

Closed AtharKharal closed 4 years ago

AtharKharal commented 4 years ago

In a multiclass use case, one often needs the BreakDown profile for only one target class, but plot() produces profiles for ALL target classes. How can plot() be controlled so that it shows only the required target class? Below is dummy code similar to my actual use case. I need to see the BD plot for "DF lrnr exp.alpha" only; the other two should not be shown. This kind of selection is needed when the number of target classes and/or the number of variables is large.

library(DALEX)
library(DALEXtra)
library(tidyverse)
library("mlr3verse")

df=data.frame(w=c(34,65,23,78,37, 34,65,23,78,37, 34,65,23,78,37, 34,65,23,78,37),
              x=c('a','b','a','c','c', 'a','b','a','c','c', 'a','b','a','c','c', 'a','b','a','c','c'),
              y=c(TRUE,FALSE,TRUE,TRUE,FALSE, TRUE,FALSE,TRUE,TRUE,FALSE, TRUE,FALSE,TRUE,TRUE,FALSE, TRUE,FALSE,TRUE,TRUE,FALSE),
              z=c('alpha','alpha','delta','delta','phi', 'alpha','alpha','delta','delta','phi', 'alpha','alpha','delta','delta','phi', 'alpha','alpha','delta','delta','phi')
)

df_task <- TaskClassif$new(id = "my_df", backend = df, target = "z")
df_lrn <- lrn("classif.rpart", predict_type = "prob")
df_lrn$train(df_task)
df_lrn_exp <- explain_mlr3(df_lrn, data = df[,-4], y = df$z, label = "DF lrnr exp")
df_BD <- predict_parts(df_lrn_exp, df[3,], type='break_down')
plot(df_BD, max_features = 5, add_contributions = T)

maksymiuks commented 4 years ago

Hi, thank you for raising this problem; we will take a closer look at it. In the meantime, here is a quick workaround:

plot(df_BD[df_BD$label == "DF lrnr exp.alpha",], max_features = 5, add_contributions = T)

It may not be very elegant, but it is the only option right now.

AtharKharal commented 4 years ago

Thanks indeed. Your suggestion worked for the dummy code above, but for my actual project it produced the following error (with traceback):

Error in if (any(params$y != 0)) { : missing value where TRUE/FALSE needed

  1. f(..., self = self)
  2. self$position$compute_layer(data, params, layout)
  3. f(..., self = self)
  4. l$compute_position(d, layout)
  5. f(l = layers[[i]], d = data[[i]])
  6. by_layer(function(l, d) l$compute_position(d, layout))
  7. ggplot_build.ggplot(x)
  8. ggplot_build(x)
  9. print.ggplot(x)
  10. (function (x, ...) UseMethod("print"))(x)

Any help here please?
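For context, "missing value where TRUE/FALSE needed" is the message R raises when `if` is handed `NA` instead of `TRUE` or `FALSE`. The following is a minimal base-R sketch of that mechanism only. The vector `y` is a made-up stand-in for `params$y` (an assumption; the actual values inside the failing plot are not shown in this thread).

```r
# Hypothetical stand-in for params$y: one missing value, no non-zero value.
y <- c(NA, 0)

# NA != 0 is NA and 0 != 0 is FALSE, so any() cannot decide and returns NA.
cond <- any(y != 0)

# `if` refuses an NA condition; capture the error message instead of stopping.
res <- tryCatch(
  if (cond) "nonzero" else "all zero",
  error = function(e) conditionMessage(e)
)
print(res)
```

If `y` also contained a non-zero value, `any()` would return `TRUE` and no error would occur, so the failure depends on which values are missing.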

My sessionInfo is given below:

R version 3.6.1 (2019-07-05)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18362)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252 LC_CTYPE=English_United States.1252
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C
[5] LC_TIME=English_United States.1252

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] DALEXtra_1.3 DALEX_1.3.0 mlr3verse_0.1.1 paradox_0.2.0 mlr3viz_0.1.1.9002
[6] mlr3tuning_0.1.2 mlr3pipelines_0.1.3 mlr3learners_0.2.0 mlr3filters_0.2.0 mlr3db_0.1.5
[11] mlr3_0.3.0 forcats_0.5.0 stringr_1.4.0 dplyr_1.0.0 purrr_0.3.4
[16] readr_1.3.1 tidyr_1.1.0 tibble_3.0.1 ggplot2_3.3.1 tidyverse_1.2.1

loaded via a namespace (and not attached):
[1] Rcpp_1.0.4.6 lubridate_1.7.4 lattice_0.20-38 assertthat_0.2.1 glmnet_4.0 digest_0.6.25
[7] packrat_0.5.0 foreach_1.5.0 ranger_0.11.2 R6_2.4.1 cellranger_1.1.0 backports_1.1.7
[13] httr_1.4.0 pillar_1.4.4 rlang_0.4.6 uuid_0.1-4 readxl_1.3.1 rstudioapi_0.11
[19] data.table_1.12.8 Matrix_1.2-17 checkmate_2.0.0 reticulate_1.16 munsell_0.5.0 broom_0.5.6
[25] compiler_3.6.1 modelr_0.1.4 pkgconfig_2.0.3 shape_1.4.4 tidyselect_1.1.0 gridExtra_2.3
[31] lgr_0.3.4 mlr3misc_0.2.0 codetools_0.2-16 fansi_0.4.1 crayon_1.3.4 withr_2.2.0
[37] rappdirs_0.3.1 MASS_7.3-51.4 grid_3.6.1 nlme_3.1-140 jsonlite_1.6.1 gtable_0.3.0
[43] lifecycle_0.2.0 magrittr_1.5 scales_1.1.1 cli_2.0.2 stringi_1.4.6 iBreakDown_1.2.0
[49] xml2_1.2.1 ellipsis_0.3.1 ggdendro_0.1-20 generics_0.0.2 vctrs_0.3.0 iterators_1.0.12
[55] tools_3.6.1 glue_1.4.1 hms_0.5.3 colorspace_1.4-1 rvest_0.3.4 haven_2.3.1

maksymiuks commented 4 years ago

It's hard to track down the problem without the actual objects, but I will try.

Could you please provide the output of unique(BD_object$label)? By BD_object I mean the predict_parts output for your data.

AtharKharal commented 4 years ago

Yes, I have a predict_parts() object, namely Bumps_BD, obtained as follows:

Bumps_BD <- predict_parts(explainer = ranger_explainer,new_observation = Bumps_type)

Its labels are:

unique(Bumps_BD$label)
[1] RF.Common_Other RF.Bumps RF.K_Scratch RF.Z_Scratch RF.Pastry RF.Stains
[7] RF.Dirtiness

maksymiuks commented 4 years ago

And plot(Bumps_BD[Bumps_BD$label == "RF.Common_Other",]) causes an error, right?

AtharKharal commented 4 years ago

Yes, the same error as noted above.

maksymiuks commented 4 years ago

Could you send me an .RData file with the Bumps_BD data frame via e-mail (sz.maksymiuk@gmail.com)? Without it, I can only guess at a solution, and that may take ages.

AtharKharal commented 4 years ago

OK, I shall send it. Let me prepare it. Thanks.

maksymiuks commented 4 years ago

It took me a while, but the solution turned out to be prosaic. You need to remove the unused factor levels. I don't know why it worked in the dummy example without this step.

Bumps_BD <- predict_parts(explainer = ranger_explainer,new_observation = Bumps_type)
tmp <- Bumps_BD[Bumps_BD$label == "RF.Bumps",]
tmp$label <- factor(tmp$label)
plot(tmp)

It should work now.
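The effect of the `factor()` call above can be seen in base R without DALEX. The sketch below uses a toy data frame as a stand-in for a predict_parts() result (its columns and values are made up for illustration), and `keep_label()` is a hypothetical helper, not part of DALEX or DALEXtra.

```r
# Toy stand-in for a predict_parts() result; only the `label` column matters here.
bd <- data.frame(
  variable = rep(c("intercept", "w", "x"), times = 3),
  contribution = c(0.4, 0.1, -0.2, 0.3, 0.0, 0.1, 0.3, -0.1, 0.1),
  label = factor(rep(c("RF.Bumps", "RF.Pastry", "RF.Stains"), each = 3))
)

# Hypothetical helper: keep rows for one label, then rebuild the factor
# so that levels with no remaining rows are dropped.
keep_label <- function(bd, label) {
  out <- bd[bd$label == label, ]
  out$label <- factor(out$label)
  out
}

sub_raw <- bd[bd$label == "RF.Bumps", ]   # row subset only
sub_clean <- keep_label(bd, "RF.Bumps")   # row subset + dropped levels

nlevels(sub_raw$label)    # 3: the two empty levels are still attached
levels(sub_clean$label)   # "RF.Bumps"
```

Row subsetting alone leaves the empty levels on the factor, which is the state the fix above removes before plotting. Base R's `droplevels()` is an alternative way to drop unused levels.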

AtharKharal commented 4 years ago

Thanks indeed. It is working now. One more favour please. I am looking for a Postdoc in Data Sc. Is there any opportunity available around? Regards

