ModelOriented / randomForestExplainer

A set of tools to understand what is happening inside a Random Forest
https://ModelOriented.github.io/randomForestExplainer/

Question about Modification of Plot of Interactions #21

Closed ANUGAR closed 4 years ago

ANUGAR commented 4 years ago

Hello, I am working with the function plot_min_depth_interactions() from your package (see the section Variable Interactions here: https://cran.rstudio.com/web/packages/randomForestExplainer/vignettes/randomForestExplainer.html), and I would like to modify the output. As you know, this function creates a (very interesting) chart of interactions that, by default, uses the variable names from the data frame. I am trying to replace the names used in my dataset with labels that are more appropriate for presentation. For example, I have an interaction that appears as "reg_geo:educ_lev" that I would like to show as "Region:Education" in the chart only, without changing the rest of my code. I was trying to use scale_x_discrete from ggplot2, but I am lost: how am I supposed to do this? Should I supply all variable names and their labels, or just those included in the interaction chart? What order should I follow (that of the most important interactions)? Can you guide me, please? Also, can I change the colors? I tried scale_fill_brewer, but it tells me that I am supplying a continuous value to a discrete scale. Thanks for your help.

Yue-Jiang commented 4 years ago

Sorry for the delay in response. Two ways I can think of:

  1. Work with the interaction table directly; then you have full freedom to customize your plot. For example:

    library(randomForestExplainer)
    library(randomForest)
    set.seed(12345)
    rf <- randomForest(Species ~ Petal.Length + Sepal.Length, data = iris, localImp = TRUE)
    interaction_df <- min_depth_interactions(rf)
    print(interaction_df)

    This gives you the following data.frame. You can mutate the interaction names however you like and plot the relevant information.

          variable root_variable mean_min_depth occurrences               interaction uncond_mean_min_depth
    1 Petal.Length  Petal.Length      0.9644747         413 Petal.Length:Petal.Length                 0.638
    2 Petal.Length  Sepal.Length      0.3406593         455 Sepal.Length:Petal.Length                 0.638
    3 Sepal.Length  Petal.Length      1.0240264         401 Petal.Length:Sepal.Length                 0.838
    4 Sepal.Length  Sepal.Length      0.4481231         441 Sepal.Length:Sepal.Length                 0.838

  2. Manipulate the ggplot object. This is hacky, but you can change the tick labels and colors like this:

    p <- plot_min_depth_interactions(rf)
    print(p)

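    [image: https://user-images.githubusercontent.com/14319630/78418214-046b5d00-75ef-11ea-9d27-3293b03da991.png]

    You can customize it like a regular ggplot2 object: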

    p +
      scale_x_discrete(labels = c("SL:PL", "SL:SL", "PL:PL", "PL:SL")) + # change x tick labels
      scale_fill_gradient(low = "grey90", high = "red")                  # change fill colors

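    [image: https://user-images.githubusercontent.com/14319630/78418445-f1f22300-75f0-11ea-8118-412238c29d6a.png]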
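As a rough sketch of the first approach (building your own plot from the interaction table), the snippet below relabels the interactions and draws the bars with ggplot2. The data frame is typed in from the table printed above so the sketch runs without refitting the forest. `pretty_names`, `relabel_interaction()` and the short labels "Petal" and "Sepal" are hypothetical names chosen for illustration; they are not part of randomForestExplainer.

```r
library(ggplot2)

# Values copied from the interaction table printed earlier in this thread.
interaction_df <- data.frame(
  interaction = c("Petal.Length:Petal.Length", "Sepal.Length:Petal.Length",
                  "Petal.Length:Sepal.Length", "Sepal.Length:Sepal.Length"),
  mean_min_depth = c(0.9644747, 0.3406593, 1.0240264, 0.4481231),
  occurrences = c(413, 455, 401, 441),
  uncond_mean_min_depth = c(0.638, 0.638, 0.838, 0.838),
  stringsAsFactors = FALSE
)

# Hypothetical presentation labels, keyed by the original variable names.
pretty_names <- c(Petal.Length = "Petal", Sepal.Length = "Sepal")

# Split "a:b" into its two variable names, look each one up, and re-join.
relabel_interaction <- function(x, lookup) {
  parts <- strsplit(x, ":", fixed = TRUE)
  vapply(parts, function(p) paste(lookup[p], collapse = ":"), character(1))
}

interaction_df$label <- relabel_interaction(interaction_df$interaction, pretty_names)

# Order the bars by number of occurrences, most frequent first.
interaction_df$label <- factor(
  interaction_df$label,
  levels = interaction_df$label[order(-interaction_df$occurrences)]
)

p_custom <- ggplot(interaction_df,
                   aes(x = label, y = mean_min_depth, fill = occurrences)) +
  geom_col() +
  # occurrences is numeric, so a continuous fill scale is needed here
  scale_fill_gradient(low = "grey90", high = "red") +
  labs(x = "Interaction", y = "Mean minimal depth", fill = "Occurrences")
print(p_custom)
```

Because the lookup is keyed by the original variable name, only the variables that actually appear in the chart need an entry, and the order of the entries does not matter.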

ANUGAR commented 4 years ago

Hello, thank you for the answer. I'm sorry for being so picky about the charts, but I'm writing something where the smallest detail becomes important. I managed to make the changes, and the chart looks really good. I would, however, like to ask you two more questions, if you do not mind:

  1. What is the meaning of the "unconditional" lines in the interaction chart? I don't understand the information they provide or why it is important.
  2. Do you know how I can change the format of the red "minimum" line on the chart? I would like to make it black, to fit my color scheme, and dashed if possible.

Thanks again for all the help, and I'm sorry for all the trouble. I really appreciate your help!


Yue-Jiang commented 4 years ago
  1. Great question. The unconditional depth gives context for how large the interaction effect is. Say we have an interaction x:y. The bar shows, on average, how many splits it takes to reach a split on y after splitting on x. A small value suggests a strong "interaction" between x and y, because once a tree has split on x it tends to split on y soon after. But there is a problem here: what if y is simply an important variable that tends to sit close to the top of the trees anyway? The unconditional depth shows exactly that. It is the average minimal depth of variable y itself, unconditional on x. You can check it for yourself (reusing the example above):

    library(dplyr)
    min_depth_distribution(rf) %>%
      group_by(variable) %>%
      summarise(uncond_mean_min_depth = mean(minimal_depth))
    # A tibble: 2 x 2
    #   variable     uncond_mean_min_depth
    #   <chr>                        <dbl>
    # 1 Petal.Length                 0.638
    # 2 Sepal.Length                 0.838

    So in order to say an interaction is big, you'll want the conditional depth to be much smaller than the unconditional depth.

  2. This is more a question of how to manipulate ggplot2 plots after they are generated, for which you'll probably find an answer faster on Stack Overflow. Here's one way you can do it, but I think you should really make your own ggplot based on the interaction data.frame, as I mentioned earlier.

    # Override the legend key. The line size is reduced so the key visibly
    # shows a dashed line; at the default size = 1.5 only one segment fits.
    p <- p + guides(linetype = guide_legend(
      override.aes = list(color = "black", linetype = 2, size = 0.5)))
    # Manipulate the built plot; here data[[3]] holds the "minimum" line.
    # Note: ggplot2 >= 3.4.0 names the line width aesthetic `linewidth`, not `size`.
    pb <- ggplot_build(p)
    pb$data[[3]]$colour <- "black"
    pb$data[[3]]$linetype <- 2
    pb$data[[3]]$size <- 0.5
    gt <- ggplot_gtable(pb)
    plot(gt)

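To make the comparison in point 1 concrete, and to get a dashed black reference line without editing the built plot, one option is to compute the gap between unconditional and conditional depth yourself and draw the line with geom_hline(). This is only a sketch: the values are copied from the interaction table earlier in the thread, `depth_gap` and `min_line` are hypothetical names, and it assumes the red "minimum" line marks the smallest mean_min_depth among the plotted interactions.

```r
library(ggplot2)

# Values copied from the interaction table printed earlier in this thread.
interaction_df <- data.frame(
  interaction = c("Petal.Length:Petal.Length", "Sepal.Length:Petal.Length",
                  "Petal.Length:Sepal.Length", "Sepal.Length:Sepal.Length"),
  mean_min_depth = c(0.9644747, 0.3406593, 1.0240264, 0.4481231),
  occurrences = c(413, 455, 401, 441),
  uncond_mean_min_depth = c(0.638, 0.638, 0.838, 0.838),
  stringsAsFactors = FALSE
)

# Positive gap: the second variable is reached sooner after the root
# variable than it is on average, unconditionally.
interaction_df$depth_gap <-
  interaction_df$uncond_mean_min_depth - interaction_df$mean_min_depth

# Interactions sorted from largest to smallest gap.
print(interaction_df[order(-interaction_df$depth_gap),
                     c("interaction", "mean_min_depth", "depth_gap")])

# Assumed meaning of the "minimum" line: the smallest conditional depth.
min_line <- min(interaction_df$mean_min_depth)

p_dashed <- ggplot(interaction_df,
                   aes(x = reorder(interaction, -occurrences),
                       y = mean_min_depth)) +
  geom_col(aes(fill = occurrences)) +
  # unconditional depth shown as a point on each bar for comparison
  geom_point(aes(y = uncond_mean_min_depth), shape = 4, size = 3) +
  # black dashed reference line instead of the default red solid one
  geom_hline(yintercept = min_line, colour = "black", linetype = "dashed") +
  scale_fill_gradient(low = "grey90", high = "red") +
  labs(x = "Interaction", y = "Mean minimal depth", fill = "Occurrences")
print(p_dashed)
```

Drawing the line as its own layer keeps the styling in ordinary ggplot2 code, so there is no need to index into the built plot data.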

ANUGAR commented 4 years ago

Thanks a lot for your answers. I really appreciate the help!
