bupaverse / processmapR

Visualize event logs using directed graphs, i.e. process maps.
https://bupaverse.github.io/processmapR/

Using specific colors per activity name in trace_explorer #38

Open dkoekkoek opened 3 years ago

dkoekkoek commented 3 years ago

Hi,

For my project I would like to give each unique activity discovered with the trace_explorer a specific assigned color. The event log consists of a column with activities and sub-activities, in order to create different levels of traces. Ideally, I want the sub-activities to be a color variant of the main activity.

For example, one main activity, "Place order", is blue. Its sub-activities describe the specific order ("Schedule appointment", "Request assistance", etc.), and I would like those sub-activities in other shades of blue.

Is there a way to assign a color to a particular activity name, and a color range to sub-activities?

Thank you in advance!

gertjanssenswillen commented 2 years ago

Hi

Sorry for the delay, hopefully the answer is still helpful.

The trace explorer plot is a ggplot object, so you can add your own scale to it as an extra layer. This way, you can use the scale_fill_manual() function of ggplot2 to set the colors manually.

For example:

patients %>%
    trace_explorer(n_traces = 7) +
    scale_fill_manual(values = c("Check-out" = "blue",
                                 "Blood test" = "red",
                                 "Discuss Results" = "yellow",
                                 "X-Ray" = "orange",
                                 "Registration" = "green",
                                 "Triage and Assessment" = "purple",
                                 "MRI SCAN" = "brown"))

Of course, this requires that you enumerate all activities with their specific colors. I don't think there is an out-of-the-box scale_fill function that applies a hierarchy like this. Nevertheless, the creation of the values vector can be somewhat automated if you have a large number of activities.

For example, you can start from the palettes in the RColorBrewer package. A vector of x colors from a palette y can be created with RColorBrewer::brewer.pal(n = x, name = y). With that, you can work as follows:
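As a quick illustration of that call, the sketch below requests the two palettes used later in this answer. It assumes only that RColorBrewer is installed; the variable names are just for illustration.

```r
library(RColorBrewer)

# Three shades from the sequential "Reds" palette, ordered light to dark
reds <- brewer.pal(n = 3, name = "Reds")

# Four shades from the sequential "Blues" palette
blues <- brewer.pal(n = 4, name = "Blues")

print(reds)
print(blues)

# brewer.pal.info lists the maximum number of colors each palette offers
print(brewer.pal.info["Blues", "maxcolors"])
```

Note that brewer.pal() expects n to be at least 3 and warns otherwise.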

First, create a table with the activities, grouped on a "superactivity" column, which plays the role of your "main activity". I have created one here with mutate:

patients %>%
    mutate(superactivity = ifelse(handling %in% c("Registration","Check-out", "Discuss Results"), "cat 1","cat 2")) %>%
    group_by(superactivity) %>%
    activities()

You can then decide on a specific scale for each main activity. (Depending on how many there are, this might need some automation.) For simplicity, I am just going with Reds for cat 1 and Blues for cat 2.

    mutate(fill_scale = ifelse(superactivity == "cat 1","Reds","Blues")) %>%
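With more than two main activities, nested ifelse() calls become unwieldy. One possible way to automate this step is a named lookup vector, sketched here in base R on a toy data frame. The category names follow the example in this answer, but `palette_lookup` and `activity_table` are hypothetical names introduced only for illustration.

```r
# Hypothetical lookup: one RColorBrewer palette name per main activity
palette_lookup <- c("cat 1" = "Reds", "cat 2" = "Blues")

# Toy stand-in for the grouped activities table
activity_table <- data.frame(
  superactivity = c("cat 1", "cat 1", "cat 2"),
  handling = c("Registration", "Check-out", "X-Ray"),
  stringsAsFactors = FALSE
)

# Index the lookup by name to get the palette for every row
activity_table$fill_scale <- unname(palette_lookup[activity_table$superactivity])

print(activity_table)
```

Inside the pipeline, the same idea would presumably be written as mutate(fill_scale = unname(palette_lookup[superactivity])).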

Next, create some helper variables: the number of colors needed within each group, as well as a numeric id within the group (note that the data frame is still grouped on the superactivity).

    mutate(n_colors = n(), color_id = 1:n()) %>%

Then, with some purrr magic, we can create the color for each activity by iterating over the RColorBrewer::brewer.pal function, with the n_colors value as n, the fill_scale as name, and the color_id as an index into the resulting vector.

mutate(color = pmap_chr(list(fill_scale, n_colors, color_id), ~RColorBrewer::brewer.pal(n = ..2, name = ..1)[..3]))
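To see this step in isolation, here is a minimal sketch on a hand-built table that already carries the three helper columns. The `toy` table is an assumption for illustration; it uses a named function instead of the ..1/..2/..3 shorthand, which should behave the same way.

```r
library(dplyr)
library(purrr)
library(RColorBrewer)

# Toy table: one row per activity, with the helper columns already filled in
toy <- tibble(
  handling   = c("Registration", "Discuss Results", "Check-out"),
  fill_scale = "Reds",
  n_colors   = 3L,
  color_id   = 1:3
)

# For each row: build a palette of n_colors shades, then pick shade number color_id
toy <- toy %>%
  mutate(color = pmap_chr(
    list(fill_scale, n_colors, color_id),
    function(scale_name, n, id) brewer.pal(n = n, name = scale_name)[id]
  ))

print(toy$color)
```

Keep in mind that sequential palettes such as Reds and Blues offer at most 9 colors, so a group with more activities than that would end up with NA for the extra rows.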

Let's store the resulting data frame as "colors".

The full code:

library(bupaR)        # activities() and event log methods
library(eventdataR)   # the patients example log
library(dplyr)        # mutate(), group_by(), n()
library(purrr)        # pmap_chr()

patients %>%
    mutate(superactivity = ifelse(handling %in% c("Registration", "Check-out", "Discuss Results"), "cat 1", "cat 2")) %>%
    group_by(superactivity) %>%
    activities() %>%
    mutate(fill_scale = ifelse(superactivity == "cat 1", "Reds", "Blues")) %>%
    mutate(n_colors = n(), color_id = 1:n()) %>%
    mutate(color = pmap_chr(list(fill_scale, n_colors, color_id), ~RColorBrewer::brewer.pal(n = ..2, name = ..1)[..3])) -> colors

The output looks like this:

  superactivity handling              absolute_frequency relative_frequency fill_scale n_colors color_id color  
  <chr>         <fct>                              <int>              <dbl> <chr>         <int>    <int> <chr>  
1 cat 1         Registration                         500              0.336 Reds              3        1 #FEE0D2
2 cat 1         Discuss Results                      495              0.333 Reds              3        2 #FC9272
3 cat 1         Check-out                            492              0.331 Reds              3        3 #DE2D26
4 cat 2         Triage and Assessment                500              0.405 Blues             4        1 #EFF3FF
5 cat 2         X-Ray                                261              0.212 Blues             4        2 #BDD7E7
6 cat 2         Blood test                           237              0.192 Blues             4        3 #6BAED6
7 cat 2         MRI SCAN                             236              0.191 Blues             4        4 #2171B5

Based on this, we can create the named vector we need for the ggplot scale as follows (handling is the name of the activity classifier in this example; use the name of the classifier in your own log).

color_scale <- colors$color
names(color_scale) <- colors$handling
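Because scale_fill_manual() matches colors to activities by name, it can be worth checking that every activity label has an entry before plotting; labels without a match fall back to the scale's NA color. A minimal base R sketch of such a check follows. `toy_scale` and `activity_labels` are made-up stand-ins for the named color vector and the activity labels in your log.

```r
# Toy stand-ins for the named color vector and the activity labels in the log
toy_scale <- c("Registration" = "#FEE0D2", "Check-out" = "#DE2D26")
activity_labels <- c("Registration", "Check-out", "X-Ray")

# Activities that have no color assigned yet
missing_colors <- setdiff(activity_labels, names(toy_scale))
print(missing_colors)
```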

You can then pass this vector to scale_fill_manual:

patients %>%
    trace_explorer(n_traces = 7) +
    scale_fill_manual(values = color_scale)

Result:

image

Of course, it will need some tweaking to find readable and nice colors, but hopefully this is something to start from.
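One possible tweak, sketched below, is to drop the palest shades of a palette (which can be hard to see on a light background) and interpolate between the remaining ones, which also removes the limit on the number of colors per group. The `readable_shades` helper is hypothetical, not part of any package, and assumes a sequential RColorBrewer palette with 9 colors.

```r
library(RColorBrewer)

# Hypothetical helper: n shades of a sequential palette, skipping the palest ones.
# Assumes the palette offers 9 colors, as the sequential palettes do.
readable_shades <- function(palette_name, n, skip = 2) {
  base <- brewer.pal(n = 9, name = palette_name)
  base <- base[(skip + 1):9]
  # Interpolate between the remaining shades to get exactly n colors
  grDevices::colorRampPalette(base)(n)
}

blues <- readable_shades("Blues", n = 4)
print(blues)
```

The resulting vector could replace the brewer.pal() call inside the pmap_chr step, indexed by color_id in the same way.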