bupaverse / processmapR

Visualize event logs using directed graphs, i.e. process maps.
https://bupaverse.github.io/processmapR/

Dotted chart: Show process interaction #20

Closed Someone894 closed 4 years ago

Someone894 commented 4 years ago

Hello,

currently I'm using bupaR to visualize the run times within a complex job-scheduling system. To do so I use the process map, which works great and gives us a nice tool to find bottlenecks.

Since most of the processes run on a single machine, they seem to block each other without having any dependent steps: e.g. process A runs independently of process B, but process A occupies the CPU and therefore process B takes longer to run. To better understand these effects I'm looking for a way to plot both processes on an x (time), y (process steps of A and B) plot. This brought me to the dotted chart, which comes quite close to what I'm looking for.

The dotted chart displays every event in the process as a dot, but in my case I don't care about the individual events. I'm looking for a plot where every process is displayed as a line from its start in time to its end. This way I can easily spot any co-occurring processes.
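
To make the request concrete, here is a minimal base R sketch of the idea, not of anything in processmapR: each process is collapsed to a single span from its first to its last event, and the spans are drawn as horizontal lines. The toy data, the column names `id` and `time_s`, and the helper `plot_spans` are all made up for illustration.

```r
# Toy event data (made up): one row per event, time in seconds.
events <- data.frame(
  id     = c("A", "A", "A", "B", "B"),
  time_s = c(0, 300, 1200, 600, 1800)
)

# Collapse each process to one span: first and last event time.
span_start <- tapply(events$time_s, events$id, min)
span_end   <- tapply(events$time_s, events$id, max)
spans <- data.frame(
  id    = names(span_start),
  start = as.numeric(span_start),
  end   = as.numeric(span_end)
)

# Two spans overlap if each one starts before the other one ends.
overlap <- spans$start[1] < spans$end[2] && spans$start[2] < spans$end[1]

# Hypothetical helper: draw one horizontal line per process.
plot_spans <- function(spans) {
  y <- seq_len(nrow(spans))
  plot(NA,
       xlim = range(c(spans$start, spans$end)),
       ylim = c(0.5, nrow(spans) + 0.5),
       xlab = "time (s)", ylab = "", yaxt = "n")
  axis(2, at = y, labels = spans$id)
  segments(x0 = spans$start, y0 = y, x1 = spans$end, y1 = y, lwd = 3)
}
```

Calling `plot_spans(spans)` draws the two lines; where they cover the same stretch of the x-axis, the processes were running at the same time.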

Here is an example to better illustrate the idea. This is the best I got: (image: Capture)

This is what I'm looking for: (image: Capture - Copy)

Is there a way to add a lines=TRUE option?

gertjanssenswillen commented 4 years ago

So, do I understand correctly that you want a horizontal line for each case? The three colors in your graph, are they still cases (like in the normal dotted_chart), or different activities? (It seems like they coincide in your use case.)

Someone894 commented 4 years ago

Yes you are spot on:

processmapR::dotted_chart(x = "absolute", sort = "start", units = "secs", color = "id")

gertjanssenswillen commented 4 years ago

I have just added a new function called lined_chart. It only has the color argument, it does not have the sort and x arguments (not sure whether they make sense).

Color by default is NULL (lines will be black). You can set it to any attribute, as long as it has a unique value for the case (since each line has to have a single color).
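
The "unique value for the case" condition can be checked on your own data before plotting. The helper below is a hypothetical base R sketch, not the test processmapR itself runs, and the toy log with its column names is made up.

```r
# Hypothetical helper: TRUE if `attr_col` takes exactly one value per case.
is_case_attribute <- function(log, case_col, attr_col) {
  n_values <- tapply(
    log[[attr_col]], log[[case_col]],
    function(v) length(unique(v))
  )
  all(n_values == 1)
}

# Toy log (made up): two cases, two events each.
log <- data.frame(
  id       = c("A", "A", "B", "B"),
  activity = c("load", "transform", "load", "export"),
  host     = c("srv1", "srv1", "srv2", "srv2")
)

host_ok     <- is_case_attribute(log, "id", "host")      # one host per case
activity_ok <- is_case_attribute(log, "id", "activity")  # varies within a case
id_ok       <- is_case_attribute(log, "id", "id")        # the case id itself
```

Here `host` and `id` pass the check, while `activity` does not because it changes within a case.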

Still somewhat experimental, but let me know what you think.

Someone894 commented 4 years ago

I've tried the new lined_chart, without the color argument as you can see in the attached image.

line_chart

When I try to color the graph by the process ID, I get an error message: "Attribute given to color argument is not a case attribute". When I use my empty resource column and copy the ID over (data$resource <- data$id), it works just fine, as you can see.

Rplot2


You managed to make the lined_chart function work the way I drew the first image, thank you very much. But as it turned out, my colleagues and I hadn't thought through exactly what we were looking for.

At the moment the line is drawn from the start of the first activity to the end of the last activity, so breaks are shown as part of the process. Is it possible to add a switch to show lines only between the start and end of each single activity?

This image is a rough example, just to show the breaks within a single line: 77758258-e9b74800-7032-11ea-850b-206312940c85

I'm sorry for not knowing what I wanted in the first place.

gertjanssenswillen commented 4 years ago

With respect to case_id not being accepted: I'll fix this. For some reason the case id does not pass the test of being a case attribute, although it is one by definition.

With respect to the modified version: do I understand correctly that you have start and complete events for activities, and you want to show a line between them?

Someone894 commented 4 years ago

Thank you very much for your work regarding the "case_id not accepted" issue.

Yeah, exactly: every activity has two statuses (start and complete), and I'd hope to see a line between the start and completion of an activity, colored by its ID. This way you can easily see which activity from which process is blocking the hardware.

Thank you very much :-)

gertjanssenswillen commented 4 years ago

Ok, I have revised the lined chart so that it now draws lines for each activity instance from start to end. I think it is more useful this way in general, especially for logs with long activities and few cases.

A few examples. This is for 10 cases from the patients dataset. By default the lines are coloured by activity type - just as with the dotted chart.

image

You can now set the color to the case_id without problems.

image

And for completeness, you can now also set the x argument of the function, in the same way as for a normal dotted chart. For example, you can make the x-axis relative time, measured from the start of the case. You probably don't need it in your context, but it might be useful, and it makes the function more generic and more similar to dotted chart.

image

You can configure the thickness of the lines with the line_width argument. At the moment it defaults to 2. I couldn't figure out a good default, or an automatic scaling, right away. The examples above are made with width 5. Feel free to share your experience, we can always update the default.

(nb: relative_week and relative_day are not implemented so far, because they would require lines to be split up if they cross midnight or the end of the week)

gertjanssenswillen commented 4 years ago

Btw, if activity instances overlap in time within a single case, the segments are currently plotted on top of each other. The easiest solution for that would be to make the lines slightly transparent.
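
As a rough illustration of the transparency idea in plain base R (this is not how lined_chart is implemented, and the helper name and toy coordinates are made up): a colour gets an alpha channel via `grDevices::adjustcolor`, so overlapping segments show up as a darker stretch instead of hiding each other.

```r
# Make a colour half transparent; the result is a hex string with an alpha channel.
translucent <- grDevices::adjustcolor("steelblue", alpha.f = 0.5)

# Hypothetical helper: two overlapping segments on the same row (toy coordinates).
plot_overlap <- function(col) {
  plot(NA, xlim = c(0, 10), ylim = c(0.5, 1.5),
       xlab = "time", ylab = "", yaxt = "n")
  segments(x0 = c(1, 4), y0 = 1, x1 = c(6, 9), y1 = 1, lwd = 10, col = col)
}
```

With `plot_overlap(translucent)` the stretch between 4 and 6, where both segments lie, is drawn darker than the rest.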

Someone894 commented 4 years ago

I'm impressed how fast you are dealing with this issue, you are a real treasure :-)

While testing the new features I found an error. For you to reproduce it, I created a simplified test case, that is attached: test_data.zip


test %>%
  processmapR::lined_chart(
    color = "id",
    x = "absolute",
    line_width = 1
  )

results in a single line per ID, which is the same result as in my posting from 22.04.2020:

grafik

test %>%
  processmapR::lined_chart(
    color = "activity",
    x = "absolute",
    line_width = 1
  )

which is what I'm looking for, results in an error:

Joining, by = "id"
Error: Insufficient values in manual scale. 100 needed but only 74 provided.
Run `rlang::last_error()` to see where the error occurred.

Maybe the error is a bug, or maybe I'm using lined_chart the wrong way.

I hope you can reproduce the error. Thank you very much :-)

gertjanssenswillen commented 4 years ago

I gather that you want to color by activity, but there are 100 different activities? The error says that the color scale doesn't have enough values. A solution would be to change the scale, but I'm not sure whether a color scale with 100 colors is going to be useful.
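
For reference, generating as many colours as there are activities is easy in base R. This is only a sketch: the three anchor colours and the `activity_` names are arbitrary placeholders, and whether the resulting vector can be handed to the chart (for example through a manual ggplot2 colour scale added to the returned plot) is an assumption I haven't verified.

```r
# Number of distinct activities reported in the error message.
n_activities <- 100

# Interpolate between a few anchor colours to get one colour per activity.
palette_fn <- grDevices::colorRampPalette(c("navy", "gold", "firebrick"))
colours <- palette_fn(n_activities)

# Name the colours so each one can be matched to an activity (placeholder names).
names(colours) <- paste0("activity_", seq_len(n_activities))
```

Neighbouring colours in such a palette are nearly indistinguishable, which is exactly the usefulness concern.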

Someone894 commented 4 years ago

Thank you for your quick answer. Yeah, you are right, it is not useful. My idea was to color based on the ID and to have the gaps based on the activities.

All activities with the same ID sit on one line and share one color, and every activity is drawn as a segment from its start to its end in the color of its process.
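
A minimal base R sketch of that layout, outside of processmapR: start and complete events are paired into one segment per activity instance, every process gets one row and one colour, and idle time shows up as a gap. The toy data, the column names (`activity_instance`, `status`, `time_s`) and the helper `plot_segments` are made up for illustration.

```r
# Toy log (made up): each activity instance has a start and a complete event.
events <- data.frame(
  id                = c("A", "A", "A", "A", "B", "B"),
  activity_instance = c(1, 1, 2, 2, 3, 3),
  status            = c("start", "complete", "start", "complete", "start", "complete"),
  time_s            = c(0, 300, 900, 1200, 200, 1000)
)

# Pair start and complete events: one row (segment) per activity instance.
starts <- events[events$status == "start", c("id", "activity_instance", "time_s")]
ends   <- events[events$status == "complete", c("activity_instance", "time_s")]
segs   <- merge(starts, ends, by = "activity_instance",
                suffixes = c("_start", "_end"))

# One row on the y-axis and one colour per process id.
ids  <- unique(as.character(segs$id))
row  <- match(as.character(segs$id), ids)
cols <- grDevices::rainbow(length(ids))[row]

# Hypothetical helper: draw the segments; gaps between them stay empty.
plot_segments <- function(segs, row, cols, ids) {
  plot(NA,
       xlim = range(c(segs$time_s_start, segs$time_s_end)),
       ylim = c(0.5, length(ids) + 0.5),
       xlab = "time (s)", ylab = "", yaxt = "n")
  axis(2, at = seq_along(ids), labels = ids)
  segments(x0 = segs$time_s_start, y0 = row,
           x1 = segs$time_s_end, y1 = row, lwd = 4, col = cols)
}
```

Calling `plot_segments(segs, row, cols, ids)` draws process A as two segments of the same colour with a gap between 300 s and 900 s, and process B as one segment on its own row.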

Thank you very much :-)