analyticalmonk / Rperform

:bar_chart: R package for tracking performance metrics across git versions and branches.
https://analyticalmonk.github.io/Rperform
GNU General Public License v3.0

Implement visualization functions for comparison of metrics across two branches #17

Closed analyticalmonk closed 8 years ago

analyticalmonk commented 8 years ago

These functions will live in the file plot_metrics.R and will build on the branch comparison metric functions implemented in branch_metrics.R.

Given below is an example visualization for comparison of runtime across two branches:

plot_metrics(test_path = "./tests/testthat/test-extract.r", metric = "time", branch1 = 'rperform_test', branch2 = 'master')

plot_btimes

The vertical line divides the commits from the rperform_test and master branches. The function assumes that branch1 is to be merged into branch2 eventually.

Background: I created a branch 'rperform_test' in the stringr repo and subsequently made 4 commits on it. In each commit, I changed the code of the function relevant to the test file, test-extract.r. In the first commit, I added Sys.sleep(0.1), and in the second one, Sys.sleep(0.5). In the third commit, the file was restored to its original state, and in the last one I added Sys.sleep(2).

@tdhock @joshuaulrich Any suggestions regarding the visualization? Currently working on memory visualization.

Note 1: Ignore all the greenery prevalent in the plot for now, I was playing around with colors!

Note 2: I discovered a bug today which had been sitting in the code since last year. It's been addressed. More about it in the commit message.

analyticalmonk commented 8 years ago

An example visualization for comparison of memory usage across two branches:

plot_metrics(test_path = "./tests/testthat/test-join.r", metric = "memory", branch1 = 'rperform_test', branch2 = 'master')

test_test-join_rperform_test_mast

Background: In the last commit of the branch rperform_test, I added the following lines of code to the function relevant to test-join.r:

  m.size <- 1024L
  m <- matrix(5, m.size, m.size)

Note: Need to look into the negative value being returned as leak_mb.

tdhock commented 8 years ago

These graphics look good. I would add a geom_text with labels for the branch names (next to the black vline).

The negative value for the memory means that there is actually less memory used at the end of the test than at the start https://github.com/tdhock/testthatQuantity/blob/master/R/rss.profile.R#L28

analyticalmonk commented 8 years ago

Adding the labels is a good idea, will do that.

I do realize that the negative value signifies less memory being used than at the start. But since this was the first time a negative result had been obtained, I just wanted to double-check that it wasn't because of R doing something internally. It turns out this happens every time for this test, so the result seems legit.

And I was thinking about renaming the swap_mb variable. Since it has nothing to do with the swap memory concept, and in fact, it's returning the maximum unswapped memory being used, the name might lead to confusion.

tdhock commented 8 years ago

yes please rename the variables to clarify what they mean

analyticalmonk commented 8 years ago

I tried to label the plot using ggplot's geom_text() itself. The result is not very satisfactory.

rperform_branch_annotated

This is the code which produced the above plot.

ggplot2::ggplot(data = time_data, mapping = ggplot2::aes(message, metric_val)) +
  ggplot2::geom_point(color = "blue") +
  ggplot2::facet_grid(test_name ~ ., scales = "free") +
  ggplot2::theme(axis.text.x = ggplot2::element_text(angle = -90)) +
  ggplot2::geom_vline(mapping = ggplot2::aes(xintercept = same_commit$cnum_b2 + 0.5)) +
  ggplot2::geom_text(mapping = ggplot2::aes(x = same_commit$cnum_b2 + 0.3,
                                            label = branch2, angle = 90,
                                            vjust = "center"),
                     check_overlap = TRUE) +
  ggplot2::geom_text(mapping = ggplot2::aes(x = same_commit$cnum_b2 + 0.7,
                                            label = branch1, angle = -90,
                                            vjust = "center"),
                     check_overlap = TRUE)

We can't provide a default y value to geom_text() since the scales would differ for every plot. Since I am not providing a y value, it tries by default to plot a text label at every y value it inherits from the aesthetics already present in the plot. That makes for a lot of text overlap and a messy plot, but setting 'check_overlap' to TRUE takes care of most of it. However, more than one text label still makes it to the final plot. Any tips about how to take care of that?

tdhock commented 8 years ago

use a separate data set for the geom_text, compute the min/max for each facet and use that for the position of each text label.
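
A rough sketch of what that suggestion could look like, not the actual Rperform code. The toy `time_data`, the numeric `cnum` x axis, the `split_at` position, and the `facet_max` and `label_data` names are all made up for illustration; only the ggplot2 calls are real. It builds one small data frame with a single label row per branch per facet, positioned at that facet's maximum metric value, and passes it to geom_text() through its `data` argument.

```r
# Toy stand-in for the thread's time_data; all values are made up.
time_data <- data.frame(
  test_name  = rep(c("test A", "test B"), each = 4),
  cnum       = rep(1:4, times = 2),  # commit position on the x axis (assumed numeric)
  metric_val = c(0.10, 0.12, 0.60, 2.10, 5, 6, 5.5, 7)
)
split_at <- 2.5  # hypothetical x position of the line separating the branches

# One row per facet: the facet's maximum metric value anchors the labels.
facet_max <- aggregate(metric_val ~ test_name, data = time_data, FUN = max)

# Separate label data set: exactly one label per branch in each facet.
label_data <- rbind(
  data.frame(facet_max, cnum = split_at - 0.2, label = "master"),
  data.frame(facet_max, cnum = split_at + 0.2, label = "rperform_test")
)

p <- ggplot2::ggplot(time_data, ggplot2::aes(cnum, metric_val)) +
  ggplot2::geom_point(color = "blue") +
  ggplot2::facet_grid(test_name ~ ., scales = "free") +
  ggplot2::geom_vline(xintercept = split_at) +
  # The labels come from label_data only, so nothing is drawn per data point.
  ggplot2::geom_text(
    data = label_data,
    mapping = ggplot2::aes(label = label),
    angle = 90, hjust = 1, vjust = 0.5
  )
```

Because `label_data` carries the `test_name` column, each label lands in its own facet, and check_overlap is no longer needed. With `hjust = 1` and a 90 degree rotation, the text hangs down from the facet maximum, which should keep it inside the panel.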

analyticalmonk commented 8 years ago

Thanks for the tip. It worked! Here is an example plot:

rperform_branch_annotated

Memory comparison:

Rperform::plot_branchmetrics(test_path = "tests/testthat/test-interp.r", metric = "memory", branch1 = "rperform_test", save_plots = F)

rperform_branch_annotatemem