Closed Anirban166 closed 4 years ago
Using ggfortify, an extension of ggplot2, we can produce diagnostic plots for our generalized linear models, with the same formulae we used in the complexity classification functions. Example (quadratic case of PeakSegDP::cDPA):
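A minimal sketch of how such a diagnostic plot could be produced. The data frame here is synthetic (it only mimics the `Timings` / `Data sizes` columns of an asymptoticTimings() result), and the quadratic formula is an assumption about what the classification functions fit, not a copy of the package source:

```r
library(ggplot2)
library(ggfortify)

set.seed(1)
# Synthetic stand-in for a timings data frame: 20 noisy runs per data size.
sizes <- rep(10^seq(1, 3, by = 0.5), each = 20)
timings.df <- data.frame(Timings = 5 * sizes^2 + rnorm(length(sizes), sd = 1e4))
timings.df$`Data sizes` <- sizes

# Quadratic candidate: timings modelled against N^2 (assumed formula).
quadratic.model <- glm(Timings ~ I(`Data sizes`^2), data = timings.df)

# ggfortify supplies the autoplot() method for lm/glm fits; by default it
# draws the standard residual diagnostic panels as ggplot objects.
diagnostics <- autoplot(quadratic.model)
diagnostics
```

Swapping in the real timings data frame and the formula of another complexity class (for example `Timings ~ `Data sizes`` for linear) should give the corresponding diagnostics.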
One comparison plot example, considering functions with different complexities:
Note that df.one has more observations (900) because, being of linear complexity, its timings are small enough that the default max.seconds limit is never breached, so it keeps running on larger data sizes. df.two and df.three (log-linear and quadratic complexities respectively) have only 600 obs. each, since the time limit was exceeded and further computation on larger data sizes was skipped (which shows max.seconds is working as expected).
Removed 300 rows from df.one in order to get an equal number of rows for each, added an expr column to each specifying their function names (Substring, PeakSegPDPA, cDPA) to help distinguish them via aesthetics, then combined the three data frames using rbind() and finally plotted the result with suitable aesthetics, geometry, scales and labels/titles using ggplot:
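The steps above can be sketched roughly as follows. To keep it runnable without the benchmarked packages, the three data frames are fabricated by a hypothetical helper, make.timings(), that only imitates the shape of asymptoticTimings() output (100 runs per data size). Which 300 rows were dropped is not stated, so this assumes they were the ones for the largest data sizes:

```r
library(ggplot2)

# Hypothetical helper: fabricates a data frame shaped like asymptoticTimings()
# output, with 100 noisy timings per data size.
make.timings <- function(sizes, cost) {
  n.rows <- 100 * length(sizes)
  df <- data.frame(Timings = rep(cost(sizes), each = 100) * runif(n.rows, 0.9, 1.1))
  df$`Data sizes` <- rep(sizes, each = 100)
  df
}

set.seed(42)
all.sizes <- 10^seq(1, 5, by = 0.5)                                # 9 sizes -> 900 rows
df.one   <- make.timings(all.sizes,      function(n) 50 * n)          # linear
df.two   <- make.timings(all.sizes[1:6], function(n) 80 * n * log(n)) # log-linear
df.three <- make.timings(all.sizes[1:6], function(n) 20 * n^2)        # quadratic

# Keep only the data sizes that every frame reached (drops 300 rows of df.one).
df.one <- df.one[df.one$`Data sizes` <= max(df.two$`Data sizes`), ]

# Label each frame so the plot can colour by function name.
df.one$expr   <- "Substring"
df.two$expr   <- "PeakSegPDPA"
df.three$expr <- "cDPA"

plot.df <- rbind(df.one, df.two, df.three)
comparison.plot <- ggplot(plot.df, aes(x = `Data sizes`, y = Timings, color = expr)) +
  geom_point() +
  geom_line() +
  scale_x_log10() +
  scale_y_log10() +
  labs(x = "Data sizes", y = "Runtime (in nanoseconds)", title = "Timings comparison plot")
comparison.plot
```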
Comparison plot including more functions:
Code for a comparison plot of 8 functions:
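# Load the packages whose functions are called without a namespace prefix below:
library(testComplexity) # asymptoticTimings()
library(ggplot2)        # ggplot() and related layers/scales
library(gfpop)          # gfpop(), dataGenerator(), graph()
# Compute timings for required functions from asymptoticTimings and collect in subsequent data frames: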
df.one <- asymptoticTimings(substring(paste(rep("A", data.sizes), collapse = ""), 1:data.sizes, 1:data.sizes), data.sizes = 10^seq(1, 4, by = 0.5))
df.two <- asymptoticTimings(PeakSegOptimal::PeakSegPDPA(rpois(data.sizes, 1),rep(1, length(rpois(data.sizes, 1))), 3L), data.sizes = 10^seq(1, 4, by = 0.5), max.seconds = 1)
df.three <- asymptoticTimings(PeakSegDP::cDPA(rpois(data.sizes, 1), rep(1, length(rpois(data.sizes, 1))), 3L), data.sizes = 10^seq(1, 4, by = 0.5), max.seconds = 5)
df.four <- asymptoticTimings(gregexpr("a", paste(collapse = "", rep("ab", data.sizes)), perl = TRUE), data.sizes = 10^seq(1, 4, by = 0.5))
df.five <- asymptoticTimings(fpop::Fpop(rnorm(data.sizes), 1), data.sizes = 10^seq(1, 4, by = 0.5))
df.six <- asymptoticTimings(opart::opart_gaussian(rnorm(data.sizes), 1), data.sizes = 10^seq(1, 4, by = 0.5), max.seconds = 1)
df.seven <- asymptoticTimings(gfpop(data = dataGenerator(as.integer(data.sizes), c(0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 1), c(0, 0.5, 1, 1.5, 2, 2.5, 3), sigma = 1), mygraph = graph(penalty = 2*log(as.integer(data.sizes)), type = "isotonic"), type = "mean"), data.sizes = 10^seq(1, 4, by = 0.5), max.seconds = 1)
df.eight <- asymptoticTimings(PeakSegDisk::PeakSegFPOP_vec(1:data.sizes, 10), data.sizes = 10^seq(1,4,by=0.5), max.seconds = 2.5)
# Assign a third column to each, in order to help distinguish them while plotting:
df.one$expr <- "substring"
df.two$expr <- "PeakSegPDPA"
df.three$expr <- "cDPA"
df.four$expr <- "gregexpr"
df.five$expr <- "fpop"
df.six$expr <- "opart"
df.seven$expr <- "gfpop"
df.eight$expr <- "PeakSegFPOP_vec"
# Combine the data frames using an rbind() and plot using ggplot with suitable parameters:
plot.df <- rbind(df.one, df.two, df.three, df.four, df.five, df.six, df.seven, df.eight)
ggplot(plot.df, aes(x = `Data sizes`, y = Timings)) +
  geom_point(aes(color = expr)) +
  geom_line(aes(color = expr)) +
  labs(x = "Data sizes", y = "Runtime (in nanoseconds)") +
  scale_x_log10() +
  scale_y_log10() +
  ggtitle("Timings comparison plot", subtitle = "Linear vs Log-linear vs Quadratic complexities")
Closed until further features worth adding come to mind.
A few more customizations:
directlabels: Placing labels at the last polygons would be better, but sometimes they tend not to look as good, e.g.:
Perhaps it can be adjusted, but regular labelling on the side looks good as well for now, so I'm not sure whether to include it or not.
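For reference, a small sketch of what the directlabels variant could look like. The data is made up, the column name `N` is a placeholder for `Data sizes`, and it assumes the "last.polygons" positioning method is the one meant by "last polygons" above:

```r
library(ggplot2)
library(directlabels)

# Made-up timings for two functions; only the shape matters here.
sizes <- 10^seq(1, 4, by = 0.5)
plot.df <- rbind(
  data.frame(N = sizes, Timings = 50 * sizes,   expr = "substring"),
  data.frame(N = sizes, Timings = 20 * sizes^2, expr = "cDPA")
)

base.plot <- ggplot(plot.df, aes(x = N, y = Timings, color = expr)) +
  geom_line() +
  scale_x_log10() +
  scale_y_log10()

# direct.label() replaces the colour legend with labels drawn next to the
# end of each line, using the chosen positioning method.
labelled.plot <- direct.label(base.plot, "last.polygons")
labelled.plot
```

Labels placed this way sit at the right edge of the panel, so they may need extra room on the x axis to avoid being cut off, which could be part of why they sometimes look worse than a side legend.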
Custom themes: After going through a ton of themes, I found these to be good in their respective plot features:
Background colour of the 'FT' theme from hrbrthemes, which can be used with the Roboto Condensed font via the ft_rc variant. E.g.: (note that the fonts aren't appearing here for me on Windows, as described in their issue 28)
Plot grid of pandoc from ggthemes. E.g.: (on 3100 obs. each, with data.sizes = 10^seq(1, 4, by = 0.1), just for fun)
Fonts of gdocs from ggthemes. E.g.:
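A rough sketch of how these three themes could be tried on the same plot. The data is made up, and `theme_pander()` is an assumption on my part for the "pandoc" theme mentioned above, since that is the closest-named theme in ggthemes:

```r
library(ggplot2)
library(ggthemes)
library(hrbrthemes)

# Made-up linear timings; N stands in for the `Data sizes` column.
sizes <- 10^seq(1, 4, by = 0.5)
plot.df <- data.frame(N = sizes, Timings = 50 * sizes)

base.plot <- ggplot(plot.df, aes(x = N, y = Timings)) +
  geom_point() +
  geom_line() +
  scale_x_log10() +
  scale_y_log10() +
  labs(x = "Data sizes", y = "Runtime (in nanoseconds)")

# One themed variant per candidate, built from the same base plot.
themed <- list(
  ft.rc  = base.plot + theme_ft_rc(),   # hrbrthemes: dark 'FT' theme, Roboto Condensed
  pander = base.plot + theme_pander(),  # ggthemes: assumed match for "pandoc"
  gdocs  = base.plot + theme_gdocs()    # ggthemes: Google Docs look
)
themed$gdocs
```

If Roboto Condensed is not installed, `theme_ft_rc()` is expected to fall back to a default font with warnings when the plot is drawn, which would match the Windows behaviour noted above.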
Could do with adding a custom theme to the package or adding a theme package to the imports, plus moving the optional parameters to an ellipsis (...).
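One possible shape for the ellipsis idea, as a sketch only. plotTimingsSketch() and its plot.theme argument are hypothetical names for illustration, not the package's actual plotting function. Here the optional arguments are simply forwarded to labs():

```r
library(ggplot2)

# Hypothetical plotting function: required data first, optional label
# arguments collected by `...` and forwarded to labs(), plus a theme slot.
plotTimingsSketch <- function(data.df, ..., plot.theme = theme_bw()) {
  ggplot(data.df, aes(x = `Data sizes`, y = Timings)) +
    geom_point() +
    scale_x_log10() +
    scale_y_log10() +
    labs(...) +
    plot.theme
}

# Made-up data frame with the two expected columns.
sizes <- 10^seq(1, 4, by = 0.5)
timings.df <- data.frame(Timings = 50 * sizes)
timings.df$`Data sizes` <- sizes

# Callers pass only the labels they care about; a different theme is opt-in.
default.plot <- plotTimingsSketch(timings.df)
custom.plot  <- plotTimingsSketch(timings.df,
                                  title = "Timings plot",
                                  y = "Runtime (in nanoseconds)",
                                  plot.theme = theme_minimal())
custom.plot
```

Forwarding `...` to a single function keeps the signature short, at the cost of the optional arguments no longer being listed explicitly in the function's own documentation.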
Current plots look like the following, which is fine, but I'm thinking of adding more customization:
Branch: Plotfunc
Objectives: Add more features to plotting functions in testComplexity.