Anirban166 / testComplexity

Asymptotic complexity testing framework
https://anirban166.github.io/testComplexity/

Add features for plotting functionality #15

Closed Anirban166 closed 4 years ago

Anirban166 commented 4 years ago

Current plots look like this: [image]

They look fine, but I'm thinking of adding more customization.


Branch: Plotfunc
Objectives: Add more features to plotting functions in testComplexity.

Anirban166 commented 4 years ago

Using ggfortify, an extension of ggplot2, we can produce diagnostic plots for our generalized linear models, using the same formulae we used in the complexity classification functions. Example (quadratic case of PeakSegDP::cDPA): [image] [image]
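A minimal sketch of what such a diagnostic plot could look like, using synthetic quadratic timings instead of a real PeakSegDP::cDPA run. The data frame, the coefficient 5, and the formula `Timings ~ I(`Data sizes`^2)` are assumptions for illustration; the exact formula used in the classification functions may differ. The plotting step only runs if ggfortify is installed.

```r
# Synthetic stand-in for asymptoticTimings() output: timings that grow
# quadratically with data size (assumed columns: Timings, `Data sizes`).
set.seed(1)
sizes <- rep(10^seq(1, 3, by = 0.5), each = 20)
timings.df <- data.frame(Timings = 5 * sizes^2 + rnorm(length(sizes), sd = 100))
timings.df[["Data sizes"]] <- sizes

# Fit a gaussian GLM with a quadratic term (assumed form of the quadratic model).
quadratic.fit <- glm(Timings ~ I(`Data sizes`^2), data = timings.df)

# Draw the standard diagnostic panels if ggfortify is available.
if (requireNamespace("ggfortify", quietly = TRUE)) {
  library(ggfortify)
  diagnostic.plots <- autoplot(quadratic.fit, which = 1:4)
  print(diagnostic.plots)
}
```

Since a glm object also inherits from lm, autoplot() dispatches to ggfortify's lm method here, which is what makes the usual residual and Q-Q panels available.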

Anirban166 commented 4 years ago

One comparison plot example, considering functions with different complexities: [image: comparisonplotcomplexities]

Note that df.one has more observations (900): being of linear complexity, its timings are smaller, so the default time limit isn't breached and it keeps running on larger data sizes. The log-linear and quadratic cases (df.two and df.three respectively) have only 600 obs. each, since the time limit was exceeded and further computation on larger data sizes was avoided (which shows max.seconds is working as expected). [image]

Steps:
- Removed 300 rows from df.one in order to get an equal number of rows for each data frame.
- Added an expr column to each, specifying their function names (Substring, PeakSegPDPA, cDPA), to help distinguish them by aesthetics.
- Combined the three data frames using rbind().
- Plotted the result with suitable aesthetics, geometry, scales and labels/titles using ggplot.

[image: plotcomparisontimings] [image: Timingscomparisonplot]
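The trimming and combining steps can be sketched with synthetic data, so it runs without the benchmarked packages. The helper makeTimings(), the cost functions and the 100 repetitions per data size are hypothetical stand-ins for real asymptoticTimings() output; only the column names Timings and `Data sizes` are taken from the plotting code in this thread.

```r
library(ggplot2)

# Hypothetical stand-in for asymptoticTimings(): `reps` timings per data size.
makeTimings <- function(sizes, cost, reps = 100) {
  df <- data.frame(Timings = rep(cost(sizes), each = reps))
  df[["Data sizes"]] <- rep(sizes, each = reps)
  df
}

df.one   <- makeTimings(10^seq(1, 5, by = 0.5),   function(n) 50 * n)          # 900 rows
df.two   <- makeTimings(10^seq(1, 3.5, by = 0.5), function(n) 50 * n * log(n)) # 600 rows
df.three <- makeTimings(10^seq(1, 3.5, by = 0.5), function(n) 50 * n^2)        # 600 rows

# Keep only as many rows of df.one as the other data frames have.
df.one <- head(df.one, nrow(df.two))

# Label each data frame so the plot can distinguish them by colour.
df.one$expr   <- "Substring"
df.two$expr   <- "PeakSegPDPA"
df.three$expr <- "cDPA"

plot.df <- rbind(df.one, df.two, df.three)

comparison.plot <- ggplot(plot.df, aes(x = `Data sizes`, y = Timings, color = expr)) +
  geom_point() +
  geom_line() +
  scale_x_log10() +
  scale_y_log10() +
  labs(x = "Data sizes", y = "Runtime (in nanoseconds)", title = "Timings comparison plot")
```

Note that rbind() itself does not need equal row counts; trimming only keeps the compared data-size ranges aligned.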

Anirban166 commented 4 years ago

Comparison plot including more functions: [image: 6]

Anirban166 commented 4 years ago

Code for a comparison plot of 8 functions:

# Packages needed for the unqualified calls below (asymptoticTimings, gfpop/dataGenerator/graph, ggplot):
library(testComplexity)
library(gfpop)
library(ggplot2)
# Compute timings for the required functions with asymptoticTimings and collect them in separate data frames:
df.one <- asymptoticTimings(substring(paste(rep("A", data.sizes), collapse = ""), 1:data.sizes, 1:data.sizes), data.sizes = 10^seq(1, 4, by = 0.5))
df.two <- asymptoticTimings(PeakSegOptimal::PeakSegPDPA(rpois(data.sizes, 1), rep(1, length(rpois(data.sizes, 1))), 3L), data.sizes = 10^seq(1, 4, by = 0.5), max.seconds = 1)
df.three <- asymptoticTimings(PeakSegDP::cDPA(rpois(data.sizes, 1), rep(1, length(rpois(data.sizes, 1))), 3L), data.sizes = 10^seq(1, 4, by = 0.5), max.seconds = 5)
df.four <- asymptoticTimings(gregexpr("a", paste(collapse = "", rep("ab", data.sizes)), perl = TRUE), data.sizes = 10^seq(1, 4, by = 0.5))
df.five <- asymptoticTimings(fpop::Fpop(rnorm(data.sizes), 1), data.sizes = 10^seq(1, 4, by = 0.5))
df.six <- asymptoticTimings(opart::opart_gaussian(rnorm(data.sizes), 1), data.sizes = 10^seq(1, 4, by = 0.5), max.seconds = 1)
df.seven <- asymptoticTimings(gfpop(data = dataGenerator(as.integer(data.sizes), c(0.1, 0.2, 0.3, 0.4, 0.6, 0.8, 1), c(0, 0.5, 1, 1.5, 2, 2.5, 3), sigma = 1), mygraph = graph(penalty = 2 * log(as.integer(data.sizes)), type = "isotonic"), type = "mean"), data.sizes = 10^seq(1, 4, by = 0.5), max.seconds = 1)
df.eight <- asymptoticTimings(PeakSegDisk::PeakSegFPOP_vec(1:data.sizes, 10), data.sizes = 10^seq(1, 4, by = 0.5), max.seconds = 2.5)
# Add a third column to each, in order to help distinguish them while plotting:
df.one$expr <- "substring"
df.two$expr <- "PeakSegPDPA"
df.three$expr <- "cDPA"
df.four$expr <- "gregexpr"
df.five$expr <- "fpop"
df.six$expr <- "opart"
df.seven$expr <- "gfpop"
df.eight$expr <- "PeakSegFPOP_vec"
# Combine the data frames using rbind() and plot using ggplot with suitable parameters:
plot.df <- rbind(df.one, df.two, df.three, df.four, df.five, df.six, df.seven, df.eight)
ggplot(plot.df, aes(x = `Data sizes`, y = Timings)) +
  geom_point(aes(color = expr)) +
  geom_line(aes(color = expr)) +
  labs(x = "Data sizes", y = "Runtime (in nanoseconds)") +
  scale_x_log10() +
  scale_y_log10() +
  ggtitle("Timings comparison plot", subtitle = "Linear vs Log-linear vs Quadratic complexities")

[image: 8]

Anirban166 commented 4 years ago

Closing until further features worth adding come to mind.

Anirban166 commented 4 years ago

A few more customizations:

Using labels at the last polygons would be better, but sometimes they tend not to look as good, e.g.:
[image]

Perhaps it can be adjusted, but regular labelling on the side looks good as well for now, so I don't know whether to include it or not.

Anirban166 commented 4 years ago

The plotting function could do with the addition of a theme (or one added via imports), plus the optional parameters could be moved to an ellipsis.
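A rough sketch of how that might look, not the package's actual implementation: plotTimingsSketch() and its option names (title, line.color, point.color) are hypothetical, and theme_bw() is just one built-in ggplot2 theme picked for illustration. Optional parameters arrive through `...` and override a list of defaults.

```r
library(ggplot2)

# Hypothetical plotting function: optional parameters are passed via `...`.
plotTimingsSketch <- function(data.df, ...) {
  # Defaults for the optional parameters; values supplied in `...` override them.
  defaults <- list(title = "Timings plot", line.color = "black", point.color = "black")
  opts <- utils::modifyList(defaults, list(...))

  ggplot(data.df, aes(x = `Data sizes`, y = Timings)) +
    geom_line(color = opts$line.color) +
    geom_point(color = opts$point.color) +
    scale_x_log10() +
    scale_y_log10() +
    labs(x = "Data sizes", y = "Runtime (in nanoseconds)", title = opts$title) +
    theme_bw()
}

# Usage with a tiny made-up data frame:
example.df <- data.frame(Timings = c(1e3, 1e4, 1e5))
example.df[["Data sizes"]] <- c(10, 100, 1000)
example.plot <- plotTimingsSketch(example.df, title = "Custom title", line.color = "blue")
```

Merging `...` into a defaults list keeps the signature short while still documenting which options are recognised; unknown names are simply ignored in this sketch.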