Anirban166 / testComplexity

Asymptotic complexity testing framework
https://anirban166.github.io/testComplexity/

Creating the complexity classification functions #3

Closed. Anirban166 closed this issue 4 years ago.

Anirban166 commented 4 years ago

@tdhock @neerajdhanraj

A take on the time complexity classification function:

# requires the boot package for cv.glm()
library(boot)

asymptoticTimeComplexityClass <- function(model.df)
{
  # Fit one generalized linear model per candidate complexity class,
  # regressing the observed timings on a transform of the data set sizes:
  constant   <- glm(Timings~1,                                      data = model.df); model.df['constant'] = fitted(constant)
  linear     <- glm(Timings~`Data set sizes`,                       data = model.df); model.df['linear'] = fitted(linear)
  squareroot <- glm(Timings~sqrt(`Data set sizes`),                 data = model.df); model.df['squareroot'] = fitted(squareroot)
  log        <- glm(Timings~log(`Data set sizes`),                  data = model.df); model.df['log'] = fitted(log)
  log.linear <- glm(Timings~`Data set sizes`*log(`Data set sizes`), data = model.df); model.df['log-linear'] = fitted(log.linear)
  quadratic  <- glm(Timings~I(`Data set sizes`^2),                  data = model.df); model.df['quadratic'] = fitted(quadratic)
  cubic      <- glm(Timings~I(`Data set sizes`^3),                  data = model.df); model.df['cubic'] = fitted(cubic)

  model.list <- list('constant'   = constant,
                     'linear'     = linear,
                     'squareroot' = squareroot,
                     'log'        = log,
                     'log-linear' = log.linear,
                     'quadratic'  = quadratic,
                     'cubic'      = cubic
                    )

  # Cross-validate each model and keep the adjusted prediction error,
  # i.e. the second component of cv.glm()$delta:
  cross.validated.errors <- lapply(model.list, function(x) cv.glm(model.df, x)$delta[2])

  # The class whose model yields the smallest cross-validated error wins:
  best.model <- names(which.min(cross.validated.errors))
  print(best.model)
}

It fits a generalized linear model for each complexity class, with a formula that regresses our Timings (the first column of the data frame returned by asymptoticTimings) on the corresponding transform of the Data set sizes (the second column). It then cross-validates each model (via a lapply over the list of models) with the cv.glm function and extracts the adjusted cross-validation error (the second component of cv.glm's delta) for each complexity class. The best fit is the model with the least error, and its complexity class is printed accordingly.
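As a hedged usage sketch with synthetic data (the timing formula and noise scale below are made up for illustration; real input would come from asymptoticTimings): build a data frame in the shape described above and classify it.

sizes <- rep(10^seq(1, 4), each = 10)
timings.df <- data.frame(Timings = sizes * log(sizes) * 1e-7 + abs(rnorm(length(sizes), sd = 1e-6)),
                         `Data set sizes` = sizes,
                         check.names = FALSE) # keep the space in the column name
asymptoticTimeComplexityClass(timings.df)
# expected output: [1] "log-linear"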

It serves our purpose (it has classified correctly in the tests done so far), and I was able to use the log-linear formula inside the glm call without any error as well, correctly classifying PeakSegOptimal::PeakSegPDPA as log-linear. (In the GuessCompx code the log-linear class was treated separately from the other classes because its formula involves the size variable twice, as size*log(size), so it needed extra handling; the instances of NlogN throughout that package exist for this reason.)
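To illustrate why no special-casing is needed here (a minimal example with toy data, not project code): in an R formula, x * log(x) expands to the main effects plus their interaction, so a single glm call covers the log-linear class.

toy.df <- data.frame(x = c(10, 100, 1000))
colnames(model.matrix(~ x * log(x), data = toy.df))
# [1] "(Intercept)" "x"           "log(x)"      "x:log(x)"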

For reference, lines 208 to 263 in my commented version of GuessCompx (wherein I included all the functions together, subsequently breaking them down as they are called in CompEst(), but followed along the memory complexity testing part) contain the code from which I derived my complexity classification function.

The same approach could be followed for asymptoticMemoryComplexityClass, but we first need a data frame of memory-use metrics against data set sizes to pass to it. For that we could use bench::bench_memory (whose memory allocations are parsed by profmem::readRprofmem()) in our asymptoticMemoryUsage function, as sketched below.
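A minimal sketch of what that could look like, assuming asymptoticMemoryUsage takes a function and a vector of sizes (the signature and the column names are assumptions, chosen to mirror the timings data frame described above):

asymptoticMemoryUsage <- function(f, data.sizes)
{
  # bench::bench_memory() evaluates the expression once and reports the
  # total R-level memory allocated (parsed from Rprofmem output):
  memory.metrics <- sapply(data.sizes, function(n) as.numeric(bench::bench_memory(f(n))$mem_alloc))
  data.frame(`Memory usage` = memory.metrics,
             `Data set sizes` = data.sizes,
             check.names = FALSE)
}

# e.g. memory allocated by sorting random vectors of increasing size:
memory.df <- asymptoticMemoryUsage(function(n) sort(rnorm(n)), 10^seq(1, 5))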

Also, using bench could overcome GuessCompx's limitation of memory testing to Windows only, since bench extracts memory allocations in a way that works on macOS and Linux as well as Windows.

tdhock commented 4 years ago

Anirban, would it be possible to use GuessCompx::CompEst inside our asymptoticComplexityClass function? e.g.

asymptoticComplexityClass <- function(f, ...){
  GuessCompx::CompEst(f, ...)
}

is there any reason that we need to copy/re-write that logic ourselves, or can we just use the functionality that already exists in GuessCompx?

Anirban166 commented 4 years ago

Anirban, would it be possible to use GuessCompx::CompEst inside our asymptoticComplexityClass function? e.g.

asymptoticComplexityClass <- function(f, ...){
  GuessCompx::CompEst(f, ...)
}

is there any reason that we need to copy/re-write that logic ourselves, or can we just use the functionality that already exists in GuessCompx?

It would be possible, but again yes, there are a few reasons why I am not planning to use it:

(1) CompEst essentially takes a data frame supplied by the user as input (for the function to operate on), which won't be the same as the one passed by our timings function. And if we drop our timings function (which will need optimizing, maybe by adding time limits as you mentioned early on), then our package would essentially be the same as GuessCompx.

(2) CompEst classifies time and memory together, which I am planning to do in two separate functions.

(3) Memory complexity classification in GuessCompx only works for Windows users. (I will try to bypass that by collecting memory usage data from bench.)

Anirban166 commented 4 years ago

For the third point, the problem comes from the fact that memory.size() only works on Windows. They tried to overcome that limitation, but it wasn't successful; quoting from the research paper:

Bypassing the limitations of the memory.size() function to only Windows users has been unsuccessfully tried (pryr::mem_change, Rprof)

I would like to see if we can overcome that restriction, so that our package can classify memory complexity with cross-platform (OS-wise) compatibility; the underlying mechanism seems viable, as sketched below.
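For example, the R-level allocation tracking that bench builds on is itself cross-platform; a minimal sketch (toy expression, assuming R was built with memory profiling enabled, as the CRAN binaries are):

alloc.log <- tempfile()
Rprofmem(alloc.log)           # start writing allocation records to the file
x <- numeric(1e6)             # allocates roughly 8 MB
Rprofmem(NULL)                # stop profiling
head(readLines(alloc.log))    # each record: bytes allocated plus the call stack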

If it doesn't work out well, we can fall back to CompEst.