Anirban166 / testComplexity

Asymptotic complexity testing framework
https://anirban166.github.io/testComplexity/
Other
36 stars 2 forks source link

Evaluation of function parameter 'data.sizes' in a suitable environment #24

Closed Anirban166 closed 4 years ago

Anirban166 commented 4 years ago

This might not be an issue in general, but for creating test cases and using our functions within other functions (vital for testing other functions), this poses as a problem. As @gaborcsardi correctly mentioned, asymptoticTimings fails inside the scope of another function, reason being (as per my understanding) data.sizes not being accessible in that environment. Need to figure out a workaround to emplace a proper scope/environment to evaluate the data.sizes parameter inside asymptoticTimings and elsewhere.

library(microbenchmark)
asymptoticTimings <- function(e, data.sizes = 10^seq(1, 5, by = 0.5), max.seconds, env = parent.frame())
{
  ifelse(!all(!is.infinite(data.sizes) & !is.na(data.sizes) & !is.nan(data.sizes)), stop("data.sizes must not contain any NA/NaN/Infinite value."), return)
  lang.obj <- substitute(e)
  fun.obj  <- function(data.sizes)
  {
    eval(lang.obj, env)
  }
  time.limit <- if(missing(max.seconds)) 10^8 else max.seconds*10^9
  timings.list <- list()
  i <- 1
  while(i <= length(data.sizes))
  {
    benchmarked.timings <- as.data.frame(microbenchmark(fun.obj(data.sizes[i])))
    benchmarked.timings$data.size <- data.sizes[i]
    timings.list[[i]] <- data.frame(benchmarked.timings$time, benchmarked.timings$data.size)
    if(mean(benchmarked.timings$time) > time.limit) break
    i <- i + 1
  }
  resultant.df <- do.call(rbind, timings.list)
  colnames(resultant.df) <- c("Timings", "Data sizes")
  return(resultant.df)
}
asymptoticTimings(substring(paste(rep("A", data.sizes), collapse = ""), 1:data.sizes, 1:data.sizes), data.sizes = 10^seq(1, 5, by = 0.5), max.seconds = 1)
#> Error in paste(rep("A", data.sizes), collapse = ""): object 'data.sizes' not found

Created on 2020-07-12 by the reprex package (v0.3.0)

Another instance indicating the same error can be reproduced from my raw draft of the expect_linear_time function:

expect_linear_time = function(f = asymptoticTimings, ...)
{
  timings.df <- f(...)
  stopifnot(asymptoticTimeComplexityClass(timings.df) == "linear")
}

expect_linear_time(substring(paste(rep("A", data.sizes), collapse = ""), 1:data.sizes, 1:data.sizes), data.sizes = 10^seq(1, 5, by = 0.5), max.seconds = 1)
#> Error in paste(rep("A", data.sizes), collapse = ""): object 'data.sizes' not found

Created on 2020-07-12 by the reprex package (v0.3.0)

Anirban166 commented 4 years ago

I think I can use the scoping operator to provide superassignment of data.sizes or accessibility of the same to parent environment of the outer function, but will need to emplace it properly.

Anirban166 commented 4 years ago

Default function specification would fail since it internally searches for data.sizes in another environment, but specifying asymptoticTimings directly during the function call to expect_linear_time would work:

library(boot)
library(microbenchmark)
asymptoticTimings <- function(e, data.sizes, max.seconds)
{
  ifelse(!all(!is.infinite(data.sizes) & !is.na(data.sizes) & !is.nan(data.sizes)), stop("data.sizes must not contain any NA/NaN/Infinite value."), return)
  lang.obj <- substitute(e)
  fun.obj  <- function(data.sizes)
  {
    eval(lang.obj)
  }
  time.limit <- if(missing(max.seconds)) 10^8 else max.seconds*10^9
  timings.list <- list()
  i <- 1
  while(i <= length(data.sizes))
  {
    benchmarked.timings <- as.data.frame(microbenchmark(fun.obj(data.sizes[i])))
    benchmarked.timings$data.size <- data.sizes[i]
    timings.list[[i]] <- data.frame(benchmarked.timings$time, benchmarked.timings$data.size)
    if(mean(benchmarked.timings$time) > time.limit) break
    i <- i + 1
  }
  resultant.df <- do.call(rbind, timings.list)
  colnames(resultant.df) <- c("Timings", "Data sizes")
  return(resultant.df)
}
asymptoticTimeComplexityClass = function(model.df)
{
  if(class(model.df) == "data.frame" & "Timings" %in% colnames(model.df) & "Data sizes" %in% colnames(model.df))
  {
    constant   <- glm(Timings~1,                              data = model.df); model.df['constant'] = fitted(constant)
    linear     <- glm(Timings~`Data sizes`,                   data = model.df); model.df['linear'] = fitted(linear)
    log        <- glm(Timings~log(`Data sizes`),              data = model.df); model.df['log'] = fitted(log)
    loglinear  <- glm(Timings~`Data sizes`*log(`Data sizes`), data = model.df); model.df['loglinear'] = fitted(loglinear)
    quadratic  <- glm(Timings~I(`Data sizes`^2),              data = model.df); model.df['quadratic'] = fitted(quadratic)
    model.list <- list()
    for(complexity.class in c('constant', 'log', 'linear', 'loglinear', 'quadratic'))
    {
      model.list[[complexity.class]] = eval(as.name(complexity.class))
    }
    cross.validated.errors <- lapply(model.list, function(x) cv.glm(model.df, x)$delta[2])
    best.model <- names(which.min(cross.validated.errors))
    print(best.model)
  }
  else stop("Input parameter must be a data frame with columns 'Timings' and 'Data sizes'")
}

expect_linear_time = function(f = asymptoticTimings, ...)
{
  timings.df <- f(...)

  stopifnot(asymptoticTimeComplexityClass(timings.df) == "linear")
}
expect_linear_time(asymptoticTimings, substring(paste(rep("A", data.sizes), collapse = ""), 1:data.sizes, 1:data.sizes), data.sizes = 10^seq(1, 5, by = 0.5), max.seconds = 1)
#> [1] "linear"

Created on 2020-07-13 by the reprex package (v0.3.0)