Closed Anirban166 closed 4 years ago
@tdhock Also here, is the reason for the *tmp* allocation against my expected data frame because of the function scope, where the values in timings.list are not accessible outside the scope of lapply?
okay I got it to work, seems I was doing the wrong thing when I could just collect it directly into timings.list with the lapply
@tdhock Is this any better than what we had?
asymptoticTimings <- function(e, data.sizes, max.seconds)
{
  if(!all(!is.infinite(data.sizes) & !is.na(data.sizes) & !is.nan(data.sizes)))
  {
    stop("data.sizes must not contain any NA/NaN/Infinite value.")
  }
  lang.obj <- substitute(e)
  fun.obj <- function(data.sizes)
  {
    eval(lang.obj)
  }
  time.limit = ifelse(missing(max.seconds), 10^8, max.seconds*10^9)
  break.bool <- TRUE
  timings.list <- list()
  timings.list <- lapply(seq(along = data.sizes), function(i)
  {
    if(break.bool)
    {
      benchmarked.timings <- as.data.frame(microbenchmark(fun.obj(data.sizes[i])))
      if(mean(benchmarked.timings$time) > time.limit)
        break.bool <<- FALSE
      benchmarked.timings$data.size <- data.sizes[i]
      return(data.frame(benchmarked.timings$time, benchmarked.timings$data.size))
    }
  })
  resultant.df <- do.call(rbind, timings.list)
  colnames(resultant.df) <- c("Timings", "Data sizes")
  return(resultant.df)
}
Seems to be working fine, but dunno if it improved the speed much (noticed no difference)
Could also use a pipe here
library(magrittr)
asymptoticTimings <- function(e, data.sizes, max.seconds)
{
  if(!all(!is.infinite(data.sizes) & !is.na(data.sizes) & !is.nan(data.sizes)))
  {
    stop("data.sizes must not contain any NA/NaN/Infinite value.")
  }
  lang.obj <- substitute(e)
  fun.obj <- function(data.sizes)
  {
    eval(lang.obj)
  }
  time.limit = ifelse(missing(max.seconds), 10^8, max.seconds*10^9)
  break.bool <- TRUE
  timings.list <- list()
  timings.list <- seq(along = data.sizes) %>% lapply(function(i)
  {
    if(break.bool)
    {
      benchmarked.timings <- as.data.frame(microbenchmark(fun.obj(data.sizes[i])))
      if(mean(benchmarked.timings$time) > time.limit)
        break.bool <<- FALSE
      benchmarked.timings$data.size <- data.sizes[i]
      return(data.frame(benchmarked.timings$time, benchmarked.timings$data.size))
    }
  })
  resultant.df <- do.call(rbind, timings.list)
  colnames(resultant.df) <- c("Timings", "Data sizes")
  return(resultant.df)
}
hi
- for loops used to be slower than lapply/etc in old versions of R, but that is no longer the case. In current R a for loop is just as fast as apply etc, so please use whichever is easier to understand.
ah okay, I'll stick to for-loops then (easier to understand)
- use seq_along(some.vector) instead of 1:length(some.vector)
yes I am using seq(along = data.sizes) currently
- please avoid using pipes, makes it harder to debug
alright sure (+ I would need to import magrittr/dplyr additionally as well)
- use if(scalar.logical)something else something.else if you only need to test one scalar value, and use ifelse(vector.logical, something.vector, something.else) if you need to test a vector of values.
okay, so for instance such type of changes would look like:
ifelse(!all(!is.infinite(data.sizes) & !is.na(data.sizes) & !is.nan(data.sizes)), stop("data.sizes must not contain any NA/NaN/Infinite value."), return)
if(missing(max.seconds))
time.limit = 10^8
else time.limit = max.seconds*10^9
does that help?
It certainly does, I'll keep these points in mind while writing future code - thanks!
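For reference, two of the review points above (seq_along versus 1:length, and scalar if versus vectorized ifelse) can be sketched as follows. This is only an illustration: validate.sizes, empty.sizes, sizes and size.labels are made-up names that are not part of testComplexity, and is.finite is swapped in here as a shorter stand-in for the three separate NA/NaN/Inf checks.

```r
# 1:length(x) counts backwards (1, 0) when x is empty, so a loop body
# would run twice with invalid indices; seq_along(x) is zero-length.
empty.sizes <- numeric(0)
backwards.index <- 1:length(empty.sizes)
safe.index <- seq_along(empty.sizes)

# One scalar condition: plain if() is enough (hypothetical helper).
validate.sizes <- function(data.sizes)
{
  # is.finite() is FALSE for NA, NaN, Inf and -Inf
  if(any(!is.finite(data.sizes)))
    stop("data.sizes must not contain any NA/NaN/Infinite value.")
  invisible(TRUE)
}

# A vector of conditions: ifelse() returns one result per element.
sizes <- c(10, 100, 1000)
size.labels <- ifelse(sizes > 50, "large", "small")
```

The validation check needs no else branch: when the condition is FALSE, execution simply continues with the next statement.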
previous asymptoticTimings:
asymptoticTimings <- function(e, data.sizes, max.seconds)
{
  if(!all(!is.infinite(data.sizes) & !is.na(data.sizes) & !is.nan(data.sizes)))
  {
    stop("data.sizes must not contain any NA/NaN/Infinite value.")
  }
  if(length(data.sizes) == 0)
  {
    stop("Cannot run on an empty vector for 'data.sizes'.")
  }
  lang.obj <- substitute(e)
  fun.obj <- function(data.sizes)
  {
    eval(lang.obj)
  }
  time.limit = ifelse(missing(max.seconds), 10^8, max.seconds*10^9)
  l <- length(data.sizes)
  timings.list <- list()
  for(i in 1:l)
  {
    benchmarked.timings <- as.data.frame(microbenchmark(fun.obj(data.sizes[i])))
    benchmarked.timings$data.size <- data.sizes[i]
    timings.list[[i]] <- data.frame(benchmarked.timings$time, benchmarked.timings$data.size)
    ifelse((mean(benchmarked.timings$time) > time.limit), break, next)
  }
  resultant.df <- do.call(rbind, timings.list)
  colnames(resultant.df) <- c("Timings", "Data sizes")
  return(resultant.df)
}
asymptoticTimings after the changes discussed above:
asymptoticTimings <- function(e, data.sizes, max.seconds)
{
  ifelse(!all(!is.infinite(data.sizes) & !is.na(data.sizes) & !is.nan(data.sizes)), stop("data.sizes must not contain any NA/NaN/Infinite value."), return)
  lang.obj <- substitute(e)
  fun.obj <- function(data.sizes)
  {
    eval(lang.obj)
  }
  if(missing(max.seconds))
    time.limit = 10^8
  else time.limit = max.seconds*10^9
  timings.list <- list()
  for(i in seq(along = data.sizes))
  {
    benchmarked.timings <- as.data.frame(microbenchmark(fun.obj(data.sizes[i])))
    benchmarked.timings$data.size <- data.sizes[i]
    timings.list[[i]] <- data.frame(benchmarked.timings$time, benchmarked.timings$data.size)
    if(mean(benchmarked.timings$time) > time.limit)
      break
    else next
  }
  resultant.df <- do.call(rbind, timings.list)
  colnames(resultant.df) <- c("Timings", "Data sizes")
  return(resultant.df)
}
also for assigning variables you should use
time.limit <- if(missing(max.seconds)) 10^8 else max.seconds*10^9
Yes that looks better, done
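The suggestion works because if/else is an expression in R and returns the value of whichever branch runs, so the result can be assigned directly. A minimal sketch, wrapped in a hypothetical helper (timeLimit is not a function in the package) so that missing() has an argument to inspect:

```r
# Hypothetical helper: returns the time limit in nanoseconds.
timeLimit <- function(max.seconds)
{
  # if/else yields a value, so a single assignment covers both cases
  time.limit <- if(missing(max.seconds)) 10^8 else max.seconds*10^9
  time.limit
}

default.limit <- timeLimit()   # no argument supplied: 10^8
custom.limit <- timeLimit(2)   # 2 seconds expressed in nanoseconds
```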
also usually with a for loop people would expect all of the iterations to be computed, but they are not because you are using break. Maybe that works but it is confusing. Would be easier to understand if you changed that to a while loop I think.
okay then it would look like this (since we can't use seq(along = data.sizes) or seq_along(data.sizes) inside while):
asymptoticTimings <- function(e, data.sizes, max.seconds)
{
  ifelse(!all(!is.infinite(data.sizes) & !is.na(data.sizes) & !is.nan(data.sizes)), stop("data.sizes must not contain any NA/NaN/Infinite value."), return)
  lang.obj <- substitute(e)
  fun.obj <- function(data.sizes)
  {
    eval(lang.obj)
  }
  time.limit <- if(missing(max.seconds)) 10^8 else max.seconds*10^9
  timings.list <- list()
  i <- 1
  while(i <= length(data.sizes))
  {
    benchmarked.timings <- as.data.frame(microbenchmark(fun.obj(data.sizes[i])))
    benchmarked.timings$data.size <- data.sizes[i]
    timings.list[[i]] <- data.frame(benchmarked.timings$time, benchmarked.timings$data.size)
    if(mean(benchmarked.timings$time) > time.limit) break
    i <- i + 1
  }
  resultant.df <- do.call(rbind, timings.list)
  colnames(resultant.df) <- c("Timings", "Data sizes")
  return(resultant.df)
}
is this implementation ok?
I understand the need for change, since for() tends to be better only if the repetition count is known and every value is iterated over, as you mentioned (whereas while tends to be more general and suitable for breaking out when required). But is the lapply version suitable in place of this one, i.e. are while loops favourable, or is a function passed to lapply better, since we are avoiding for-loops?
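One possible variation on the while version, sketched under assumptions: the stopping rule can live in the loop condition itself, so no break is needed at all. The helper collectUntilLimit and the plain numeric values below are hypothetical stand-ins for the microbenchmark timings, used only to keep the example deterministic.

```r
# Hypothetical helper: stores one row per value and stops after the
# first value that exceeds the limit (that value is still stored,
# mirroring the store-then-break order of the loop above).
collectUntilLimit <- function(values, limit)
{
  results <- list()
  i <- 1
  keep.going <- TRUE
  while(keep.going && i <= length(values))
  {
    results[[i]] <- data.frame(value = values[i])
    keep.going <- values[i] <= limit
    i <- i + 1
  }
  # do.call(rbind, list()) is NULL, so an empty input yields NULL
  do.call(rbind, results)
}

collected <- collectUntilLimit(c(1, 2, 5, 3), limit = 4)
nothing <- collectUntilLimit(numeric(0), limit = 4)
```

Because an empty input yields NULL rather than a data frame, a later colnames(...) <- assignment on that result would fail, so a caller may want to handle that case explicitly.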
I've been thinking about making some minor optimizations to my code style for the quantifying functions, for instance using a sequence along our parameter data.sizes to discard the extra if check for empty data.sizes:

which is fine, but furthermore, using some sort of apply function is thought to be better practice because it is faster than a loop, for which I thought of using lapply:

Note that I had to discard the use of break since the chunk of code isn't in a loop anymore, for which I introduced a boolean break.bool, set to TRUE initially outside the scope of the lapply and internally set to FALSE using the scoping operator (<<-) if the mean of the benchmarked timings for an iteration exceeds the user-set/hardcoded time limit. This practice led to an error:

So *tmp* was being assigned to our data.frame composed of the combination of the list elements in our timings.list. I am not sure why, but one potential reason could be that lapply returns a list (its output is always a list). So I tried unlisting the lapply:

But it still results in the same error:
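The mechanics described above (a flag outside lapply, flipped from inside with <<-) can be sketched with made-up data. This does not reproduce the error reported here; values, limit, keep.going and pieces are hypothetical names, and plain numbers stand in for benchmark timings.

```r
values <- c(1, 2, 5, 3)
limit <- 4
keep.going <- TRUE

pieces <- lapply(seq_along(values), function(i)
{
  if(keep.going)
  {
    # <<- modifies keep.going in the enclosing environment, not a local copy
    if(values[i] > limit) keep.going <<- FALSE
    data.frame(value = values[i])
  }
  # when keep.going is FALSE the if() has no else, so NULL is returned
})

# lapply always returns one element per input; skipped ones are NULL.
# rbind drops the NULL entries when combining the data frames.
combined <- do.call(rbind, pieces)

# unlist() flattens the data frames into a plain named vector,
# so the data-frame structure is lost rather than repaired.
flattened <- unlist(pieces)
```

Unlike break, the flag does not stop lapply from visiting the remaining elements; it only makes the function return NULL for them.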