Anirban, would it be possible to use `GuessCompx::CompEst` inside our `asymptoticComplexityClass` function? e.g.
```r
asymptoticComplexityClass <- function(f, ...) {
  GuessCompx::CompEst(f, ...)
}
```
Is there any reason that we need to copy/re-write that logic ourselves, or can we just use the functionality that already exists in GuessCompx?
It would be possible, but there are a few reasons why I am not planning to use it:
(1) `CompEst` essentially takes a data frame given by the user as input (for the function to operate on); it won't be the same as the one passed by our timings function. And if we don't require the timings function (which will need optimization, maybe by adding time limits as you mentioned early on), then our package would essentially be the same as GuessCompx.
(2) `CompEst` classifies time and memory complexity together, which I am planning to do in two separate functions.
(3) Memory complexity classification in GuessCompx only works for Windows users. (I will try to bypass that by collecting memory usage data from `bench`.)
For the third point, the problem comes from the fact that `memory.size()` only works on Windows. The GuessCompx authors tried to overcome that limitation, but it wasn't successful. Quoting from the research paper:

> Bypassing the limitations of the `memory.size()` function to only Windows users has been unsuccessfully tried (`pryr::mem_change`, `Rprof`)
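For context, this is roughly what one sees when calling it on a non-Windows platform (the exact warning text may vary by R version):

```r
# On macOS/Linux, memory.size() is only a stub and reports nothing usable:
memory.size()
#> [1] Inf
#> Warning message:
#> 'memory.size()' is Windows-specific
```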
I would like to see if we can overcome that restriction, so that our package can classify memory complexity with cross-platform (OS-wise) compatibility. If it doesn't work out well, we can revert to `CompEst`.
@tdhock @neerajdhanraj
A take on the time complexity classification function:
It uses generalized linear models with formulae based on the variation of our timings (first column of the data frame returned from `asymptoticTimings`) with respect to the dataset sizes (second column) for different complexity classes, then applies cross-validation on each of them (via a `lapply` on a list of those models) through the `cv.glm` function, and thereby extracts the adjusted delta error (the second component of `delta`) for each complexity class. The best fit is the complexity class with the least error, and that class is printed as the resulting complexity.

It serves our purpose (it classifies correctly for the tests done so far), and I was able to use the log-linear formula inside the glm scope without any error as well, with a correct classification of `PeakSegOptimal::PeakSegPDPA` as log-linear. (The log-linear case was treated separately from the other classes throughout the GuessCompx code because its formula involves a transformation with the `size` variable appearing twice, `size * log(size)`, and hence it had to be considered additionally. The instances of `NlogN` throughout the entire package are there for this reason.)

For reference, lines 208 to 263 in my commented version of GuessCompx (wherein I included all the functions together, subsequently breaking them down when called in `CompEst()`, but followed along the memory complexity testing part) include the part from which I derived my complexity classification function.
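To make the approach concrete, here is a minimal sketch of the idea (the column names, the set of candidate classes, and the helper name are assumptions for illustration, not the final API):

```r
library(boot)

# Hypothetical helper: classify the complexity class of a data frame
# with timings in the first column and data sizes in the second,
# as produced by asymptoticTimings:
classifyComplexity <- function(df) {
  names(df) <- c("timings", "size")
  # One glm per candidate complexity class, each modelling the timings
  # as a function of a transformation of the data size:
  fits <- list(
    constant  = glm(timings ~ 1,                   data = df),
    log       = glm(timings ~ log(size),           data = df),
    linear    = glm(timings ~ size,                data = df),
    loglinear = glm(timings ~ I(size * log(size)), data = df),
    quadratic = glm(timings ~ I(size^2),           data = df)
  )
  # Cross-validate each model and keep the adjusted prediction error,
  # i.e. the second component of cv.glm()$delta:
  errors <- sapply(fits, function(fit) boot::cv.glm(df, fit)$delta[2])
  # The best fit is the class with the least cross-validation error:
  names(which.min(errors))
}
```

Note that the log-linear case needs no special handling here: wrapping the product in `I()` lets `size * log(size)` be used directly inside the glm formula.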
The same approach could be followed for `asymptoticMemoryComplexityClass`, but we first need to obtain the data frame consisting of memory-use metrics and dataset sizes for its parameters, for which we could use `bench::bench_memory` (memory allocations are parsed by `profmem::readRprofmem()`) in our `asymptoticMemoryUsage` function.
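As a rough sketch of what that data-collection step could look like (the function signature and column names here are assumptions, not the final interface):

```r
library(bench)

# Hypothetical draft of asymptoticMemoryUsage: f is a function of the
# data size N, and each call is profiled once for memory allocations
# (bench_memory() parses these via profmem::readRprofmem()):
asymptoticMemoryUsage <- function(f, data.sizes) {
  memory.metrics <- sapply(data.sizes, function(N) {
    as.numeric(bench::bench_memory(f(N))$mem_alloc)
  })
  data.frame("Memory usage" = memory.metrics,
             "Data sizes"   = data.sizes,
             check.names    = FALSE)
}

# Example usage with a toy function that allocates quadratic memory:
# df <- asymptoticMemoryUsage(function(N) matrix(0, N, N), 10^(1:3))
```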
Also, the use of `bench` could potentially overcome the drawback of GuessCompx's OS-limitation to Windows only, since the package extracts memory allocations via different APIs on macOS and Linux in addition to Windows.