Closed katrinabrock closed 2 months ago
Hi @katrinabrock , I just pushed the discussed code that sorts the time lags once, and then only calculates the requisite matrices once per unique time lag. This should be much faster than memoise
, because no check is made per call and the sorting is done outside of the optimization loop.
I will be in China for ~2 weeks before I can get to anything else.
I used the atime package to test pre and post your most recent commit as well as with and without memoise. This is taking different size subsets of the buffalo$Cilla dataset. Conclusion is memoise and your speedup have a similar modest gain in time for datasets the size of that full one (and presumably larger). My memoise version has a substantial memory cost, your speedup has a smaller but notable memory improvement.
Code to generate this plot:
N_length <- 8
times <- 8
seconds.limit <- 600
print(glue::glue('Max run time is {N_length*times*seconds.limit/3600} hours'))
N_min <- 100
N_max <- 3527
tdir <- tempfile()
dir.create(tdir)
git2r::clone("https://github.com/katrinabrock/ctmm", tdir)
atime.list <- atime::atime_versions(
pkg.path=tdir,
seconds.limit = seconds.limit,
times=times,
N=round(exp(seq(log(N_min), log(N_max), length.out=N_length))),
setup={
data("buffalo")
Cilla <- buffalo$Cilla[1:N, ]
m.iid <- ctmm::ctmm(sigma=23 %#% "km^2")
m.ou <- ctmm::ctmm(sigma=23 %#% "km^2",tau=6 %#% "day")
m.ouf <- ctmm::ctmm(sigma=23 %#% "km^2",tau=c(6 %#% "day",1 %#% "hour"))
},
expr={
M.IID <- ctmm::ctmm.fit(Cilla,m.iid)
M.OU <- ctmm::ctmm.fit(Cilla,m.ou)
M.OUF <- ctmm::ctmm.fit(Cilla,m.ouf)
},
pre_speedup = "66d1f5180323ae5502417c908e4e686ba65d2334",
post_speedup="37a1aa480ab326f524e4314731cbfb2107182180",
memoise_no_speedup="cd390665b6076b113d27abad06f1f778f3bc66fe",
memoise_with_speedup="a50c47178037483c76f521b60329395d1c538e00")
saveRDS(atime.list, './data/20240718_atime.rds')
# ran up to here as a script
# later ran this
atime.list <- readRDS('./data/20240718_atime.rds')
plot(atime.list)
oh...one other thing....it looks both time and memory are increasing linearly on a log scale with data size ...so...exponential growth. Is that expected?
EDIT:walking home I realized both axises are log so it's linear which I think is what you said is expected.
Using memoise function to avoid duplicate calls to
langevin
function by saving the results in memory and reusing them. I validated the speedup by running a few ctmm.fit from the vignettes. I ran the same code 10x with the mainline code and with this version and saved the results of Rprof.Here are the results. Total time is in seconds of cpu time. lengevin_percent is the percent of cpu time spend within the langevin function or functions it calls. Gains are modest in terms of total time(2% lower mean, 5% lower median), but look real.
This change might be unnecessary if you're removing all unnecessary calls to
langevin
in a different way.Here's the CSV of the data from above charts: 20240701_memoise_bench.csv
Here's the code that generated the data: