franknarf1 / r-tutorial

This book covers the essentials of using R
Creative Commons Zero v1.0 Universal

add notes re optimized code #31

Open franknarf1 opened 6 years ago

franknarf1 commented 6 years ago

Maybe as "optimized calls" in the data.table section or "benchmarking" in the misc section.

Sorting. An example from today:

library(data.table)
n = 3e7
nv = 1e7
DT = data.table(dt = Sys.time() + sample(nv, n, replace=TRUE))[, c("d", "t") := .(as.IDate(dt), as.ITime(dt))][]

setindex(DT, dt)
setindex(DT, d, t)

system.time(DT[order(dt)]) # 4.8 s
system.time(DT[order(d, t)]) # 2.9 s

My takeaway is that sorting on integers is faster: d and t (IDate and ITime) are integer-backed, while dt (POSIXct) is stored as a double. I'm not actually sure whether the indices are helping, since they are not acknowledged in the verbose output. The timings above might also be skewed by my computer currently being at 99% RAM usage...
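A scaled-down sketch of the sorting comparison above, for anyone who wants to rerun it quickly. The smaller n and nv, the set.seed call, and the result names by_dt and by_d_t are assumptions added for illustration; timings will differ by machine, so none are quoted here. It also prints the storage type of each column and passes verbose = TRUE so the index question can be checked by eye.

```r
library(data.table)

set.seed(1)
n  = 1e5   # assumption: far smaller than the 3e7 rows used above
nv = 1e4
DT = data.table(dt = Sys.time() + sample(nv, n, replace = TRUE))
DT[, c("d", "t") := .(as.IDate(dt), as.ITime(dt))]

# Storage types: POSIXct is a double, IDate and ITime are integers
sapply(DT, typeof)

# Same indices as in the original example
setindex(DT, dt)
setindex(DT, d, t)
indices(DT)

# Time both sorts; verbose = TRUE prints what data.table reports doing
system.time(by_dt  <- DT[order(dt),   verbose = TRUE])
system.time(by_d_t <- DT[order(d, t), verbose = TRUE])
```

Whether the verbose output mentions the indices may depend on the data.table version, so it is worth reading that output rather than assuming either way.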

This is part of the unique(DT[order(ovars)], by=byvars, fromLast=TRUE) idiom that has come up on Stack Overflow several times. I also tried DT[order(ovars), .SD[.N], by=byvars] and found its run time similarly too long. Of course, something like which.max should be faster for finding the last entry, but I'm not sure whether that's optimized yet; besides, it does not extend to multiple ovars and might not work for, e.g., characters (since I recall that gmax does not)...
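For reference, a minimal sketch of the two "last row per group" idioms mentioned above, on a toy table. The column names g (standing in for byvars), o1 and o2 (standing in for ovars), and v are hypothetical, chosen only for illustration.

```r
library(data.table)

# Toy data: g is the grouping column, o1 and o2 are the ordering columns
DT = data.table(
  g  = c("a", "a", "b", "b", "b"),
  o1 = c(1L, 2L, 1L, 1L, 0L),
  o2 = c(5L, 1L, 2L, 3L, 9L),
  v  = 1:5
)

# Idiom 1: sort by the ordering columns, then keep the last row seen per group
res1 = unique(DT[order(o1, o2)], by = "g", fromLast = TRUE)

# Idiom 2: sort by the ordering columns, then take the last row of .SD per group
res2 = DT[order(o1, o2), .SD[.N], by = g]

res1
res2
```

Both pick, for each g, the row that sorts last on (o1, o2). On a table this small the two are indistinguishable in speed; the run-time concern above only shows up at scale.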

franknarf1 commented 6 years ago