Open franknarf1 opened 6 years ago
Jim Hester's bench package (for time + mem): https://cran.r-project.org/web/packages/bench/index.html Example: https://stackoverflow.com/a/51675804/
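For reference, a minimal sketch of what a time-and-memory comparison with bench might look like. The vector x and the labels sqrt_fn and power are hypothetical, chosen only for illustration; bench::mark() runs each expression repeatedly and, by default, checks that the results are equal.

```r
library(bench)

# Hypothetical input, for illustration only
x <- runif(1e4)

# Compare two equivalent expressions; mark() records timings and
# memory allocation, and verifies that both return equal results
res <- bench::mark(
  sqrt_fn = sqrt(x),
  power   = x^0.5,
  iterations = 10
)

# Keep the columns of interest: timings plus allocated memory
print(res[, c("expression", "min", "median", "mem_alloc")])
```

The mem_alloc column is what sets this apart from timing-only tools; it may be NA if R was built without memory profiling.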
Handling categorical data (or ordinal/cardinal data with limited/finite support, like dates): https://chat.stackoverflow.com/transcript/message/46494926#46494926 Use factors, modify with by= so the function runs once per distinct value, etc., even if the function is vectorized.
Maybe as "optimized calls" in the data.table section or "benchmarking" in the misc section.
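A rough sketch of the by= idea for data with limited support, assuming a hypothetical table DT whose Date column d has few distinct values relative to its row count. format() is vectorized, but grouping by the column itself evaluates it once per distinct date rather than once per row. The names DT, d, ym_all and ym_by are made up for this example, and whether the grouped form is actually faster depends on the data, so it is worth benchmarking.

```r
library(data.table)

set.seed(1)
# Hypothetical data: 1e5 rows but at most 365 distinct dates
DT <- data.table(d = as.Date("2018-01-01") + sample(0:364, 1e5, replace = TRUE))

# Plain vectorized call: format() is applied to every row
DT[, ym_all := format(d, "%Y-%m")]

# Grouped call: inside j, the by-variable d has length 1, so format()
# runs once per distinct date and the result is recycled over the group
DT[, ym_by := format(d, "%Y-%m"), by = d]
```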
Sorting. An example from today:
My takeaway is that sorting on ints is faster. I'm not actually sure whether the indices are helping, since they are not acknowledged in the verbose output. The results above might be skewed by my computer currently being at 99% RAM usage...
This is part of the
unique(DT[order(ovars)], by=byvars, fromLast = TRUE)
idiom that has come up on SO several times. I also tried
DT[order(ovars), .SD[.N], by=byvars]
and found the run time similarly long. Of course, something like which.max should be faster for finding the last entry, but I'm not sure if that's optimized yet, and besides, it does not extend to multiple ovars and might not work for, e.g., characters (since I recall that gmax does not)...
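To make the two idioms concrete, here is a small sketch on made-up data, with g standing in for byvars and o for ovars (both hypothetical). It also shows a third variant, not mentioned above, that collects row numbers with .I and subsets once; no claim is made here about which is fastest.

```r
library(data.table)

# Hypothetical data: g plays the role of byvars, o the role of ovars
DT <- data.table(
  g = c("a", "b", "a", "b", "a"),
  o = c(2L, 5L, 3L, 1L, 1L),
  v = c(10, 20, 30, 40, 50)
)

# Idiom 1: sort, then keep the last row seen for each group
res1 <- unique(DT[order(o)], by = "g", fromLast = TRUE)

# Idiom 2: sort, then take the last row of .SD within each group
res2 <- DT[order(o), .SD[.N], by = g]

# Variant: sort once, collect the last row number per group with .I,
# then subset in a single step (.I indexes the sorted table here)
sorted <- DT[order(o)]
res3 <- sorted[sorted[, .I[.N], by = g]$V1]
```

All three return one row per group, the one with the largest o, though the row order of the results can differ between them.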