jangorecki closed this 3 years ago
Timings on https://github.com/Rdatatable/data.table/pull/4851: na.rm=TRUE went from 48.6s down to 14.3s, and na.rm=FALSE from 15.5s down to 14.7s.
> system.time(nat <- DT[, .(v3=mean(v3, na.rm=TRUE)), by=id3, verbose=TRUE])
Detected that j uses these columns: v3
Finding groups using forderv ... forder.c received 1000000000 rows and 1 columns
5.198s elapsed (00:01:35 cpu)
Finding group sizes from the positions (can be avoided to save RAM) ... 0.091s elapsed (0.075s cpu)
Getting back original order ... forder.c received a vector type 'integer' length 10000000
0.479s elapsed (2.959s cpu)
lapply optimization is on, j unchanged as 'list(mean(v3, na.rm = TRUE))'
GForce optimized j to 'list(gmean(v3, na.rm = TRUE))'
Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.321
gforce assign high and low took 4.868
This gmean took (narm=TRUE) ... gather took ... 2.068s
2.298s
gforce eval took 2.300
8.537s elapsed (00:02:43 cpu)
user system elapsed
262.668 63.634 14.322
## drop caches in another session
> system.time(naf <- DT[, .(v3=mean(v3)), by=id3, verbose=TRUE])
Detected that j uses these columns: v3
Finding groups using forderv ... forder.c received 1000000000 rows and 1 columns
6.565s elapsed (00:01:35 cpu)
Finding group sizes from the positions (can be avoided to save RAM) ... 0.093s elapsed (0.085s cpu)
Getting back original order ... forder.c received a vector type 'integer' length 10000000
0.601s elapsed (5.789s cpu)
lapply optimization is on, j unchanged as 'list(mean(v3))'
GForce optimized j to 'list(gmean(v3))'
Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.314
gforce assign high and low took 4.880
This gmean took (narm=FALSE) ... gather took ... 1.717s
1.931s
gforce eval took 1.931
7.467s elapsed (00:02:39 cpu)
user system elapsed
261.039 61.257 14.738
The timings here may look as if they were obtained from a single session, but each one was actually run in a fresh session, and in between the runs the caches were dropped with
sudo sh -c 'echo 3 >/proc/sys/vm/drop_caches'
This is actually mentioned in #3202.
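
For anyone who wants to try the same comparison at a smaller scale, here is a minimal sketch. It is not the script used for the timings in this thread. The row count, number of groups, seed, and share of NA values are assumptions picked so it runs in a few seconds. Only the column names `id3` and `v3` and the two queries are taken from the output shown.

```r
library(data.table)

# Assumed sizes, far smaller than the 1e9 rows reported by forder.c in the thread.
N <- 1e6L   # rows (assumption)
K <- 1e4L   # distinct values of id3 (assumption)

set.seed(108)
DT <- data.table(
  id3 = sample(K, N, replace = TRUE),
  v3  = round(runif(N, max = 100), 6)
)
# Inject some NAs so that na.rm=TRUE has something to skip (5% is an assumption).
DT[sample(N, N %/% 20L), v3 := NA_real_]

# Same two queries as in the timings; verbose=TRUE reports whether
# GForce rewrote mean() to gmean().
t_nat <- system.time(nat <- DT[, .(v3 = mean(v3, na.rm = TRUE)), by = id3, verbose = TRUE])
t_naf <- system.time(naf <- DT[, .(v3 = mean(v3)), by = id3, verbose = TRUE])

# Elapsed seconds for each variant.
elapsed <- c(na.rm_TRUE = t_nat[["elapsed"]], na.rm_FALSE = t_naf[["elapsed"]])
print(elapsed)
```

At this size both variants will likely finish too quickly to show a meaningful gap, so the sketch is mainly useful for checking the verbose output and for scaling `N` up on a machine with enough RAM.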