Rdatatable / data.table

R's data.table package extends data.frame:
http://r-datatable.com
Mozilla Public License 2.0

gmean na.rm=TRUE is much slower than na.rm=FALSE #4849

Closed jangorecki closed 3 years ago

jangorecki commented 3 years ago

The timings below may look as if they were obtained in a single session, but each one was actually run in a fresh session, and in between I ran sudo sh -c 'echo 3 >/proc/sys/vm/drop_caches' to drop the kernel caches.

library(data.table) ## 1.13.5
setDTthreads(0L) ## 40
set.seed(108)
N = 1e9L
K = 1e2L
DT = list()
DT[["id3"]] = factor(sample(sprintf("id%010d",1:(N/K)), N, TRUE))
DT[["v3"]] =  round(runif(N,max=100),6)
setDT(DT)

system.time(naf <- DT[, .(v3=mean(v3)), by=id3, verbose=TRUE])
#Detected that j uses these columns: v3 
#Finding groups using forderv ... forder.c received 1000000000 rows and 1 columns
#5.615s elapsed (00:01:39 cpu) 
#Finding group sizes from the positions (can be avoided to save RAM) ... 0.091s elapsed (0.074s cpu) 
#Getting back original order ... forder.c received a vector type 'integer' length 10000000
#1.037s elapsed (2.888s cpu) 
#lapply optimization is on, j unchanged as 'list(mean(v3))'
#GForce optimized j to 'list(gmean(v3))'
#Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.319
#gforce assign high and low took 4.399
#This gsum took (narm=FALSE) ... gather took ... 2.107s
#2.322s
#gforce eval took 2.339
#8.738s elapsed (00:02:39 cpu) 
#
#   user  system elapsed 
#261.852  67.723  15.498 

system.time(nat <- DT[, .(v3=mean(v3, na.rm=TRUE)), by=id3, verbose=TRUE])
#Detected that j uses these columns: v3 
#Finding groups using forderv ... forder.c received 1000000000 rows and 1 columns
#5.799s elapsed (00:01:42 cpu) 
#Finding group sizes from the positions (can be avoided to save RAM) ... 0.090s elapsed (0.074s cpu) 
#Getting back original order ... forder.c received a vector type 'integer' length 10000000
#2.608s elapsed (3.275s cpu) 
#lapply optimization is on, j unchanged as 'list(mean(v3, na.rm = TRUE))'
#GForce optimized j to 'list(gmean(v3, na.rm = TRUE))'
#Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.346
#gforce assign high and low took 4.978
#gforce eval took 33.515
#40.2s elapsed (00:02:24 cpu) 
#
#   user  system elapsed 
#250.858  68.804  48.679

This is actually mentioned in #3202.
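The benchmark above needs a machine that can hold 1e9 rows. For a quick check on ordinary hardware, the same comparison can be sketched at a smaller scale. This is only an illustration: the reduced N = 1e6 is an assumption made here, not part of the original report, and at this size the gap between the two calls may be too small to measure reliably.

```r
library(data.table)

set.seed(108)
N = 1e6L  # assumption: scaled down from the 1e9 rows used in the report
K = 1e2L
DT = data.table(
  id3 = factor(sample(sprintf("id%010d", 1:(N/K)), N, TRUE)),
  v3  = round(runif(N, max=100), 6)
)

# Same two grouped means as in the report, without verbose output
t_naf = system.time(naf <- DT[, .(v3=mean(v3)), by=id3])
t_nat = system.time(nat <- DT[, .(v3=mean(v3, na.rm=TRUE)), by=id3])

# v3 contains no NAs, so both calls should return the same group means
stopifnot(isTRUE(all.equal(naf, nat)))

# Elapsed seconds for each call; values depend on the machine
print(c(na.rm_FALSE = t_naf[["elapsed"]], na.rm_TRUE = t_nat[["elapsed"]]))
```

Adding verbose=TRUE to either call prints the GForce stages shown above, which is where the difference in "gforce eval" time shows up.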

jangorecki commented 3 years ago

Timings on https://github.com/Rdatatable/data.table/pull/4851: na.rm=TRUE goes from 48.6s down to 14.3s, and na.rm=FALSE from 15.5s down to 14.7s.

> system.time(nat <- DT[, .(v3=mean(v3, na.rm=TRUE)), by=id3, verbose=TRUE])
Detected that j uses these columns: v3
Finding groups using forderv ... forder.c received 1000000000 rows and 1 columns
5.198s elapsed (00:01:35 cpu) 
Finding group sizes from the positions (can be avoided to save RAM) ... 0.091s elapsed (0.075s cpu) 
Getting back original order ... forder.c received a vector type 'integer' length 10000000
0.479s elapsed (2.959s cpu) 
lapply optimization is on, j unchanged as 'list(mean(v3, na.rm = TRUE))'
GForce optimized j to 'list(gmean(v3, na.rm = TRUE))'
Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.321
gforce assign high and low took 4.868
This gmean took (narm=TRUE) ... gather took ... 2.068s
2.298s
gforce eval took 2.300
8.537s elapsed (00:02:43 cpu) 
   user  system elapsed 
262.668  63.634  14.322 

## drop caches in another session

> system.time(naf <- DT[, .(v3=mean(v3)), by=id3, verbose=TRUE])
Detected that j uses these columns: v3 
Finding groups using forderv ... forder.c received 1000000000 rows and 1 columns
6.565s elapsed (00:01:35 cpu) 
Finding group sizes from the positions (can be avoided to save RAM) ... 0.093s elapsed (0.085s cpu) 
Getting back original order ... forder.c received a vector type 'integer' length 10000000
0.601s elapsed (5.789s cpu) 
lapply optimization is on, j unchanged as 'list(mean(v3))'
GForce optimized j to 'list(gmean(v3))'
Making each group and running j (GForce TRUE) ... gforce initial population of grp took 0.314
gforce assign high and low took 4.880
This gmean took (narm=FALSE) ... gather took ... 1.717s
1.931s
gforce eval took 1.931
7.467s elapsed (00:02:39 cpu) 
   user  system elapsed 
261.039  61.257  14.738
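
To put the numbers from both comments side by side, the ratios can be computed directly. The figures below are copied from the outputs in this thread (total elapsed seconds and the "gforce eval" stage). The data.frame layout and column names are only an illustrative choice.

```r
# Total elapsed seconds reported by system.time(), before and after the PR
elapsed = data.frame(
  call   = c("na.rm=TRUE", "na.rm=FALSE"),
  before = c(48.679, 15.498),
  after  = c(14.322, 14.738)
)
elapsed$speedup = elapsed$before / elapsed$after
print(elapsed)

# "gforce eval took" stage alone, same ordering of calls
gforce_eval = data.frame(
  call   = c("na.rm=TRUE", "na.rm=FALSE"),
  before = c(33.515, 2.339),
  after  = c(2.300, 1.931)
)
gforce_eval$speedup = gforce_eval$before / gforce_eval$after
print(gforce_eval)
```

Almost all of the improvement for na.rm=TRUE comes from the "gforce eval" stage, which is the part of the run that differed between the two calls in the original report.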