RGLab / flowStats

flowStats: algorithms for flow cytometry data analysis using BioConductor tools

optimize warpSetNCDF #8

Closed mikejiang closed 7 years ago

mikejiang commented 11 years ago

Many of the steps within this routine involve cdf reading:

Most of them are channel-wise operations. With the enhancement from ncdfFlow#13, we can potentially speed up normalization considerably by passing a subsetted ncdfFlowSet object.

Also we want to change the ncdfFlow::writeSlice C API to only write normalized channels.
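
The idea of reading and writing only the channels being normalized can be sketched with a toy stand-in. This is a minimal illustration in base R, assuming in-memory matrices in place of an ncdfFlowSet; `normalize_channel` and `normalize_subset` are hypothetical helpers, not part of flowStats or ncdfFlow, and the median centering only stands in for the real registration step.

```r
# Toy stand-in: each "sample" is an events x channels matrix held in memory.
# In the issue the data live in a cdf file behind an ncdfFlowSet instead.
set.seed(1)
channels <- paste0("ch", 1:10)
samples <- lapply(1:3, function(i) {
  matrix(rnorm(1000 * length(channels)), ncol = length(channels),
         dimnames = list(NULL, channels))
})

# Hypothetical per-channel step (median centering); a placeholder for the
# actual normalization, which is far more involved.
normalize_channel <- function(v) v - median(v)

# Touch only the requested channels: read them, transform them, write them back.
normalize_subset <- function(sample, stains) {
  sub <- sample[, stains, drop = FALSE]    # "read" just the needed columns
  sub <- apply(sub, 2, normalize_channel)  # channel-wise operation
  sample[, stains] <- sub                  # "write" just the changed columns
  sample
}

stains <- c("ch2", "ch5")
normalized <- lapply(samples, normalize_subset, stains = stains)
```

With 2 of 10 channels requested, the other 8 columns are never copied or rewritten, which is the saving the subsetted object and a channel-aware writeSlice are meant to provide on disk.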

mikejiang commented 11 years ago

Normalizing 2 channels for 12 samples:

chnls <- c("<Violet A 610/20-A>","<Blue F 525/50-A>")
system.time(warpSetNCDF(x = fs, stains = chnls))

 user  system elapsed 
 81.605   1.668  83.321 

The enhanced warpSetNCDF, based on ncdfFlow#14 and ncdfFlow#13, takes

system.time(warpSetNCDF(x = fs[,chnls], stains = chnls))

 user  system elapsed 
 50.599   2.516  53.649

As sample size grows, we should see more significant gains.
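
For reference, the gain implied by the two timings above can be worked out directly. The numbers are copied from the system.time output; the variable names are only illustrative.

```r
# Timings (seconds) copied from the two system.time() runs above.
baseline <- c(user = 81.605, system = 1.668, elapsed = 83.321)
enhanced <- c(user = 50.599, system = 2.516, elapsed = 53.649)

# Ratio of elapsed times and percentage of elapsed time saved.
speedup   <- unname(baseline["elapsed"] / enhanced["elapsed"])
saved_pct <- 100 * (1 - unname(enhanced["elapsed"] / baseline["elapsed"]))

cat(sprintf("speedup: %.2fx, elapsed time saved: %.1f%%\n", speedup, saved_pct))
```

That works out to roughly a 1.55x speedup, or a bit over a third of the elapsed time, for this 12-sample run.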

raphg commented 11 years ago

Well done Mike!

mikejiang commented 11 years ago

Since R has no single precision data type, setting the mode to single is unnecessary in ncdfFlow::[[<-.
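
A quick base R check shows why. This is a small sketch, assuming the removed code did something along the lines of `mode(x) <- "single"`; the exact ncdfFlow code is not shown in this thread.

```r
x <- c(1.5, 2.25, 3.125)

# as.single() still returns a double vector; it only tags it with an attribute.
s <- as.single(x)
typeof(s)            # "double"

# Setting the mode to "single" behaves the same way.
y <- x
mode(y) <- "single"
typeof(y)            # "double"
attr(y, "Csingle")   # TRUE: a marker used by the .C/.Fortran interfaces

# The stored values are unchanged, so no precision or memory is traded.
all(y == x)          # TRUE
```

The conversion therefore does not change how the values are stored; it only adds an attribute, so the step costs time without any benefit.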

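By removing that mode setting (https://github.com/RGLab/ncdfFlow/commit/94432f1cdb7b886398a9c6951d6c3f180fddc10c), we are able to squeeze out another 5s: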

user  system elapsed 
 45.527   1.508  47.898 

raphg commented 11 years ago

Soon the time will be negative if you keep going Mike.

mikejiang commented 11 years ago

Unfortunately I can't go any further, given that the current major bottleneck is the fda::landmarkreg call:

$by.total
                           total.time total.pct self.time self.pct
"system.time"                   50.22    100.00      0.00     0.00
"<Anonymous>"                   49.88     99.32      0.92     1.83
"eval"                          37.16     73.99      0.04     0.08
"capture.output"                34.14     67.98      0.00     0.00
"evalVis"                       34.14     67.98      0.00     0.00
"withVisible"                   34.14     67.98      0.00     0.00
"landmarkreg"                   34.12     67.94      0.00     0.00
"smooth.morph"                  32.60     64.91      0.04     0.08