Bioconductor / BiocParallel

Bioconductor facilities for parallel evaluation
https://bioconductor.org/packages/BiocParallel

The running time isn't reduced when using bplapply()? #247

Closed anglixue closed 1 year ago

anglixue commented 1 year ago

Hi, it's my first time using this package and I am following the vignette.

I'd like to check whether my setup is correct, so I ran a simple function to test the parallel jobs.


library(BiocParallel)

FUN <- function(x) { round(sqrt(x), 4) }  # trivial per-element task
registered()                              # show the registered back-ends

options(MulticoreParam = quote(MulticoreParam(workers = 60)))

param <- SnowParam(workers = 60, type = "SOCK")  # 60 socket workers

However, I found that using bplapply() takes much longer than a plain for loop.

start = Sys.time()
tmp <- bplapply(1:100000000, FUN, BPPARAM = param)
print( Sys.time() - start )

Time difference of 9.669251 mins

start = Sys.time()
for(i in 1:100000000){round(sqrt(i), 4)}
print( Sys.time() - start )

Time difference of 41.66819 secs

Does anyone know if I did anything wrong?

Thanks for your help!

Jiefei-Wang commented 1 year ago

Hi,

This is a typical misconception about parallel computing: your task is too simple to show any improvement. What you did is like taking a spaceship just to buy some potatoes from Matt Damon on Mars and then complaining that the groceries were expensive.

Here is roughly how the time was spent:

  1. Creating 60 workers in the background: 9 mins
  2. Splitting and sending your task to the workers: 15 secs
  3. Doing the computation: 1 sec
  4. Receiving the result from the workers: 15 secs

For a more practical test, try a harder task and/or a smaller number of workers to see an actual speed-up.
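This can be checked with base R's parallel package, which uses machinery similar to SnowParam(). A minimal sketch, assuming a heavier per-element task; the Monte Carlo pi estimate below is a hypothetical example, not from this thread:

```r
library(parallel)

# A heavier per-element task: Monte Carlo estimate of pi, so each
# element costs real CPU time instead of a single sqrt()
heavy <- function(i) {
  set.seed(i)                 # deterministic per element
  x <- runif(1e5)
  y <- runif(1e5)
  mean(x^2 + y^2 < 1) * 4     # fraction inside the quarter circle, times 4
}

cl <- makeCluster(2)                  # small worker count
res_par <- parLapply(cl, 1:8, heavy)  # parallel evaluation
stopCluster(cl)

res_ser <- lapply(1:8, heavy)         # serial reference
identical(res_par, res_ser)           # TRUE: same results, only timing differs
```

With a task this heavy per element, the few seconds of startup and transfer overhead are amortized and the parallel version can win.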

Jiefei

mtmorgan commented 1 year ago

I see

> param <- SnowParam(workers = 60, type = "SOCK")
> system.time(bpstart(param))
   user  system elapsed
  0.099   0.189  27.900

which is still quite a long time but not 9 minutes. Maybe the long startup time is due to over-subscription (e.g., your computer has 8 cores for computing, but you're asking to use 60), or you're running out of memory (each worker starts a new R process...) so the computer is 'swapping' to disk.
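A quick way to check for over-subscription before choosing a worker count (a base R sketch, not from the thread):

```r
library(parallel)

# How many physical cores does this machine actually have?
n_cores <- detectCores(logical = FALSE)  # may be NA on some platforms

# A conservative worker count: at most the requested 60, capped at
# one less than the physical core count, and at least 1
n_workers <- max(1L, min(60L, n_cores - 1L, na.rm = TRUE))
n_workers
```

Requesting far more workers than cores mostly adds startup, scheduling, and memory cost without adding compute.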

MulticoreParam() (on non-Windows) is much faster at start-up:

> system.time(p <- bpstart(MulticoreParam(workers = 60)))
   user  system elapsed
  0.073   0.074   1.347

Issue https://github.com/Bioconductor/BiocParallel/issues/231 indicates that a PSOCK implementation would be much faster than SOCK for startup.

FWIW, my favorite lightweight example of parallel evaluation is one where the worker 'does nothing'. For instance, it is not surprising that

sleeper = function(i) { Sys.sleep(1); i }
res <- lapply(1:10, sleeper)

takes about 10 seconds, whereas

bplapply(1:10, sleeper, BPPARAM = MulticoreParam(10))

takes about 1 second, a 10x speedup from parallel evaluation. Similar results apply when the worker does something for a second, e.g.,

spinner = function(i) { 
    t <- Sys.time()
    j <- 0
    while(Sys.time() - t < 1)
        j <- j + 1
    j
}
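Timing the spinner can be sketched with base R's mclapply(), which, like MulticoreParam(), forks workers (non-Windows only; an analogue of the bplapply() call above, not from the thread itself):

```r
library(parallel)

# Same spinner as above: busy-wait on the wall clock for ~1 second
spinner <- function(i) {
  t <- Sys.time()
  j <- 0
  while (Sys.time() - t < 1)
    j <- j + 1
  j
}

# Fork 10 workers; each spins for ~1 second of wall time concurrently
timing <- system.time(res <- mclapply(1:10, spinner, mc.cores = 10))
timing["elapsed"]   # roughly 1 second when enough cores are free, vs ~10 serially
```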
anglixue commented 1 year ago

Sorry for not getting back to you earlier. Yes, I realize now that the actual bottleneck is creating the workers rather than the main function itself.

Thank you!