anglixue closed this issue 1 year ago
Hi,
This is a typical misconception about parallel computing. Your task is too simple to show any improvement, because the cost of setting up the workers outweighs the work itself. What you did is like taking a spaceship to Mars just to buy some potatoes from Matt Damon and then complaining that the food costs too much.
Here is how the time was spent (roughly)
For a more practical example, try a harder task or a smaller number of workers to see the actual speedup.
Jiefei
I see
> param <- SnowParam(workers = 60, type = "SOCK")
> system.time(bpstart(param))
   user  system elapsed
  0.099   0.189  27.900
which is still quite a long time but not 9 minutes. Maybe the long startup time is due to over-subscription (e.g., your computer has 8 cores for computing, but you're asking to use 60), or you're running out of memory (each worker starts a new R process...) so the computer is 'swapping' to disk.
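A quick way to check for over-subscription is to compare the requested worker count with the number of cores R reports. The sketch below uses only the base `parallel` package; `cap_workers()` is a hypothetical helper written for this illustration, not part of BiocParallel, and leaving one core free is an assumption rather than a rule.

```r
library(parallel)

# Number of logical cores the machine reports; can be NA on some platforms.
n_cores <- detectCores()

# Hypothetical helper (not a BiocParallel function): cap a requested worker
# count at the number of available cores minus one, but never below 1.
cap_workers <- function(requested, available = detectCores()) {
    if (is.na(available))
        return(1L)
    as.integer(max(1L, min(requested, available - 1L)))
}

# On an 8-core machine, a request for 60 workers is reduced to 7.
cap_workers(60, available = 8)
```

The value returned for the real machine, cap_workers(60), could then be passed as the workers argument of SnowParam().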
MulticoreParam() (on non-Windows) is much faster at start-up
> system.time(p <- bpstart(MulticoreParam(workers = 60)))
   user  system elapsed
  0.073   0.074   1.347
Issue https://github.com/Bioconductor/BiocParallel/issues/231 indicates that a PSOCK implementation would be much faster than SOCK for startup.
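Since the start-up cost is paid when the workers are created, one way to soften it is to start the back-end once and reuse it across several calls. This is a minimal sketch, assuming BiocParallel is installed; the worker count of 2 and the toy functions are placeholders chosen for illustration.

```r
library(BiocParallel)

# Create and start the workers once; this is the slow step.
param <- SnowParam(workers = 2, type = "SOCK")
bpstart(param)

# Both calls reuse the workers that are already running,
# so neither call pays the start-up cost again.
res1 <- bplapply(1:4, sqrt, BPPARAM = param)
res2 <- bplapply(1:4, function(i) i^2, BPPARAM = param)

# Shut the workers down when finished.
bpstop(param)
```

If the param is not started explicitly, bplapply() starts and stops the workers itself on every call, so the start-up time is included in each timing.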
FWIW my favorite light-weight example of parallel evaluation is when the worker 'does nothing', so for instance it is not surprising that
sleeper = function(i) { Sys.sleep(1); i }
res <- lapply(1:10, sleeper)
takes about 10 seconds, whereas
bplapply(1:10, sleeper, BPPARAM = MulticoreParam(10))
takes about 1 second, a 10x speedup from parallel evaluation. Similar results apply when the worker does something for a second, e.g.,
spinner = function(i) {
    t <- Sys.time()
    j <- 0
    while(Sys.time() - t < 1)
        j <- j + 1
    j
}
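To see the comparison in numbers, the serial and parallel runs can each be wrapped in system.time(). The sketch below assumes BiocParallel is installed and a non-Windows machine (on Windows, MulticoreParam() does not run tasks in parallel); the task count of 4 and the names `serial_time` and `par_time` are arbitrary choices for this illustration.

```r
library(BiocParallel)

# Busy-wait for about one second of wall-clock time, counting iterations.
spinner <- function(i) {
    t <- Sys.time()
    j <- 0
    while (Sys.time() - t < 1)
        j <- j + 1
    j
}

# Four tasks run one after another: roughly 4 seconds elapsed.
serial_time <- system.time(
    lapply(1:4, spinner)
)[["elapsed"]]

# Four tasks on four forked workers: roughly 1 second plus start-up overhead.
par_time <- system.time(
    bplapply(1:4, spinner, BPPARAM = MulticoreParam(4))
)[["elapsed"]]

c(serial = serial_time, parallel = par_time)
```

With a task that takes milliseconds instead of a second, the start-up overhead dominates and the parallel timing can easily be the larger of the two.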
Sorry for not getting back to you earlier. Yes, I realized the actual computation bottleneck is creating multiple workers rather than the main function.
Thank you!
Hi, this is my first time using this package, and I am following the vignette.
I wanted to check that my setup is correct, so I ran a simple function to test the parallel jobs.
However, I found that using bplapply takes much longer than a for loop.
Does anyone know if I did anything wrong?
Thanks for your help!