futureverse / future.apply

R package: future.apply - Apply Function to Elements in Parallel using Futures
https://future.apply.futureverse.org

Using multiple cores in RStudio #88

Closed waynelapierre closed 3 years ago

waynelapierre commented 3 years ago

When I use plan(multicore) in conjunction with future_Map in RStudio, R seems to use multiple cores. However, your future GitHub website says that using multiple cores in RStudio is not supported, which confuses me. I am using future.apply_1.7.0 in R 4.0.5 on Fedora 34 Linux OS. Any clarification would be greatly appreciated.

HenrikBengtsson commented 3 years ago

It shouldn't, since forked processing is unreliable in many GUIs, including RStudio. This is what I get in RStudio 1.4.1717 with R 4.1.0 on Ubuntu 18.04:

> parallelly::supportsMulticore()
[1] FALSE
> parallelly:::supportsMulticoreAndRStudio()
[1] FALSE
> sessionInfo()
R version 4.1.0 Patched (2021-06-26 r80566)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 18.04.5 LTS

Matrix products: default
BLAS:   /home/hb/software/R-devel/R-4-1-branch/lib/R/lib/libRblas.so
LAPACK: /home/hb/software/R-devel/R-4-1-branch/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8     LC_MONETARY=en_US.UTF-8   
 [6] LC_MESSAGES=en_US.UTF-8    LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
[1] compiler_4.1.0    parallelly_1.26.1 startup_0.15.0    parallel_4.1.0    tools_4.1.0

Here's how to check whether you are actually running in parallel workers. You should get one unique PID per worker. I only get one in RStudio:

> library(future.apply)
> plan(sequential)
> future_sapply(1:nbrOfWorkers(), function(i) c(i = i, pid = Sys.getpid()))
    [,1]
i      1
pid 9588

> plan(multicore)
> future_sapply(1:nbrOfWorkers(), function(i) c(i = i, pid = Sys.getpid()))
    [,1]
i      1
pid 9588

> plan(multicore, workers = 3)
> future_sapply(1:nbrOfWorkers(), function(i) c(i = i, pid = Sys.getpid()))
    [,1] [,2] [,3]
i      1    2    3
pid 9588 9588 9588
Warning message:
In supportsMulticoreAndRStudio(...) :
[ONE-TIME WARNING] Forked processing ('multicore') is not supported when running R from RStudio because it is considered unstable. For more details, how to control forked processing or not, and how to silence this warning in future R sessions, see ?parallelly::supportsMulticore

As you can see, all of them run in the same process (same PID), just as with plan(sequential).
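The PID check above can be wrapped in a small helper. This is only a minimal sketch: `count_worker_pids()` is a hypothetical name, not part of future or future.apply, and it assumes future.apply is installed. A result of 1 means every element was processed in a single R process.

```r
library(future.apply)  # also attaches 'future', which provides plan() and nbrOfWorkers()

# Hypothetical helper: number of distinct process IDs seen across the workers.
count_worker_pids <- function(n = nbrOfWorkers()) {
  pids <- future_sapply(seq_len(n), function(i) Sys.getpid())
  length(unique(pids))
}

plan(sequential)
n_seq <- count_worker_pids()  # sequential: everything runs in the current process
n_seq
```

Calling the same helper after setting another plan shows whether that plan really spreads work over several processes in the current environment.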

waynelapierre commented 3 years ago

Thanks for the clarification. I can replicate your results on my computer. Looking at my system monitor, I see most threads having high usage when I use future_Map and plan(multicore). Could that still be sequential instead of using multiple cores?

HenrikBengtsson commented 3 years ago

If you see multiple cores busy when running this, which in RStudio is equivalent to using plan(sequential), then something else is running in parallel that is not using the future framework, e.g. multithreaded Rcpp code. See what you get with purrr::map().
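One rough way to look for parallelism that does not come from futures is to compare CPU time with wall-clock time for the current R process. The sketch below is a heuristic, not a definitive test: `cpu_to_elapsed()` is a hypothetical helper, and the matrix workload is only a stand-in for the real function being mapped. A ratio well above 1 suggests that several threads were busy inside the same process.

```r
# Hypothetical helper: CPU time (user + system) of this R process divided by
# wall-clock time. Time spent in forked child processes is reported separately
# by system.time() (user.child, sys.child) and is deliberately not counted here.
cpu_to_elapsed <- function(expr) {
  t <- system.time(expr)
  if (t[["elapsed"]] == 0) return(NA_real_)  # too fast to measure
  (t[["user.self"]] + t[["sys.self"]]) / t[["elapsed"]]
}

# Stand-in workload (an assumption): a matrix cross-product, which goes through BLAS.
x <- matrix(rnorm(1000 * 1000), nrow = 1000)
ratio <- cpu_to_elapsed(crossprod(x))
ratio
```

Replacing the stand-in workload with a single call to the function passed to future_Map gives a first hint about whether that function is multithreaded on its own.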

waynelapierre commented 3 years ago

Thanks for the reply. That could be the reason. I think I will still keep using plan(multicore) in conjunction with future_Map in RStudio on my Linux OS in case you implement this support in the future. Do you think there will be any unexpected side effects of doing this?

HenrikBengtsson commented 3 years ago

Forked processing can crash RStudio, and not all R packages are fork-safe. That's why we recommend against it. It's nothing that future, or any other R parallelization framework, can fix.
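A common way to avoid depending on forking is to fall back to background R sessions when forking is not supported. The following is a minimal sketch under the assumption that future (and its dependency parallelly) is installed; the worker count of 2 is arbitrary and chosen only for illustration.

```r
library(future)

# TRUE only where forked processing is considered safe; FALSE in RStudio,
# as shown in the console output earlier in this thread.
can_fork <- parallelly::supportsMulticore()

if (can_fork) {
  plan(multicore, workers = 2)     # forked workers
} else {
  plan(multisession, workers = 2)  # separate background R sessions, no forking
}

# ... run future_Map() / future_sapply() here ...

plan(sequential)  # reset the plan and shut down any background workers
```

Unlike forked workers, multisession workers are fresh R processes, so globals and packages have to be exported to them, which adds some overhead.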

waynelapierre commented 3 years ago

OK. If I use plan(sequential) with future_sapply, will that be faster than R's base sapply?

scottkosty commented 3 years ago

> then there's something else that runs in parallel, which is not using the future framework, e.g. multithreaded Rcpp code

Another possible source: multithreaded BLAS.
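To see which BLAS and LAPACK libraries an R session is linked against, base R can report their paths directly. This is the same information that sessionInfo() prints above. This is a sketch assuming R 3.4.0 or later. How to interpret the paths is a rule of thumb only: names containing "openblas" or "mkl" usually indicate a multithreaded implementation, while "libRblas" is typically R's single-threaded reference BLAS.

```r
# Paths of the BLAS and LAPACK libraries used by this R session.
blas_path   <- extSoftVersion()[["BLAS"]]
lapack_path <- La_library()

cat("BLAS:  ", blas_path, "\n")
cat("LAPACK:", lapack_path, "\n")

# To rule BLAS out as the source of the extra CPU usage, the thread count can be
# limited, for example with the RhpcBLASctl package (assumed to be installed):
# RhpcBLASctl::blas_set_num_threads(1)
```

If CPU usage drops to a single core once BLAS threads are limited, the parallelism seen in the system monitor came from BLAS rather than from futures.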