HenrikBengtsson / future

R package: future: Unified Parallel and Distributed Processing in R for Everyone
https://future.futureverse.org

Persistent "Failed to retrieve the result of MulticoreFuture" error #390

Closed · dwachsmuth closed this issue 4 years ago

dwachsmuth commented 4 years ago

I need to compute correlations between very large matrices, and I am trying to parallelize the task using {future}. I have discovered that there is a certain matrix size which reliably produces a "Failed to retrieve the result of MulticoreFuture" error when I use plan(multicore), i.e. a 100% failure rate. The exact same task succeeds 100% of the time with plan(sequential) or plan(multisession).

I am using {future.apply}, but I've verified that the same issue is present with {doFuture} and {foreach}, which suggested to me that the issue might be with {future} itself.

Because the object sizes have to be quite large, the reprex is a little annoying, but here it is.

library(future)
library(foreach)
plan(multicore)

# Create a list of big matrices
x_list <- list(
  matrix(data = rnorm(10 * 10000), nrow = 10),
  matrix(data = rnorm(10 * 10000), nrow = 10)
)

# Create another list of big matrices
y_list_small <- list(
  matrix(data = rnorm(10 * 26843), nrow = 10),
  matrix(data = rnorm(10 * 26843), nrow = 10)
)

# Works
result <- future.apply::future_mapply(cor, x_list, y_list_small, SIMPLIFY = FALSE)

# Create a list of slightly bigger matrices
y_list_big <- list(
  matrix(data = rnorm(10 * 26844), nrow = 10),
  matrix(data = rnorm(10 * 26844), nrow = 10)
)

# Does not work
result <- future.apply::future_mapply(cor, x_list, y_list_big, SIMPLIFY = FALSE)

# Increasing the vector size doesn't change the results
x_list_long <- list(
  matrix(data = rnorm(50 * 10000), nrow = 50),
  matrix(data = rnorm(50 * 10000), nrow = 50)
)

y_list_small_but_long <- list(
  matrix(data = rnorm(50 * 26843), nrow = 50),
  matrix(data = rnorm(50 * 26843), nrow = 50)
)

# Works
result <- future.apply::future_mapply(cor, x_list_long, y_list_small_but_long, SIMPLIFY = FALSE)

# Same problem with doFuture
# (foreach returns a list, so no pre-allocation is needed)
doFuture::registerDoFuture()
result <- foreach(i = 1:2) %dopar% cor(x_list[[i]], y_list_big[[i]])

HenrikBengtsson commented 4 years ago

Hi. What's your sessionInfo(), and are you running this in the terminal or in a GUI such as RStudio?

dwachsmuth commented 4 years ago

Session info is pasted below. I'm running RStudio 1.3.959.

> sessionInfo()
R version 4.0.0 (2020-04-24)
Platform: x86_64-apple-darwin17.0 (64-bit)
Running under: macOS Catalina 10.15.5

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib

locale:
[1] en_CA.UTF-8/en_CA.UTF-8/en_CA.UTF-8/C/en_CA.UTF-8/en_CA.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] foreach_1.5.0      future.apply_1.5.0 future_1.17.0      devtools_2.3.0     usethis_1.6.1     

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.4.6      compiler_4.0.0    iterators_1.0.12  prettyunits_1.1.1 remotes_2.1.1     RPostgres_1.2.0  
 [7] tools_4.0.0       testthat_2.3.2    digest_0.6.25     pkgbuild_1.0.8    pkgload_1.1.0     bit_1.1-15.2     
[13] memoise_1.1.0     pkgconfig_2.0.3   rlang_0.4.6       DBI_1.1.0         cli_2.0.2         rstudioapi_0.11  
[19] parallel_4.0.0    xfun_0.14         withr_2.2.0       desc_1.2.0        fs_1.4.1          vctrs_0.3.0      
[25] globals_0.12.5    hms_0.5.3         rprojroot_1.3-2   bit64_0.9-7       doFuture_0.9.0    glue_1.4.1       
[31] listenv_0.8.0     R6_2.4.1          processx_3.4.2    fansi_0.4.1       sessioninfo_1.1.1 callr_3.4.3      
[37] blob_1.2.1        magrittr_1.5      codetools_0.2-16  backports_1.1.7   ps_1.3.3          ellipsis_0.3.1   
[43] assertthat_0.2.1  tinytex_0.23      doParallel_1.0.15 crayon_1.3.4     
HenrikBengtsson commented 4 years ago

Since you're using multicore while running RStudio, you must have explicitly re-enabled multicore processing. See ?future::supportsMulticore for details on why it's disabled by default. There's also a link there to the RStudio folks saying that forked processing should be avoided when using RStudio.
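For reference, a minimal way to check and opt back in from code (the option name is as documented in ?future::supportsMulticore for this version of future; an assumption worth verifying against newer releases):

library(future)
supportsMulticore()                  # FALSE inside RStudio by default
options(future.fork.enable = TRUE)   # explicit opt-in to forked processing
supportsMulticore()                  # TRUE after the opt-in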

I recommend that you try your code in a plain R terminal session without RStudio and see if that makes a difference. If it still crashes, it could also be that you're running out of memory; cor(x, y) can be quite memory hungry. When you run out of memory and a forked child process dies, you can get the kind of error you're mentioning.

dwachsmuth commented 4 years ago

Hi Henrik,

The problem is exactly the same in the terminal. The "small" correlation completes every time, and the incrementally larger one fails every time. (I'm aware of the potential issues with multicore processing in RStudio, but these kinds of "dumb but large scale" operations have always been very stable for me, and the overhead of copying gigantic globals into PSOCK clusters almost completely cancels out the parallelization benefits.)

I'm also quite confident it's not a simple memory issue, since I'm doing this work on a computer with 384 GB of RAM, and I can run 32 threads of the small correlation from my reprex without any issues. profmem suggests that the correlation allocates about 2 GB of memory for a single thread (output pasted below). Monitoring RAM at the system level shows R using 100 GB at peak with 32 threads going simultaneously, so ~3 GB per thread.

profmem::profmem({output <- cor(x_list[[1]], y_list_big[[1]])})

Rprofmem memory profiling of:
{
    output <- cor(x_list[[1]], y_list_big[[1]])
}

Memory allocations:
Number of 'new page' entries not displayed: 3
       what      bytes calls
4     alloc 2147520048 cor()
5     alloc      80048 cor()
6     alloc     214800 cor()
7     alloc      40048 cor()
8     alloc     107424 cor()
total       2147962368      
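Incidentally, that 2,147,520,048-byte allocation is essentially the result matrix itself (10000 × 26844 doubles plus a few bytes of object overhead), and the working and failing sizes straddle the 2^31-byte mark, which hints at a 2 GiB limit on the result being serialized back from the forked child rather than at exhausted RAM:

# cor(x, y) returns an ncol(x) x ncol(y) double matrix, 8 bytes per cell:
10000 * 26843 * 8   # 2147440000 bytes -- just under 2^31 (2147483648); works
10000 * 26844 * 8   # 2147520000 bytes -- just over 2^31; fails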
dwachsmuth commented 4 years ago

I should add that I ran into this problem in the course of package development, where the use case will frequently involve these very large matrix correlations, and I want users to be able to use {future} to speed things up.

So just knowing that the code works with plan(multisession) isn't very helpful, since I guess I would have to detect a multicore future within the package and then conditionally disable parallel processing in that case, which would be a very poor user experience, given that the rest of the package works fine (and is indeed far faster) with multicore futures.
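For concreteness, the detection I have in mind would be something like this sketch (the warning text is illustrative):

# Inspect the active plan from package code:
if (inherits(future::plan(), "multicore")) {
  warning("Very large inputs are known to fail with plan(multicore); ",
          "consider plan(multisession).")
}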

HenrikBengtsson commented 4 years ago

Another step towards narrowing down the source of the problem: does it also crash if you call parallel::mcmapply()? That should be the closest equivalent to what your code runs.
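For example, something like this, reusing the objects from your reprex (the mc.cores value is just an assumption):

library(parallel)
# Closest base-R equivalent of the future_mapply() call above:
result <- mcmapply(cor, x_list, y_list_big, SIMPLIFY = FALSE, mc.cores = 2L)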

Also, try setting options(future.fork.multithreading.enable = FALSE). This should disable multi-threaded processing in your forked processes. Combining multi-threading with forked processing is known to cause issues in R.

FYI, I don't think this problem is related to the future framework per se.

EDIT 2021-07-07: Fix typo; options() and not option()

dwachsmuth commented 4 years ago

Ok, parallel::mcmapply() produced the same error, which means you're right and the problem isn't related to {future}. It's still problematic for my package, though, because I can't assume what kind of plan users will be setting before running the function.

But probably the solution will just be to test for matrices above a certain size and split them preemptively.
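Something along these lines, perhaps (the helper name and the max_cols threshold are made up):

# Split y into column blocks so each child's result stays well under
# 2 GiB, compute the correlations per block, and cbind the pieces:
chunked_cor <- function(x, y, max_cols = 10000L) {
  starts <- seq(1L, ncol(y), by = max_cols)
  blocks <- future.apply::future_lapply(starts, function(s) {
    cols <- s:min(s + max_cols - 1L, ncol(y))
    cor(x, y[, cols, drop = FALSE])
  })
  do.call(cbind, blocks)
}

result <- chunked_cor(x_list[[1]], y_list_big[[1]])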

Thanks for looking into this!

(Incidentally, I wasn't able to get the future.fork.multithreading.enable option to work. I received an error saying disabling multithreading wasn't possible on my system.)

HenrikBengtsson commented 4 years ago

The future.fork.multithreading.enable option is a beta feature that I've been trying to introduce in as robust a way as possible. All it ends up doing internally is trying to force single-threaded processing by calling:

RhpcBLASctl::omp_set_num_threads(1L)

You could try calling that inside the function you run via mcmapply() too.
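For example (a sketch; also capping BLAS threads via blas_set_num_threads() is my belt-and-braces assumption, and it requires the RhpcBLASctl package):

single_threaded_cor <- function(x, y) {
  RhpcBLASctl::omp_set_num_threads(1L)   # cap OpenMP threads in this child
  RhpcBLASctl::blas_set_num_threads(1L)  # cap BLAS threads too
  cor(x, y)
}
result <- parallel::mcmapply(single_threaded_cor, x_list, y_list_big,
                             SIMPLIFY = FALSE, mc.cores = 2L)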

However, there are probably better and more robust ways to disable multi-threaded processing in R, e.g. setting

export OMP_NUM_THREADS=1

before launching R. See https://github.com/HenrikBengtsson/future/issues/255 for other env vars that you could also set to 1. After doing this, see if mcmapply() still fails.
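If relaunching R is inconvenient, the same variables can in principle be set from R itself before any parallel code runs, though some BLAS libraries only read them at load time, so the shell export before starting R remains the safer route:

# A sketch; issue #255 lists further candidate variables:
Sys.setenv(
  OMP_NUM_THREADS        = "1",
  OPENBLAS_NUM_THREADS   = "1",
  VECLIB_MAXIMUM_THREADS = "1"   # Apple Accelerate/vecLib BLAS
)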

The real problem here is that you have something that is unstable, and my best guess is that it's due to using forked processing and multi-threading at the same time. This is never a good situation. Even if you find a workaround on your local system that seems to avoid triggering the problem, you will never know whether it holds for others. There are so many things that can go on here, and without understanding the real cause, I would refrain from ad-hoc workarounds. They will come back and bite you or an end-user!

I'm leaning more and more towards telling all developers and users not to use forked processing in R. Here is what the author of mclapply() wrote in the R-devel thread 'mclapply returns NULLs on MacOS when running GAM' (https://stat.ethz.ch/pipermail/r-devel/2020-April/079384.html) on 2020-04-28:

Do NOT use mcparallel() in packages except as a non-default option that user can set for the reasons Henrik explained. Multicore is intended for HPC applications that need to use many cores for computing-heavy jobs, but it does not play well with RStudio and more importantly you don't know the resource available so only the user can tell you when it's safe to use. Multi-core machines are often shared so using all detected cores is a very bad idea. The user should be able to explicitly enable it, but it should not be enabled by default.

I'm closing but feel free to comment further.

dwachsmuth commented 4 years ago

Many thanks for the additional information/feedback!

I want to be clear, though, that the code in my package simply calls future.apply::future_mapply(), and it is through the process of using the in-development package for my lab's research that I discovered the issue with large matrices and multicore futures. And in fact the intention for our internal use is specifically to run our code on HPCs where multicore processing is fairly common.

In other words, I think I am following the recommended design pattern for using {future}: write the package with no assumptions about the type of plan a user will use. But it looks like the code will break if a user sets plan(multicore) (which I have no control over) and happens to supply matrices over a certain size.

So I could leave the code as is, and maybe include a warning in the package docs that there's a known problem with multicore parallelism, or I could try to detect the (potentially rare?) combination of very large inputs and plan(multicore) and either fail more informatively or chunk the job into a larger number of smaller matrices.

dwachsmuth commented 4 years ago

Also, a quick update: setting the OMP threads to 1 didn't change the problem. Both future.apply::future_mapply() and parallel::mcmapply() return an error that one or more cores did not deliver results.

HenrikBengtsson commented 4 years ago

I only now see that your session info mentions BLAS, so also retry with

export OPENBLAS_NUM_THREADS=1

HenrikBengtsson commented 4 years ago

And obviously, make sure to troubleshoot in a fresh R session in the terminal; R --vanilla

agilebean commented 3 years ago

Also, try setting option(future.fork.multithreading.enable = FALSE). This will should disable multi-thread processing in your forked processes. Multi-threading and forked processing is also known to causes issues in R.

This solves quite a few cases and is very useful, but a tiny remark: it is options(), not option().