HenrikBengtsson / future

R package: future: Unified Parallel and Distributed Processing in R for Everyone
https://future.futureverse.org

Failure to fork #226

Closed cfhammill closed 6 years ago

cfhammill commented 6 years ago

I ran into some trouble using future in a Shiny app. The futures and the error reporting were failing mysteriously. Unfortunately, I haven't been able to reduce the error-reporting problem to a reprex, but the root cause was a failure to fork.

library("future")
options(mc.cores = 2)
plan(multicore)
fs <- lapply(1:4, function(i) future({ Sys.sleep(50) }) )

yields:

Error: Detected an error (‘fatal error in wrapper code’) by the 'parallel' package while trying to retrieve the value of a MulticoreFuture (‘<none>’). This could be because the forked R process that evaluates the future was terminated before it was completed: ‘{; Sys.sleep(50); }’

Is failing to fork desirable behaviour? Or should future block until it can fork successfully?

HenrikBengtsson commented 6 years ago

No, this should work. It could possibly be related to updates in R (>= 3.5.0), but I need to know more. What's your sessionInfo()?

cfhammill commented 6 years ago

I haven't upgraded yet; I'm running a slightly older R version. Pleased to hear it should work.

sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /axiom2/projects/software/arch/linux-xenial-xerus/R/3.4.1/lib/R/lib/libRblas.so
LAPACK: /axiom2/projects/software/arch/linux-xenial-xerus/R/3.4.1/lib/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_CA.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_CA.UTF-8        LC_COLLATE=en_CA.UTF-8    
 [5] LC_MONETARY=en_CA.UTF-8    LC_MESSAGES=en_CA.UTF-8   
 [7] LC_PAPER=en_CA.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_CA.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] shinyjs_1.0   purrr_0.2.4   future_1.8.1  RMINC_1.5.1.0 dplyr_0.7.4  
[6] glue_1.2.0    shiny_1.0.5  

loaded via a namespace (and not attached):
 [1] viridis_0.5.0         tidyr_0.8.0           bit64_0.9-7          
 [4] jsonlite_1.5          viridisLite_0.3.0     splines_3.4.1        
 [7] assertthat_0.2.0      base64url_1.2         blob_1.1.1           
[10] yaml_2.1.19           progress_1.1.2        globals_0.11.0       
[13] pillar_1.1.0          RSQLite_2.1.0         backports_1.1.2      
[16] lattice_0.20-35       downloader_0.4        digest_0.6.15        
[19] RColorBrewer_1.1-2    checkmate_1.8.5       minqa_1.2.4          
[22] colorspace_1.3-2      htmltools_0.3.6       httpuv_1.3.5         
[25] Matrix_1.2-14         plyr_1.8.4            XML_3.98-1.9         
[28] pkgconfig_2.0.1       listenv_0.7.0         DiagrammeR_1.0.0     
[31] xtable_1.8-2          scales_0.5.0          brew_1.0-6           
[34] lme4_1.1-17           tibble_1.4.2          ggplot2_2.2.1        
[37] influenceR_0.1.0      withr_2.1.2           lazyeval_0.2.0       
[40] rgexf_0.15.3          magrittr_1.5          crayon_1.3.4         
[43] mime_0.5              memoise_1.1.0         data.tree_0.7.5      
[46] nlme_3.1-131          MASS_7.3-47           Rook_1.1-1           
[49] tools_3.4.1           data.table_1.10.4-3   prettyunits_1.0.2    
[52] hms_0.4.1             BBmisc_1.11           gridBase_0.4-7       
[55] stringr_1.3.0         sendmailR_1.2-1       munsell_0.4.3        
[58] bindrcpp_0.2          compiler_3.4.1        rlang_0.2.0          
[61] debugme_1.1.0         grid_3.4.1            nloptr_1.0.4         
[64] rstudioapi_0.7        rjson_0.2.18          rappdirs_0.3.1       
[67] htmlwidgets_1.0       visNetwork_2.0.3      igraph_1.2.1         
[70] base64enc_0.1-3       codetools_0.2-15      gtable_0.2.0         
[73] DBI_0.8               R6_2.2.2              gridExtra_2.3        
[76] bit_1.1-12            bindr_0.1             readr_1.1.1          
[79] stringi_1.1.7         parallel_3.4.1        BatchJobs_1.7        
[82] Rcpp_0.12.16          batchtools_0.9.7-9002

HenrikBengtsson commented 6 years ago

Thanks. I've tried to reproduce, but failed. Can you reproduce it all the time, or is it only sporadic? Can you reproduce it in a fresh R session, i.e. without all those other packages loaded?

I've tried with R 3.5.0 on Ubuntu 16.04 and with R 3.4.1 on RHEL 6.6, and it worked on both. I ran the following in a fresh R session:

> library("future")
> options(mc.cores = 2L)
> plan(multicore)
> fs <- lapply(1:4, function(i) future({ Sys.sleep(50) }) )

Details

> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.4 LTS

Matrix products: default
BLAS: /usr/lib/atlas-base/atlas/libblas.so.3.0
LAPACK: /usr/lib/atlas-base/atlas/liblapack.so.3.0

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] future_1.8.1

loaded via a namespace (and not attached):
[1] compiler_3.5.0   parallel_3.5.0   listenv_0.7.0    codetools_0.2-15
[5] digest_0.6.15    globals_0.11.0

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-pc-linux-gnu (64-bit)

Matrix products: default
BLAS: /home/shared/cbc/software_cbc/R/R-3.4.1-20170630/lib64/R/lib/libRblas.so
LAPACK: /home/shared/cbc/software_cbc/R/R-3.4.1-20170630/lib64/R/lib/libRlapack.so

locale:
[1] C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] future_1.8.1

loaded via a namespace (and not attached):
[1] compiler_3.4.1   parallel_3.4.1   tools_3.4.1      listenv_0.7.0   
[5] codetools_0.2-15 digest_0.6.15    globals_0.11.0  

cfhammill commented 6 years ago

It's consistent. I'm using futures as part of a Shiny app and it fails on every run.

In a new session it appears to work once, but if you press Ctrl-C to interrupt the assignment and then run it again, you get this error.

HenrikBengtsson commented 6 years ago

So, only when using Shiny and not with just that piece of code by itself?

FYI, Shiny 1.1 will soon be released to CRAN (https://twitter.com/jcheng/status/994980498605621248). It has built-in support for futures via the promises package. I'm not sure if it is relevant to your design or not. But if you're using Shiny + future in a way that Shiny 1.1 is intended for, have a look at that upcoming version.

cfhammill commented 6 years ago

Unfortunately no, that piece of code fails in a new session too; you just have to cancel it once to get it to fail. After the initial cancel, all subsequent runs fail:

> library("future")
> options(mc.cores = 2L)
> plan(multicore)
> fs <- lapply(1:4, function(i) future({ Sys.sleep(50) }) )
^C
> fs <- lapply(1:4, function(i) future({ Sys.sleep(50) }) )
Error: Detected an error (‘fatal error in wrapper code’) by the 'parallel' package while trying to retrieve the value of a MulticoreFuture (‘<none>’). This could be because the forked R process that evaluates the future was terminated before it was completed: ‘{; Sys.sleep(50); }’

Something about cancelling the first run puts it into a state such that all subsequent runs fail.

As for promises/async, I'm not sure. It may fit my use case, but I haven't looked into it yet.

HenrikBengtsson commented 6 years ago

"Unfortunately no, that piece of code fails in a new session too [...]"

That's "fortunately" to my ears, because it narrows down where the problem may occur. Even better, with your clarification I can now reproduce this (also in R 3.5.0). Now I have something to work with. I'll treat it as a bug for now.

FYI, unless you really want forked processes, a workaround could be to use plan(multisession) or plan(future.callr::callr).
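A minimal sketch of that workaround, assuming the future package is installed. The worker count and the short sleep are illustrative choices, not values taken from this thread:

```r
library("future")

# Use background R sessions instead of forked processes
plan(multisession, workers = 2L)

# Create more futures than workers; future() blocks until a worker is free
fs <- lapply(1:4, function(i) future({ Sys.sleep(0.1); i }))

# Collect the value of each future
vs <- vapply(fs, value, integer(1))

# Switch back to sequential processing, which shuts down the workers
plan(sequential)
```

Because multisession workers are separate R sessions rather than forks, they generally take longer to start and globals have to be exported to them.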

HenrikBengtsson commented 6 years ago

Some troubleshooting on this. Sending a SIGINT (Ctrl-C), as done here, can terminate one of the forked R processes, which results in the signaled error (of class FutureError). This is expected. However, when this happens, the corresponding future is not removed from the internal FutureRegistry. More importantly, that future keeps signaling the error whenever the future framework tries to clean it out.

> fs <- lapply(1:4, function(i) future({ Sys.sleep(50) }) )
Error: Detected an error ('fatal error in wrapper code') by the 'parallel' package while trying to retrieve the value of a MulticoreFuture ('<none>'). This could be because the forked R process that evaluates the future was terminated before it was completed: '{; Sys.sleep(50); }'

Enter a frame number, or 0 to exit   

 1: lapply(1:4, function(i) future({
    Sys.sleep(50)
}))
 2: FUN(X[[i]], ...)
 3: #1: future({
    Sys.sleep(50)
})
 4: evaluator(expr, envir = envir, substitute = FALSE, lazy = lazy, seed = seed, globals = globals, packages = packages, ...)
 5: run(future)
 6: run.MulticoreFuture(future)
 7: requestCore(await = function() FutureRegistry("multicore", action = "collect-first"), workers = future$workers)
 8: await()
 9: FutureRegistry("multicore", action = "collect-first")
10: collectValues(where, futures = futures, firstOnly = TRUE)
11: value(future, signal = FALSE)
12: value.Future(future, signal = FALSE)
13: result(future)
14: result.MulticoreFuture(future)
15: stop(result)

A solution should be to make the FutureRegistry "collect-first" action robust against FutureErrors.
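The idea can be sketched in base R with a toy registry. This is only an illustration under stated assumptions, not the package's internal code: futureError(), toyFuture(), toyValue(), and newRegistry() are hypothetical stand-ins, and only the FutureError class name is taken from the discussion above.

```r
# Hypothetical stand-ins for illustration; none of these functions come from
# the future package.

# Build a condition object of class FutureError
futureError <- function(msg) {
  structure(
    list(message = msg, call = NULL),
    class = c("FutureError", "error", "condition")
  )
}

# A toy future: 'crashed = TRUE' mimics a forked worker that was interrupted
toyFuture <- function(id, crashed = FALSE) list(id = id, crashed = crashed)

# Retrieving the value of a crashed toy future signals a FutureError
toyValue <- function(f) {
  if (f$crashed) stop(futureError("worker was terminated"))
  f$id * 10
}

# A toy registry that keeps its futures in a closure
newRegistry <- function() {
  futures <- list()
  list(
    add = function(f) {
      futures[[length(futures) + 1L]] <<- f
      invisible(f)
    },
    size = function() length(futures),
    collectFirst = function(robust = TRUE) {
      if (length(futures) == 0L) return(NULL)
      f <- futures[[1L]]
      if (!robust) {
        v <- toyValue(f)        # on error, the future stays registered
        futures[[1L]] <<- NULL
        return(v)
      }
      futures[[1L]] <<- NULL    # free the slot before collecting
      # Return the FutureError as a value instead of resignaling it
      tryCatch(toyValue(f), FutureError = function(e) e)
    }
  )
}

reg <- newRegistry()
reg$add(toyFuture(1L, crashed = TRUE))

# Fragile collection: the same error comes back on every attempt
msg1 <- tryCatch(reg$collectFirst(robust = FALSE), error = conditionMessage)
msg2 <- tryCatch(reg$collectFirst(robust = FALSE), error = conditionMessage)
stuck <- reg$size()

# Robust collection: the crashed future is dropped and the slot is freed
res <- reg$collectFirst(robust = TRUE)
left <- reg$size()
```

In the fragile variant the crashed future never leaves the registry, so each later attempt to free a slot hits the same error. The robust variant removes it first and hands the error back as a value.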

wlandau commented 6 years ago

I am also having trouble with forks on R 3.5.0, and I am glad to hear I am not alone! I will be following this thread closely. Possibly related: https://github.com/r-lib/testthat/issues/757.

In general, what do you think might have changed in R 3.5.0? I found some references to mclapply() in the changelog, but I am having trouble gleaning insights from it.

HenrikBengtsson commented 6 years ago

This has been fixed in the develop branch. The problem was that future orchestration errors (FutureError) were correctly detected but then incorrectly resignaled every time a new multicore future was created. It was not caused by any upstream changes in R 3.5.0 or the parallel package.