HenrikBengtsson / future

:rocket: R package: future: Unified Parallel and Distributed Processing in R for Everyone
https://future.futureverse.org

R session crash if plan is not changed back to sequential after using multisession (in linux) #689

Closed MalditoBarbudo closed 1 year ago

MalditoBarbudo commented 1 year ago

(Please use https://github.com/HenrikBengtsson/future/discussions for Q&A)

Describe the bug

When using future::plan(future::multisession) on Linux, as many R workers are created as specified in the workers argument, and they are not closed until the plan is changed back to sequential or the session is closed or restarted. In RStudio, the latter almost always results in a crash of the R session.

Reproduce example

Simply changing the plan to multisession creates the issue. R session processes can be monitored with htop, btm or the console system monitor of your choice:

  1. Initial btm screenshot in a fresh session:

image

  2. Changing the plan:
# changing the plan
future::plan(future::multisession)

btm screenshot after changing plan:

image

As can be seen, R processes appear.
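The same observation can be made from inside R instead of a system monitor. This is a minimal sketch, assuming the future package is installed; `workers = 2` is an arbitrary illustrative value (the default is `availableCores()`), and the variable names are only for illustration:

```r
# Start from a known state: the sequential plan uses only the main R session.
future::plan(future::sequential)
n_before <- future::nbrOfWorkers()

# Switch to multisession; background R sessions are set up for this plan.
# workers = 2 is an assumption made to keep the example light.
future::plan(future::multisession, workers = 2)
n_after <- future::nbrOfWorkers()

# Ask the workers for their process IDs so they can be matched against
# the entries shown in htop or btm. Two futures are usually, but not
# necessarily, resolved by two different workers.
fs <- lapply(seq_len(n_after), function(i) future::future(Sys.getpid()))
worker_pids <- unique(vapply(fs, future::value, integer(1)))

# Shut the workers down again.
future::plan(future::sequential)
```

The PIDs in `worker_pids` differ from `Sys.getpid()` of the main session, which is what the extra R entries in the process monitor correspond to.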

  3. Executing code:
furrr::future_map(
  c(1:10), \(x) {1e10}
)

btm screenshot showing that memory is still used by each worker after the computation ends:

image
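One possible explanation for the memory staying allocated is that multisession workers are persistent background R sessions that are reused across futures rather than restarted. The following sketch illustrates the reuse; it assumes the future package is installed, and `workers = 2` is again an arbitrary illustrative value:

```r
future::plan(future::multisession, workers = 2)

# Resolve six small futures one after another and record which
# process evaluated each of them.
pids <- integer(0)
for (i in 1:6) {
  f <- future::future(Sys.getpid())
  pids <- c(pids, future::value(f))
}

# At most two distinct PIDs show up: the same background sessions are
# reused, so memory that those R processes have not handed back to the
# operating system stays visible in the process monitor.
reused_pids <- unique(pids)

# Shut the workers down again.
future::plan(future::sequential)
```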

  4. Restarting the session (building and restarting for package development, changing project...) crashes the R session, causing it to restart, which can lead to data loss.

  5. Changing the plan to sequential before exiting the session successfully removes the R workers:

future::plan(future::sequential)

btm screenshot after changing to sequential:

image
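When the plan is set inside a function, the switch back can be made automatic by restoring the previous plan on exit. This is a minimal sketch, assuming the future package is installed; `with_local_workers()` is a hypothetical helper name and not part of future:

```r
# Hypothetical helper: run `fun()` under a multisession plan and always
# restore the plan that was active before, even if `fun()` fails.
with_local_workers <- function(fun, workers = 2) {
  # plan() returns the previously active plan, which is kept for restoring.
  old_plan <- future::plan(future::multisession, workers = workers)
  on.exit(future::plan(old_plan), add = TRUE)
  fun()
}

future::plan(future::sequential)

# The future is evaluated in a background worker, so a different PID
# than the one of the main session comes back.
res <- with_local_workers(function() {
  f <- future::future(Sys.getpid())
  future::value(f)
})

# The sequential plan is active again here, so the only "worker" left
# is the main R session itself.
n_workers_after <- future::nbrOfWorkers()
```

This only covers code paths that go through the helper; a plan set interactively at the top level still has to be reset by hand as shown above.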

Expected behavior

When using plan(multisession), I expect those processes to be terminated after the computation finishes and the results are gathered, or at least that RStudio does not crash if the plan is not changed back to sequential before restarting or closing the session.

Session information

sessionInfo()
#> R version 4.3.1 (2023-06-16)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Arch Linux
#> 
#> Matrix products: default
#> BLAS/LAPACK: /usr/lib/libopenblas.so.0.3;  LAPACK version 3.11.0
#> 
#> locale:
#>  [1] LC_CTYPE=es_ES.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=es_ES.UTF-8        LC_COLLATE=es_ES.UTF-8    
#>  [5] LC_MONETARY=es_ES.UTF-8    LC_MESSAGES=es_ES.UTF-8   
#>  [7] LC_PAPER=es_ES.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=es_ES.UTF-8 LC_IDENTIFICATION=C       
#> 
#> time zone: Europe/Madrid
#> tzcode source: system (glibc)
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> loaded via a namespace (and not attached):
#>  [1] digest_0.6.31   fastmap_1.1.1   xfun_0.39       glue_1.6.2     
#>  [5] knitr_1.42      htmltools_0.5.5 rmarkdown_2.21  lifecycle_1.0.3
#>  [9] cli_3.6.1       reprex_2.0.2    withr_2.5.0     compiler_4.3.1 
#> [13] rstudioapi_0.14 tools_4.3.1     evaluate_0.21   yaml_2.3.7     
#> [17] rlang_1.1.1     fs_1.6.2

Created on 2023-06-23 with reprex v2.0.2

Future session info when in sequential

future::futureSessionInfo()
#> *** Package versions
#> future 1.32.0, parallelly 1.35.0, parallel 4.3.1, globals 0.16.2, listenv 0.9.0
#> 
#> *** Allocations
#> availableCores():
#> system  nproc 
#>     16     16
#> availableWorkers():
#> $nproc
#>  [1] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"
#>  [7] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"
#> [13] "localhost" "localhost" "localhost" "localhost"
#> 
#> $system
#>  [1] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"
#>  [7] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"
#> [13] "localhost" "localhost" "localhost" "localhost"
#> 
#> *** Settings
#> - future.plan=<not set>
#> - future.fork.multithreading.enable=<not set>
#> - future.globals.maxSize=<not set>
#> - future.globals.onReference=<not set>
#> - future.resolve.recursive=<not set>
#> - future.rng.onMisuse=<not set>
#> - future.wait.timeout=<not set>
#> - future.wait.interval=<not set>
#> - future.wait.alpha=<not set>
#> - future.startup.script=<not set>
#> 
#> *** Backends
#> Number of workers: 1
#> List of future strategies:
#> 1. sequential:
#>    - args: function (..., envir = parent.frame())
#>    - tweaked: FALSE
#>    - call: NULL
#> 
#> *** Basic tests
#> Main R session details:
#>     pid     r sysname       release
#> 1 28914 4.3.1   Linux 6.3.8-arch1-1
#>                                                  version nodename machine
#> 1 #1 SMP PREEMPT_DYNAMIC Wed, 14 Jun 2023 20:10:31 +0000  host001  x86_64
#>     login    user effective_user
#> 1 user001 user001        user001
#> Worker R session details:
#>   worker   pid     r sysname       release
#> 1      1 28914 4.3.1   Linux 6.3.8-arch1-1
#>                                                  version nodename machine
#> 1 #1 SMP PREEMPT_DYNAMIC Wed, 14 Jun 2023 20:10:31 +0000  host001  x86_64
#>     login    user effective_user
#> 1 user001 user001        user001
#> Number of unique worker PIDs: 1 (as expected)

Created on 2023-06-23 with reprex v2.0.2

Future session info when in multisession

future::plan(future::multisession)
future::futureSessionInfo()
#> *** Package versions
#> future 1.32.0, parallelly 1.35.0, parallel 4.3.1, globals 0.16.2, listenv 0.9.0
#> 
#> *** Allocations
#> availableCores():
#> system  nproc 
#>     16     16
#> availableWorkers():
#> $nproc
#>  [1] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"
#>  [7] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"
#> [13] "localhost" "localhost" "localhost" "localhost"
#> 
#> $system
#>  [1] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"
#>  [7] "localhost" "localhost" "localhost" "localhost" "localhost" "localhost"
#> [13] "localhost" "localhost" "localhost" "localhost"
#> 
#> *** Settings
#> - future.plan=<not set>
#> - future.fork.multithreading.enable=<not set>
#> - future.globals.maxSize=<not set>
#> - future.globals.onReference=<not set>
#> - future.resolve.recursive=<not set>
#> - future.rng.onMisuse=<not set>
#> - future.wait.timeout=<not set>
#> - future.wait.interval=<not set>
#> - future.wait.alpha=<not set>
#> - future.startup.script=<not set>
#> 
#> *** Backends
#> Number of workers: 16
#> List of future strategies:
#> 1. multisession:
#>    - args: function (..., workers = availableCores(), lazy = FALSE, rscript_libs = .libPaths(), envir = parent.frame())
#>    - tweaked: FALSE
#>    - call: future::plan(future::multisession)
#> 
#> *** Basic tests
#> Main R session details:
#>     pid     r sysname       release
#> 1 29680 4.3.1   Linux 6.3.8-arch1-1
#>                                                  version nodename machine
#> 1 #1 SMP PREEMPT_DYNAMIC Wed, 14 Jun 2023 20:10:31 +0000  host001  x86_64
#>     login    user effective_user
#> 1 user001 user001        user001
#> Worker R session details:
#>    worker   pid     r sysname       release
#> 1       1 29737 4.3.1   Linux 6.3.8-arch1-1
#> 2       2 29731 4.3.1   Linux 6.3.8-arch1-1
#> 3       3 29730 4.3.1   Linux 6.3.8-arch1-1
#> 4       4 29742 4.3.1   Linux 6.3.8-arch1-1
#> 5       5 29741 4.3.1   Linux 6.3.8-arch1-1
#> 6       6 29740 4.3.1   Linux 6.3.8-arch1-1
#> 7       7 29729 4.3.1   Linux 6.3.8-arch1-1
#> 8       8 29739 4.3.1   Linux 6.3.8-arch1-1
#> 9       9 29738 4.3.1   Linux 6.3.8-arch1-1
#> 10     10 29735 4.3.1   Linux 6.3.8-arch1-1
#> 11     11 29744 4.3.1   Linux 6.3.8-arch1-1
#> 12     12 29736 4.3.1   Linux 6.3.8-arch1-1
#> 13     13 29734 4.3.1   Linux 6.3.8-arch1-1
#> 14     14 29743 4.3.1   Linux 6.3.8-arch1-1
#> 15     15 29732 4.3.1   Linux 6.3.8-arch1-1
#> 16     16 29733 4.3.1   Linux 6.3.8-arch1-1
#>                                                   version nodename machine
#> 1  #1 SMP PREEMPT_DYNAMIC Wed, 14 Jun 2023 20:10:31 +0000  host001  x86_64
#> 2  #1 SMP PREEMPT_DYNAMIC Wed, 14 Jun 2023 20:10:31 +0000  host001  x86_64
#> 3  #1 SMP PREEMPT_DYNAMIC Wed, 14 Jun 2023 20:10:31 +0000  host001  x86_64
#> 4  #1 SMP PREEMPT_DYNAMIC Wed, 14 Jun 2023 20:10:31 +0000  host001  x86_64
#> 5  #1 SMP PREEMPT_DYNAMIC Wed, 14 Jun 2023 20:10:31 +0000  host001  x86_64
#> 6  #1 SMP PREEMPT_DYNAMIC Wed, 14 Jun 2023 20:10:31 +0000  host001  x86_64
#> 7  #1 SMP PREEMPT_DYNAMIC Wed, 14 Jun 2023 20:10:31 +0000  host001  x86_64
#> 8  #1 SMP PREEMPT_DYNAMIC Wed, 14 Jun 2023 20:10:31 +0000  host001  x86_64
#> 9  #1 SMP PREEMPT_DYNAMIC Wed, 14 Jun 2023 20:10:31 +0000  host001  x86_64
#> 10 #1 SMP PREEMPT_DYNAMIC Wed, 14 Jun 2023 20:10:31 +0000  host001  x86_64
#> 11 #1 SMP PREEMPT_DYNAMIC Wed, 14 Jun 2023 20:10:31 +0000  host001  x86_64
#> 12 #1 SMP PREEMPT_DYNAMIC Wed, 14 Jun 2023 20:10:31 +0000  host001  x86_64
#> 13 #1 SMP PREEMPT_DYNAMIC Wed, 14 Jun 2023 20:10:31 +0000  host001  x86_64
#> 14 #1 SMP PREEMPT_DYNAMIC Wed, 14 Jun 2023 20:10:31 +0000  host001  x86_64
#> 15 #1 SMP PREEMPT_DYNAMIC Wed, 14 Jun 2023 20:10:31 +0000  host001  x86_64
#> 16 #1 SMP PREEMPT_DYNAMIC Wed, 14 Jun 2023 20:10:31 +0000  host001  x86_64
#>      login    user effective_user
#> 1  user001 user001        user001
#> 2  user001 user001        user001
#> 3  user001 user001        user001
#> 4  user001 user001        user001
#> 5  user001 user001        user001
#> 6  user001 user001        user001
#> 7  user001 user001        user001
#> 8  user001 user001        user001
#> 9  user001 user001        user001
#> 10 user001 user001        user001
#> 11 user001 user001        user001
#> 12 user001 user001        user001
#> 13 user001 user001        user001
#> 14 user001 user001        user001
#> 15 user001 user001        user001
#> 16 user001 user001        user001
#> Number of unique worker PIDs: 16 (as expected)

Created on 2023-06-23 with reprex v2.0.2

scottkosty commented 1 year ago

I don't have much knowledge in this area, but one thing you could try, both to narrow in on the root cause of the crash and as an alternative to "multisession", is to install the package "future.callr" and then use the following plan (instead of the "multisession" plan):

future::plan(future.callr::callr)

If you don't switch back to sequential at the end, do things still work well with the above plan?
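A quick way to try the suggested plan is sketched below. It assumes both future and future.callr are installed (the guard skips the example otherwise); `workers = 2` and the variable name `callr_pid` are illustrative choices:

```r
# Only run the example if future.callr is available.
if (requireNamespace("future.callr", quietly = TRUE)) {
  future::plan(future.callr::callr, workers = 2)

  # With this backend, each future is evaluated in its own background
  # R process rather than in a long-lived worker.
  f <- future::future(Sys.getpid())
  callr_pid <- future::value(f)

  # Switching back is still good practice, even if it turns out not to
  # be required for this backend.
  future::plan(future::sequential)
}
```

Comparing `callr_pid` with `Sys.getpid()` confirms that the future ran outside the main session; whether any R processes linger afterwards can then be checked in htop or btm as before.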

MalditoBarbudo commented 1 year ago

@scottkosty Yes, using future.callr::callr as the plan works as intended. R processes are spawned when computation begins and closed afterwards, freeing the memory. Also, RStudio doesn't crash when restarting or closing the session without changing the plan to sequential.

I can use this workaround for the moment for my own development, but I would like to know why this is happening with the plans from future, since they are the default recommendation in the docs of furrr and other packages and probably what users of the package I'm developing will use. Is there any more information I can provide to help narrow down the issue?