MultisessionFuture () failed to call gassign() on cluster ...... #597

elgabbas commented 2 years ago


I am trying to use the furrr package, but I think I am having a problem related to the use of future.

First, when I set up the parallel plan with the following command, I receive these warnings. I am unsure whether I can ignore them or whether there is an issue I need to solve.

plan(multisession, workers = 20)

During startup - Warning messages: 1: package ‘utils’ in options("defaultPackages") was not found 2: package ‘stats’ in options("defaultPackages") was not found
During startup - Warning messages: 1: package ‘utils’ in options("defaultPackages") was not found 2: package ‘stats’ in options("defaultPackages") was not found
During startup - Warning messages: 1: package ‘utils’ in options("defaultPackages") was not found During startup - 2: package ‘stats’ in options("defaultPackages") was not found Warning messages: 1: package ‘utils’ in options("defaultPackages") was not found 2: package ‘stats’ in options("defaultPackages") was not found
During startup - Warning messages: 1: package ‘utils’ in options("defaultPackages") was not found 2: package ‘stats’ in options("defaultPackages") was not found
During startup - Warning messages: 1: package ‘utils’ in options("defaultPackages") was not found 2: package ‘stats’ in options("defaultPackages") was not found
During startup - Warning message: package ‘utils’ in options("defaultPackages") was not found
During startup - Warning messages: 1: package ‘utils’ in options("defaultPackages") was not found 2: package ‘stats’ in options("defaultPackages") was not found
During startup - Warning message: package ‘utils’ in options("defaultPackages") was not found
During startup - Warning message: package ‘utils’ in options("defaultPackages") was not found
During startup - Warning message: package ‘utils’ in options("defaultPackages") was not found
During startup - Warning message: package ‘utils’ in options("defaultPackages") was not found
Error: Initialization of plan() failed, because the test future used for validation failed. The reason was: Package 'future' is not installed on worker (r_version: 4.1.2 (2021-11-01); platform: x86_64-conda-linux-gnu (64-bit); os: Linux 3.10.0-862.14.4.el7.x86_64 #1 SMP Wed Sep 26 15:12:11 UTC 2018; hostname: prod-0316)

Then, when I use the future_map() function from the furrr package, I receive the following error:

Error: Problem with mutate() column Preds.
Preds = future_map(...).
MultisessionFuture () failed to call gassign() on cluster RichSOCKnode #2 (PID 114424 on 'localhost'). The reason reported was 'error reading from connection'. Post-mortem diagnostic: The total size of the 15 globals exported is 14.99 MiB. The three largest globals are '...furrr_chunk_args' (14.35 MiB of class 'list'), 'Models_WS' (374.44 KiB of class 'list') and 'predictMaxEnt' (185.81 KiB of class 'function')
Backtrace:

  1. ├─global::ExtractPreds(Data = CurrD)
  2. │ └─%>%(...)
  3. ├─tidyr::unnest(., cols = c(data))
  4. ├─dplyr::rename_at(...)
  5. │ └─dplyr:::tbl_at_vars(.tbl, .vars, .include_group_vars = TRUE)
  6. │ └─dplyr::tbl_vars(tbl)
  7. │ ├─dplyr:::new_sel_vars(tbl_vars_dispatch(x), group_vars(x))
  8. │ │ └─base::structure(...)
  9. │ └─dplyr:::tbl_vars_dispatch(x)
 10. ├─tidyr::unnest(., cols = c(Preds))
 11. ├─dplyr::mutate(...)
 12. ├─dplyr:::mutate.data.frame(...)
 13. │ └─dplyr:::mutate_cols(.data, ..., caller_env = caller_env())
 14. │ ├─base::withCallingHandlers(...)
 15. │ └─mask$eval_all_mutate(quo)
 16. └─furrr::future_map(...)
 17. └─furrr:::furrr_map_template(...)
 18. └─furrr:::furrr_template(...)
 19. └─future::future(...)
 20. ├─future::run(future)
 21. └─future:::run.Future(future)
 22. ├─future::run(future)
 23. └─future:::run.ClusterFuture(future)
 24. ├─base::suppressWarnings(...)
 25. │ └─base::withCallingHandlers(...)
 26. └─future:::cluster_call(...)

Execution halted
Warning message: system call failed: Cannot allocate memory

Please note that I am running this code on an HPC computer (CentOS Linux 7) with a total of 110 GB of RAM and 36 nodes. I used conda to install R (v4.1.2). The same code worked nicely on different R installations but failed on this computer. Here are the results of the session info:


R version 4.1.2 (2021-11-01)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS/LAPACK: /home/----/-----/.conda/envs/REnv2/lib/libopenblasp-r0.3.18.so

locale:
 [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
 [9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] parallel stats graphics grDevices utils datasets methods base

other attached packages:
 [1] tibble_3.1.6 readr_2.1.2 furrr_0.2.3 future_1.24.0 purrr_0.3.4 dismo_1.3-5 raster_3.5-15 sp_1.4-6 sf_1.0-6 stringr_1.4.0
[11] tidyr_1.2.0 dplyr_1.0.7 snow_0.4-4

loaded via a namespace (and not attached):
 [1] Rcpp_1.0.8 pillar_1.7.0 compiler_4.1.2 class_7.3-20 tools_4.1.2 digest_0.6.29 lifecycle_1.0.1 lattice_0.20-45
 [9] pkgconfig_2.0.3 rlang_0.4.12 cli_3.2.0 DBI_1.1.2 e1071_1.7-9 terra_1.5-21 hms_1.1.1 globals_0.14.0
[17] generics_0.1.2 vctrs_0.3.8 classInt_0.4-3 grid_4.1.2 tidyselect_1.1.1 glue_1.6.2 listenv_0.8.0 R6_2.5.1
[25] parallelly_1.30.0 fansi_1.0.2 tzdb_0.2.0 magrittr_2.0.2 codetools_0.2-18 ellipsis_0.3.2 units_0.8-0 assertthat_0.2.1
[33] utf8_1.2.2 KernSmooth_2.23-20 stringi_1.7.6 proxy_0.4-26 crayon_1.5.0

The same issue occurred when using other strategies (multisession, multicore, cluster). I do not think this is a memory issue, because the same code runs well on a node with the same specifications when using an older version of R (not installed via conda). Could this be related to using conda?

Any suggestions?

Thanks in advance, Ahmed
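The startup warnings and the "Package 'future' is not installed on worker" error above suggest that background R sessions may not see the same R home or library paths as the main session. A minimal sketch for comparing the two, assuming base R's parallel package can stand in for the way multisession workers are launched (the variable names are illustrative only):

```r
library(parallel)

# R home and library paths as seen by the main R session
main_home  <- R.home()
main_paths <- .libPaths()

# Start one background worker over a local socket connection
cl <- makePSOCKcluster(1L)

# Ask the worker for its own R home, library paths, and whether it can find 'future'
worker_home       <- clusterEvalQ(cl, R.home())[[1]]
worker_paths      <- clusterEvalQ(cl, .libPaths())[[1]]
worker_has_future <- clusterEvalQ(cl, requireNamespace("future", quietly = TRUE))[[1]]

stopCluster(cl)

# Library paths known to the main session but missing on the worker
print(setdiff(main_paths, worker_paths))
print(identical(main_home, worker_home))
print(worker_has_future)
```

If the worker reports a different R home or is missing library paths, the conda environment may not be active in the sessions that the workers start from; that is only one possible explanation.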

orozcoae89 commented 2 years ago

Can you test this code? It is possible that you need the cl object, as follows:

workers <- rep("localhost", 4)
cl <- makeClusterPSOCK(workers, revtunnel = TRUE, outfile = "")
# starting worker pid=134xx on localhost:11xxx at 13:46:34.292
# starting worker pid=134xx on localhost:11xxx at 13:46:34.580
# starting worker pid=134xx on localhost:11xxx at 13:46:34.817
# starting worker pid=134xx on localhost:11xxx at 13:46:35.050
plan <- plan(list(future::tweak(cluster, workers = workers), multisession))

If that doesn't solve it, please send us a short reproducible example so we can help you.


elgabbas commented 2 years ago


Sometimes, I also receive this error:

Fatal error: couldn't allocate memory for pointer stack
Fatal error: couldn't allocate memory for pointer stack
Error: cannot allocate vector of size 9 Kb

caught segfault address (nil), cause 'memory not mapped' An irrecoverable exception occurred. R is aborting now ...

caught segfault address (nil), cause 'memory not mapped' An irrecoverable exception occurred. R is aborting now ...

caught segfault address (nil), cause 'memory not mapped' An irrecoverable exception occurred. R is aborting now ...

caught segfault address (nil), cause 'memory not mapped' An irrecoverable exception occurred. R is aborting now ...

caught segfault address (nil), cause 'memory not mapped'
Error: memory exhausted (limit reached?)
Execution halted
Error: memory exhausted (limit reached?)
Execution halted
Warning message: system call failed: Cannot allocate memory

caught segfault address 0x5555559b9568, cause 'invalid permissions' An irrecoverable exception occurred. R is aborting now ...

Thanks, Ahmed

elgabbas commented 2 years ago

Thanks @orozcoae89 for your reply. Here are the results of the script you provided.

workers <-rep("localhost",4)
cl <- makeClusterPSOCK(workers, revtunnel=TRUE, outfile="")

starting worker pid=175624 on localhost:11798 at 13:59:27.898
starting worker pid=175627 on localhost:11798 at 13:59:27.898
starting worker pid=175625 on localhost:11798 at 13:59:27.899
starting worker pid=175626 on localhost:11798 at 13:59:27.907

# starting worker pid=134xx on localhost:11xxx at 13:46:34.292
# starting worker pid=134xx on localhost:11xxx at 13:46:34.580
# starting worker pid=134xx on localhost:11xxx at 13:46:34.817
# starting worker pid=134xx on localhost:11xxx at 13:46:35.050

plan <- plan(list(future::tweak(cluster, workers=workers), multisession))
plan

List of future strategies:

  1. sequential:
    • args: function (..., envir = parent.frame())
    • tweaked: FALSE
    • call: NULL

I will now run my code with this setup and see if it works.

Cheers, Ahmed

orozcoae89 commented 2 years ago

This error is common when the connections to remote machines have not been set up. Can you give us more details about your code so that we can help you?


elgabbas commented 2 years ago

After running my script, I receive a similar error:

Error: Problem with mutate() column Preds.
ℹ Preds = future_map(...).
✖ ClusterFuture () failed to call gassign() on cluster RichSOCKnode #3 (PID 175879 on ‘localhost’). The reason reported was ‘error writing to connection’. Post-mortem diagnostic: The total size of the 15 globals exported is 72.55 MiB. The three largest globals are ‘...furrr_chunk_args’ (71.76 MiB of class ‘list’), ‘Models_WS’ (374.44 KiB of class ‘list’) and ‘predictMaxEnt’ (331.60 KiB of class ‘function’)

My code is simple. I am using the future_map function from the furrr package to mutate a new column. However, the data is huge and consists of ~1M observations. Nevertheless, the same script worked nicely using the RStudio server on the HPC (same specifications), but now I need to run the same script over different datasets and therefore need to batch the jobs as separate R runs.

Cheers, Ahmed
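Since the largest exported global in both error messages is '...furrr_chunk_args', one thing that could be tried is sending less data per future. A minimal sketch, assuming furrr's `furrr_options(chunk_size = )` is used to cap the number of elements per chunk; `inputs`, `predict_one`, the worker count, and the chunk size are hypothetical stand-ins, not values from the original script:

```r
library(future)
library(furrr)

# Small worker count chosen only for illustration
plan(multisession, workers = 2)

# Hypothetical stand-ins for the real data and prediction function
inputs <- as.list(seq_len(1000))
predict_one <- function(x) x^2

# Each future receives at most 100 elements instead of one large chunk per worker
preds <- future_map(
  inputs,
  predict_one,
  .options = furrr_options(chunk_size = 100)
)

# Shut down the background workers again
plan(sequential)
```

Smaller chunks mean more futures and more scheduling overhead, so this trades speed for a lower per-transfer payload; whether it avoids the connection errors here is untested.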

orozcoae89 commented 2 years ago

Well, I think you could improve your code, because this error suggests that the exported globals exceed the size limit set by default. More details: https://www.rdocumentation.org/packages/future/versions/1.24.0/topics/future.options. Try this solution: options(future.globals.maxSize = ??)
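A minimal sketch of that suggestion, assuming a hypothetical limit of 1 GiB (the comment above leaves the value open, so this number is only an illustration). According to the future documentation, the option is given in bytes and defaults to 500 MiB:

```r
# Current setting; NULL means the documented default of 500 MiB applies
previous_limit <- getOption("future.globals.maxSize")
print(previous_limit)

# Hypothetical new limit of 1 GiB, expressed in bytes
max_size_bytes <- 1024^3
options(future.globals.maxSize = max_size_bytes)

print(getOption("future.globals.maxSize"))
```

Note that the 72.55 MiB of globals reported in the error is already below the 500 MiB default, so it is unclear whether raising this limit addresses the failure.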