DavisVaughan / furrr

Apply Mapping Functions in Parallel using Futures
https://furrr.futureverse.org/

future_walk works in parallel in a script function, but not in an R package function #236

Closed: jinyancool closed this issue 1 year ago

jinyancool commented 2 years ago

Dear developer, when I write an R script function that uses future_walk(), it runs in parallel. But if I wrap the same function in an R package, it runs sequentially.

```r
future::availableCores()
#> system
#>    160

################## R script function: this works fine.
mutect2 <- function(config, interval_dir) {
  intervals <- dir_ls(interval_dir, glob = "*-scattered.interval_list")
  oplan <- plan(multisession, workers = 60)
  on.exit(plan(oplan), add = TRUE)
  future_walk(intervals, ~ mutect2_wes_one(config, .x))
}
```

Running `mutect2(config, interval_dir)` works fine.

################## The same function inside the R package mypkg, called from outside the package (e.g. `mypkg::mutect2()`), does not work as expected:

```r
future::availableCores()
#> system
#>    160

mutect2 <- function(config, interval_dir) {
  intervals <- dir_ls(interval_dir, glob = "*-scattered.interval_list")
  oplan <- plan(multisession, workers = 60)
  on.exit(plan(oplan), add = TRUE)
  future_walk(intervals, ~ mutect2_wes_one(config, .x))
}
```

Running `mypkg::mutect2(config, interval_dir)` does not work as expected.

jinyancool commented 2 years ago

packageVersion("furrr")
[1] '0.3.0.9000'

packageVersion("future")
[1] '1.25.0'

sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-conda-linux-gnu (64-bit)
Running under: CentOS Linux 8

Matrix products: default
BLAS/LAPACK: /cluster/apps/anaconda3/2020.02/envs/R-4.1.1/lib/libopenblasp-r0.3.17.so

locale:
[1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
[3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
[5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
[7] LC_PAPER=en_US.UTF-8 LC_NAME=C
[9] LC_ADDRESS=C LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats graphics grDevices utils datasets methods base

other attached packages:
[1] furrr_0.3.0.9000 future_1.25.0 jhtools_1.0.0
[4] glue_1.6.2 jhuanglabwgs_1.0.0 optparse_1.7.1
[7] configr_0.3.5 futile.logger_1.4.3 pak_0.3.0
[10] devtools_2.4.3 usethis_2.1.5 rvcheck_0.2.1
[13] forcats_0.5.1 stringr_1.4.0 dplyr_1.0.9
[16] purrr_0.3.4 readr_2.1.2 tidyr_1.2.0
[19] tibble_3.1.7 ggplot2_3.3.6 tidyverse_1.3.1
[22] fs_1.5.2 wget_0.0.1

loaded via a namespace (and not attached):
[1] utf8_1.2.2 tidyselect_1.1.2
[3] htmlwidgets_1.5.4 RSQLite_2.2.14
[5] AnnotationDbi_1.54.1 grid_4.1.1
[7] BiocParallel_1.28.3 munsell_0.5.0
[9] codetools_0.2-18 withr_2.5.0
[11] colorspace_2.0-3 Biobase_2.54.0
[13] filelock_1.0.2 ggfortify_0.4.14
[15] knitr_1.39 rstudioapi_0.13
[17] stats4_4.1.1 ggsignif_0.6.3
[19] listenv_0.8.0 MatrixGenerics_1.6.0
[21] tximport_1.20.0 GenomeInfoDbData_1.2.7
[23] ini_0.3.1 bit64_4.0.5
[25] rprojroot_2.0.3 parallelly_1.31.1
[27] vctrs_0.4.1 generics_0.1.2
[29] xfun_0.30 lambda.r_1.2.4
[31] biovizBase_1.40.0 BiocFileCache_2.2.1
[33] regioneR_1.24.0 R6_2.5.1
[35] GenomeInfoDb_1.30.1 AnnotationFilter_1.16.0
[37] bitops_1.0-7 cachem_1.0.6
[39] DelayedArray_0.20.0 assertthat_0.2.1
[41] BiocIO_1.2.0 scales_1.2.0
[43] nnet_7.3-17 gtable_0.3.0
[45] globals_0.15.0 processx_3.5.3
[47] ensembldb_2.16.4 rlang_1.0.2
[49] splines_4.1.1 lazyeval_0.2.2
[51] rtracklayer_1.52.1 rstatix_0.7.0
[53] dichromat_2.0-0.1 checkmate_2.1.0
[55] broom_0.8.0 BiocManager_1.30.17
[57] yaml_2.3.5 abind_1.4-5
[59] modelr_0.1.8 GenomicFeatures_1.44.2
[61] backports_1.4.1 Hmisc_4.7-0
[63] tools_4.1.1 ellipsis_0.3.2
[65] gplots_3.1.3 RColorBrewer_1.1-3
[67] karyoploteR_1.18.0 DNAcopy_1.66.0
[69] BiocGenerics_0.40.0 sessioninfo_1.2.2
[71] Rcpp_1.0.8.3 base64enc_0.1-3
[73] progress_1.2.2 zlibbioc_1.40.0
[75] RCurl_1.98-1.6 ps_1.7.0
[77] prettyunits_1.1.1 rpart_4.1.16
[79] ggpubr_0.4.0 RcppTOML_0.1.7
[81] S4Vectors_0.32.4 cluster_2.1.3
[83] SummarizedExperiment_1.24.0 haven_2.5.0
[85] magrittr_2.0.3 data.table_1.14.2
[87] futile.options_1.0.1 openxlsx_4.2.5
[89] reprex_2.0.1 ProtGenerics_1.24.0
[91] matrixStats_0.62.0 pkgload_1.2.4
[93] hms_1.1.1 patchwork_1.1.1
[95] XML_3.99-0.9 jpeg_0.1-9
[97] readxl_1.4.0 IRanges_2.28.0
[99] gridExtra_2.3 testthat_3.1.4
[101] compiler_4.1.1 biomaRt_2.48.3
[103] KernSmooth_2.23-20 crayon_1.5.1
[105] htmltools_0.5.2 tzdb_0.3.0
[107] Formula_1.2-4 lubridate_1.8.0
[109] DBI_1.1.2 formatR_1.12
[111] corrplot_0.92 dbplyr_2.1.1
[113] rappdirs_0.3.3 Matrix_1.4-1
[115] getopt_1.20.3 car_3.0-13
[117] brio_1.1.3 cli_3.3.0
[119] gdata_2.18.0 parallel_4.1.1
[121] GenomicRanges_1.46.1 pkgconfig_2.0.3
[123] GenomicAlignments_1.28.0 foreign_0.8-82
[125] xml2_1.3.3 XVector_0.34.0
[127] rvest_1.0.2 yulab.utils_0.0.4
[129] bezier_1.1.2 VariantAnnotation_1.38.0
[131] callr_3.7.0 digest_0.6.29
[133] Biostrings_2.60.2 cellranger_1.1.0
[135] htmlTable_2.4.0 restfulr_0.0.13
[137] curl_4.3.2 Rsamtools_2.8.0
[139] gtools_3.9.2 rjson_0.2.21
[141] lifecycle_1.0.1 jsonlite_1.8.0
[143] carData_3.0-5 desc_1.4.1
[145] limma_3.50.3 BSgenome_1.60.0
[147] fansi_1.0.3 pillar_1.7.0
[149] lattice_0.20-45 survival_3.3-1
[151] KEGGREST_1.32.0 fastmap_1.1.0
[153] httr_1.4.3 pkgbuild_1.3.1
[155] remotes_2.4.2 conflicted_1.1.0
[157] zip_2.2.0 bamsignals_1.24.0
[159] png_0.1-7 bit_4.0.4
[161] stringi_1.7.6 blob_1.2.3
[163] org.Hs.eg.db_3.13.0 latticeExtra_0.6-29
[165] caTools_1.18.2 memoise_2.0.1

jinyancool commented 2 years ago

https://cran.r-project.org/web/packages/future/vignettes/future-7-for-package-developers.html

The document at the above URL does not help.

DavisVaughan commented 2 years ago

> It does not work as expected

You haven't explained what the actual problem is. Can you please provide some output for the failing case?

jinyancool commented 2 years ago

It does not fail; it just does not work as expected. When called as an R script function, it uses 60 workers in parallel. When called as the package function `mypkg::mutect2(config, interval_dir)`, it only uses two workers and runs sequentially. I can reproduce this reliably. I have tried `.env_globals = rlang::global_env()` and `.env_globals = parent.frame()`; neither helps:

```r
future_walk(intervals, ~ mutect2_wes_one(config, .x), .env_globals = rlang::global_env())
```

jinyancool commented 2 years ago

I suspect that the environment the function is called from is causing this problem.

DavisVaughan commented 2 years ago

So the problem is that it is running sequentially when called through the package, even though you set plan(multisession) in the package function? But if you don't put it in a package then it correctly runs in parallel?

That sounds strange to me.

It is unlikely to be a function environment issue if that is the case.

Can you point me to a repo on GitHub that has this package in it? Or can you create a repo on GitHub that demonstrates this problem for you? I am unlikely to be able to help you otherwise.

DavisVaughan commented 2 years ago

By the way, setting plan() inside a function is typically not best practice. plan() should really only be called at the user level. Users should control whether or not the function runs in parallel, and the default should be to run sequentially.
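The user-level pattern described above can be sketched as follows, assuming a hypothetical package function `process_intervals()` (an illustrative name, not from this issue) that calls furrr but never calls `plan()`, so the caller decides whether it runs in parallel. `Sys.sleep()` stands in for real per-element work such as `mutect2_wes_one()`, and two workers are used only to keep the sketch light.

```r
library(future)
library(furrr)

# Hypothetical package function (illustrative only): it uses furrr but
# never calls plan(), so the backend is chosen by whoever calls it.
process_intervals <- function(intervals) {
  future_map_int(intervals, function(i) {
    Sys.sleep(0.1)   # stand-in for the real per-interval work
    Sys.getpid()     # ID of the R process that evaluated this element
  })
}

# User level: choose the backend once, outside the package function.
oplan <- plan(multisession, workers = 2)
pids <- process_intervals(1:8)
plan(oplan)          # restore whatever plan was active before

# More than one distinct PID means the elements ran on several workers.
length(unique(pids))
```

If no plan is set, the same call falls back to the default sequential plan, which matches the "sequential by default" advice.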

jinyancool commented 2 years ago

That's right. The problem is that it runs sequentially when called through the package, even though I set plan(multisession) in the package function. But if I don't put it in a package, it correctly runs in parallel. I will try to upload the package to GitHub. It would be best if you have GATK installed.

jinyancool commented 2 years ago

Even if I call plan() outside the R package, it still does not run in parallel.

jinyancool commented 2 years ago

I have made an R package at:

https://github.com/jinyancool/fakepkg

The function is:

```r
test_furrr <- function() {
  intervals <- seq(1, 60)
  oplan <- plan(multisession, workers = 60)
  on.exit(plan(oplan), add = TRUE)
  future_walk(intervals, ~ run_fun(.x))
}
```

You will find that running `fakepkg::test_furrr()` and running `test_furrr()` directly, after pasting its definition into the terminal, behave quite differently.

jinyancool commented 2 years ago

```r
library(tictoc)

tic()
test_furrr()
toc()
#> 25.259 sec elapsed

tic()
fakepkg::test_furrr()
toc()
#> 157.118 sec elapsed
```
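Beyond wall-clock time, one way to see how many worker processes actually execute the tasks is to record `Sys.getpid()` inside every task and compare the count with `future::nbrOfWorkers()`. The helper `check_parallelism()` below is hypothetical (not part of furrr or fakepkg); it is a minimal sketch that could be called both interactively and from inside a package function to compare the two cases.

```r
library(future)
library(furrr)

# Hypothetical diagnostic helper: runs n short tasks under the current plan
# and reports how many workers the plan offers versus how many distinct
# worker processes actually ran tasks.
check_parallelism <- function(n = 8) {
  pids <- future_map_int(seq_len(n), function(i) {
    Sys.sleep(0.5)   # simulated work so tasks overlap in time
    Sys.getpid()     # which R process ran this task
  })
  list(
    workers_available = nbrOfWorkers(),        # workers in the active plan
    workers_used      = length(unique(pids))   # distinct processes observed
  )
}

oplan <- plan(multisession, workers = 2)
res <- check_parallelism()
print(res)
plan(oplan)          # restore the previous plan
```

If `workers_used` is 1 while `workers_available` is larger, the tasks did not spread across workers in that call.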

jinyancool commented 2 years ago

Can we solve this issue now? Thanks.

DavisVaughan commented 1 year ago

Closing due to inability to reproduce.