Bioconductor / basilisk

Clone of the Bioconductor repository for the basilisk package.
https://bioconductor.org/packages/devel/bioc/html/basilisk.html
GNU General Public License v3.0

Error: invalid connection #13

Closed: snikumbh closed this issue 2 years ago

snikumbh commented 2 years ago

Hi @LTLA,

As part of another R package, I am trying to use basilisk to run some Python code via reticulate. The package vignette and examples were useful in setting it up. I can run my task successfully in serial. When I try to run the whole code chunk in parallel, with the Python snippet inside basiliskRun() running on multiple nodes of the cluster alongside much other functionality, I get the error

  100 nodes produced errors; first error: invalid connection

The structure is roughly as shown below. It runs successfully in serial but throws an error when run in parallel.

main_func <- function(parallelize, cores) {

    if (parallelize) {
        cl <- parallel::makeCluster(cores, type = "FORK")
        parallel::setDefaultCluster(cl)
    }

    # Set up the basilisk process
    proc <- basilisk::basiliskStart(<envname>)

    # - Iterate N times
    # - Many lines of code serving various steps.
    # - The crux of which is a single function
    #   that runs reticulated Python
    #   from inside basiliskRun()
    #   (similar to the example function in
    #   section 3.4 of the basilisk vignette)
    #
    # - loop ends

    if (parallelize) parallel::stopCluster(cl)
    basilisk::basiliskStop(proc)

}

Perhaps there is something straightforward that I am missing. I tried playing around with the fork and shared parameters in basiliskRun(), but that hasn't helped.

Any help is appreciated. Thanks in advance.

> sessionInfo()
R version 4.1.1 (2021-08-10)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux bookworm/sid

Matrix products: default
BLAS:   /usr/lib/x86_64-linux-gnu/openblas-pthread/libblas.so.3
LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.18.so

locale:
 [1] LC_CTYPE=en_GB.UTF-8       LC_NUMERIC=C               LC_TIME=en_DK.UTF-8       
 [4] LC_COLLATE=C               LC_MONETARY=en_DK.UTF-8    LC_MESSAGES=en_DK.UTF-8   
 [7] LC_PAPER=en_DK.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
[10] LC_TELEPHONE=C             LC_MEASUREMENT=en_DK.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] seqArchR_0.99.0     Biostrings_2.62.0   GenomeInfoDb_1.30.0 XVector_0.34.0      IRanges_2.28.0     
 [6] S4Vectors_0.32.0    BiocGenerics_0.40.0 shiny_1.7.1         reshape2_1.4.4      ggplot2_3.3.5      
[11] ggseqlogo_0.1       testthat_3.1.0     

loaded via a namespace (and not attached):
  [1] utf8_1.2.2                  reticulate_1.22             R.utils_2.11.0             
  [4] RUnit_0.4.32                tidyselect_1.1.1            poweRlaw_0.70.6            
  [7] RSQLite_2.2.8               AnnotationDbi_1.56.0        htmlwidgets_1.5.4          
 [10] grid_4.1.1                  BiocParallel_1.28.0         devtools_2.4.2             
 [13] munsell_0.5.0               codetools_0.2-18            withr_2.4.2                
 [16] colorspace_2.0-2            Biobase_2.54.0              filelock_1.0.2             
 [19] knitr_1.36                  rstudioapi_0.13             robustbase_0.93-9          
 [22] MatrixGenerics_1.6.0        rcmdcheck_1.4.0             labeling_0.4.2             
 [25] optparse_1.7.1              GenomeInfoDbData_1.2.7      cvTools_0.3.2              
 [28] bit64_4.0.5                 farver_2.1.0                rprojroot_2.0.2            
 [31] basilisk_1.6.0              vctrs_0.3.8                 generics_0.1.1             
 [34] xfun_0.27                   diptest_0.76-0              R6_2.5.1                   
 [37] flexmix_2.3-17              bitops_1.0-7                cachem_1.0.6               
 [40] DelayedArray_0.20.0         assertthat_0.2.1            promises_1.2.0.1           
 [43] BiocIO_1.4.0                scales_1.1.1                nnet_7.3-16                
 [46] gtable_0.3.0                biocViews_1.62.1            processx_3.5.2             
 [49] seqLogo_1.60.0              rlang_0.4.12                rtracklayer_1.54.0         
 [52] BiocManager_1.30.16         yaml_2.2.1                  httpuv_1.6.3               
 [55] RBGL_1.70.0                 tools_4.1.1                 usethis_2.1.3              
 [58] xopen_1.0.0                 ellipsis_0.3.2              jquerylib_0.1.4            
 [61] sessioninfo_1.1.1           Rcpp_1.0.7                  plyr_1.8.6                 
 [64] zlibbioc_1.40.0             vdiffr_0.4.0                purrr_0.3.4                
 [67] RCurl_1.98-1.5              ps_1.6.0                    basilisk.utils_1.6.0       
 [70] prettyunits_1.1.1           cowplot_1.1.1               SummarizedExperiment_1.24.0
 [73] cluster_2.1.2               fs_1.5.0                    here_1.0.1                 
 [76] magrittr_2.0.1              matrixStats_0.61.0          pkgload_1.2.3              
 [79] hms_1.1.1                   mime_0.12                   evaluate_0.14              
 [82] xtable_1.8-4                XML_3.99-0.8                mclust_5.4.7               
 [85] compiler_4.1.1              tibble_3.1.5                crayon_1.4.1               
 [88] R.oo_1.24.0                 htmltools_0.5.2             later_1.3.0                
 [91] tzdb_0.1.2                  TFBSTools_1.32.0            DBI_1.1.1                  
 [94] MASS_7.3-54                 fpc_2.2-9                   rappdirs_0.3.3             
 [97] Matrix_1.3-4                getopt_1.20.3               readr_2.0.2                
[100] cli_3.0.1                   R.methodsS3_1.8.1           parallel_4.1.1             
[103] GenomicRanges_1.46.0        pkgconfig_2.0.3             GenomicAlignments_1.30.0   
[106] dir.expiry_1.2.0            TFMPvalue_0.0.8             hopach_2.54.0              
[109] xml2_1.3.2                  roxygen2_7.1.2              annotate_1.72.0            
[112] bslib_0.3.1                 DirichletMultinomial_1.36.0 stringdist_0.9.8           
[115] BiocCheck_1.30.0            stringr_1.4.0               callr_3.7.0                
[118] digest_0.6.28               pracma_2.3.3                CNEr_1.30.0                
[121] graph_1.72.0                rmarkdown_2.11              restfulr_0.0.13            
[124] curl_4.3.2                  kernlab_0.9-29              Rsamtools_2.10.0           
[127] gtools_3.9.2                commonmark_1.7              modeltools_0.2-23          
[130] rjson_0.2.20                lifecycle_1.0.1             jsonlite_1.7.2             
[133] desc_1.4.0                  BSgenome_1.62.0             fansi_0.5.0                
[136] pillar_1.6.4                lattice_0.20-45             KEGGREST_1.34.0            
[139] fastmap_1.1.0               httr_1.4.2                  DEoptimR_1.0-9             
[142] pkgbuild_1.2.0              GO.db_3.14.0                waldo_0.3.1                
[145] glue_1.4.2                  remotes_2.4.1               png_0.1-7                  
[148] prabclus_2.3-2              bit_4.0.4                   class_7.3-19               
[151] stringi_1.7.5               sass_0.4.0                  blob_1.2.2                 
[154] caTools_1.18.2              memoise_2.0.0               dplyr_1.0.7                

snikumbh commented 2 years ago

I see that the error is thrown by parallel::clusterApplyLB() when I pass fork=FALSE, shared=TRUE to basiliskStart(). When I specify only env in basiliskStart(), no error is thrown, but the whole thing just hangs with no visible progress.

LTLA commented 2 years ago

(Apologies for the late reply.)

Hm. I've never tried to run basilisk in parallel, but I would have hoped it would work off the bat.

From looking at your code, you seem to be re-using the main process's proc across all the child processes in cl (I assume so, if you're passing proc in). This is unlikely to work: each child contains its own (virtual) copy of the Python runtime, so any attempt to reference the parent's runtime will fail or do other strange things. Indeed, whenever basilisk decides that it needs to run its code in a new process, it has to re-initialize the Python runtime in the child before actually doing anything.
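The point that every child works on its own copy of the parent's state holds for forked R processes in general, not just for basilisk. A minimal base-R sketch of that behaviour uses `parallel::mclapply()` (Unix-alikes only, since forking is unavailable on Windows); the `state` environment is a hypothetical stand-in for the parent's Python runtime.

```r
library(parallel)

# State living in the parent session; a stand-in for the parent's Python runtime.
state <- new.env()
state$value <- 0

# With mc.preschedule = FALSE, each element is handled by its own forked child.
res <- mclapply(1:4, function(i) {
  state$value <- state$value + i  # modifies the child's private copy only
  c(pid = Sys.getpid(), value = state$value)
}, mc.cores = 2, mc.preschedule = FALSE)

res <- do.call(rbind, res)
print(res)

# The parent's copy is untouched, and every task ran in a different process.
stopifnot(state$value == 0)
stopifnot(all(res[, "pid"] != Sys.getpid()))
stopifnot(all(res[, "value"] == 1:4))
```

Each child starts from the parent's value of 0, so each reports exactly its own `i`, while the parent never sees any of the updates.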

If my diagnosis is correct, you should be able to fix this by moving (or repeating) the proc creation inside the children. Also make sure that you only pass pure R objects into and out of the children; no reticulate objects, as these just refer to meaningless memory addresses once they are moved out of the process in which they were generated.
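A minimal sketch of that restructuring, assuming a PSOCK cluster from the `parallel` package. To keep it runnable without a Conda environment, the hypothetical helpers `start_handle()` and `stop_handle()` stand in for `basiliskStart()` and `basiliskStop()`; in real code, `run_one()` would call those functions and `basiliskRun()` instead.

```r
library(parallel)

# Hypothetical stand-ins for basilisk::basiliskStart() / basilisk::basiliskStop().
# The "handle" just records which process created it.
start_handle <- function() list(pid = Sys.getpid())
stop_handle <- function(handle) invisible(NULL)

# Everything process-specific happens inside the function the workers run.
run_one <- function(x) {
  handle <- start_handle()      # created in this worker, never shipped from the parent
  on.exit(stop_handle(handle))  # and cleaned up in the same worker
  # Real code would call basiliskRun(handle, function(x) { ... }, x = x) here
  # and convert any reticulate result to a plain R object before returning it.
  list(sum = sum(x), pid = handle$pid)
}

cl <- makePSOCKcluster(2)
# PSOCK workers start with an empty global environment, so export the helpers.
clusterExport(cl, c("start_handle", "stop_handle"))
res <- parLapply(cl, list(1:3, 4:6, 7:9), run_one)
stopCluster(cl)

# Only plain R values come back from the workers.
sums <- vapply(res, function(r) r$sum, integer(1))
pids <- vapply(res, function(r) r$pid, integer(1))
```

Starting one handle per task is the simplest arrangement; with many small tasks, creating one per worker (for example via `clusterEvalQ()`) should respect the same rule while paying the start-up cost less often.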

kstreet13 commented 2 years ago

I think I'm having a similar issue, but with BiocParallel. My code is configured as @LTLA recommended (basiliskStart, basiliskRun, and basiliskStop are all inside the function called by bplapply), but when I set the BiocParallelParam to bpparam() (a MulticoreParam), I get this error:

Error: BiocParallel errors
1 remote errors, element index: 2
0 unevaluated and other errors
first remote error:
Error in serverSocket(p): creation of server socket failed: port 11804 cannot be opened
Full traceback:
```
Error: BiocParallel errors
1 remote errors, element index: 2
0 unevaluated and other errors
first remote error:
Error in serverSocket(p): creation of server socket failed: port 11768 cannot be opened
7. stop(.error_bplist(res))
6. .bpinit(manager = manager, X = X, FUN = FUN, ARGS = ARGS, BPPARAM = BPPARAM, BPOPTIONS = BPOPTIONS, BPREDO = BPREDO)
5. bplapply(levels(grpVar), function(lv) { .EM_sample(contigs[which(grpVar == lv)], type = type, lang = lang, thresh = thresh, iter.max = iter.max) }, BPPARAM = BPPARAM)
4. bplapply(levels(grpVar), function(lv) { .EM_sample(contigs[which(grpVar == lv)], type = type, lang = lang, thresh = thresh, iter.max = iter.max) }, BPPARAM = BPPARAM) at clonoStats.R#159
3. .local(x, ...)
2. clonoStats(contigs) at clonoStats.R#10
1. clonoStats(contigs)
```

Code to reproduce:

devtools::install_github('kstreet13/VDJdive')
library(VDJdive)
library(BiocParallel)  # provides bpparam() and SerialParam()
data('contigs')
x <- clonoStats(contigs, BPPARAM = bpparam())

However, it works as expected with BPPARAM = SerialParam(). Even more confusingly, after running it with SerialParam(), re-running with bpparam() actually works. According to the results of our GitHub Actions workflow, this seems to be an issue only on Mac. Is it somehow related to the "difficulties with the generation of separate processes" mentioned in the vignette? And if so, is the suggested workaround suitable for use in a package?

LTLA commented 2 years ago

This is probably caused by the use of ports to transfer environment variables after activation of the Conda environment. I suppose that, on a Mac, the forked processes try to grab the same port at the same time, resulting in the observed error. For example, opening the same port twice fails:

serverSocket(p=100000)
## A connection with                          
## description "localhost"   
## class       "servsockconn"
## mode        "a+"          
## text        "text"        
## opened      "opened"      
## can read    "yes"         
## can write   "yes"         

serverSocket(p=100000)
## Error in serverSocket(p = 1e+05) : 
##   creation of server socket failed: port 100000 cannot be opened

Not sure why this doesn't happen on Ubuntu, but oh well. (The error in your Actions log doesn't seem related; I just see a 404 from failing to set up R.)

Anyway, try installing LTLA/basilisk.utils#5 and see if it makes a difference.
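The port clash can be reproduced inside a single R session, and one generic way to survive it is to retry on another port. The sketch below uses base R's `serverSocket()` (R >= 4.0.0); the helper `open_server_socket()` and the 40000:40010 port range are hypothetical, and the actual change in LTLA/basilisk.utils#5 may work differently.

```r
# Hypothetical helper: open a listening socket on the first free port among
# `candidates`, instead of failing when another process already holds one.
open_server_socket <- function(candidates) {
  for (p in candidates) {
    con <- tryCatch(serverSocket(p), error = function(e) NULL)
    if (!is.null(con)) return(list(port = p, con = con))
  }
  stop("no free port among the candidates")
}

first <- open_server_socket(40000:40010)

# Asking for the same port again fails, as in the serverSocket() example above.
clash <- tryCatch(serverSocket(first$port),
                  error = function(e) conditionMessage(e))

# Retrying over a range of ports skips the one that is already taken.
second <- open_server_socket(first$port + 0:10)

close(first$con)
close(second$con)
```

Because binding a port either succeeds or fails atomically, two processes racing for the same port end up on different ports, as long as the side that connects is told which port was actually chosen.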

kstreet13 commented 2 years ago

Ah sorry, I missed that the GHA error was something different. But yes, that seems to have fixed it! Thanks very much! Will that version be in the next Bioconductor release?

LTLA commented 2 years ago

Yes, just pushed to BioC-devel.