Closed snikumbh closed 2 years ago
I get that the error is thrown by `parallel::clusterApplyLB` when `fork=FALSE, shared=TRUE` is used with `basiliskStart`. When I specify only `env` in `basiliskStart`, no error is thrown, but the whole thing just hangs with no visible progress.
(Apologies for the late reply.)
Hm. I've never tried to run basilisk in parallel, but I would have hoped it would have worked off the bat.
From looking at your code, you seem to be re-using the main process's `proc` across all the child processes in `cl` (I assume, if you're passing in `proc`). This is unlikely to work - each child contains its own (virtual) copy of the Python runtime, so any attempt to reference the parent's runtime will fail or do otherwise strange things. Indeed, whenever basilisk decides that it needs to run its code in a new process, it has to re-initialize the Python runtime in the child before actually doing anything.
If my diagnosis is correct, you should be able to fix this by moving/repeating the `proc` creation inside the children. Also make sure that you're only passing pure R objects in/out of the children - no reticulate objects, as these will just refer to meaningless memory addresses when they are moved out of the process in which they were generated.
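A minimal sketch of that suggestion, in case it helps: the worker function below starts, uses, and stops its own basilisk process, and only plain R values cross the process boundary. `run_one`, `my_env`, and the use of numpy are hypothetical placeholders, not taken from the code in this issue.

```r
# Hypothetical worker. 'env' is assumed to be a BasiliskEnvironment (or a path
# to an environment) that has numpy installed; 'x' is a plain R numeric vector.
run_one <- function(x, env) {
    # Create the process inside the child instead of passing it in.
    proc <- basilisk::basiliskStart(env)
    on.exit(basilisk::basiliskStop(proc), add = TRUE)

    basilisk::basiliskRun(proc, fun = function(values) {
        np <- reticulate::import("numpy")
        # Return a pure R value, never a reticulate object.
        as.numeric(np$sum(values))
    }, values = x)
}

# Usage, commented out because 'my_env' is a placeholder:
# cl <- parallel::makeCluster(2)
# res <- parallel::clusterApplyLB(cl, list(1:3, 4:6), run_one, env = my_env)
# parallel::stopCluster(cl)
```

The `on.exit()` call is there so that the process is stopped even if the Python code throws.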
I think I'm having a similar issue, but with BiocParallel. I think my code is configured as @LTLA recommended (`basiliskStart`, `basiliskRun`, and `basiliskStop` are all inside the function called by `bplapply`), but when I set the BiocParallelParam to `bpparam()` (a `MulticoreParam`), I get this error:
Error: BiocParallel errors
1 remote errors, element index: 2
0 unevaluated and other errors
first remote error:
Error in serverSocket(p): creation of server socket failed: port 11804 cannot be opened
Code to reproduce:
devtools::install_github('kstreet13/VDJdive')
library(VDJdive)
data('contigs')
x <- clonoStats(contigs, BPPARAM = bpparam())
However, it works as expected with `BPPARAM = SerialParam()`. Even more confusingly, after running it with `SerialParam()`, re-running with `bpparam()` actually works. And according to the results on our GitHub Actions workflow, it looks like this is only an issue on Mac, so is it somehow related to the "difficulties with the generation of separate processes" from the vignette? And if so, is the suggested workaround suitable for use in a package?
This is probably caused by the use of ports to transfer environment variables after activation of the Conda environment. I suppose that, on a Mac, the forked processes try to grab the same port at the same time, resulting in the observed error, e.g.,
serverSocket(p=100000)
## A connection with
## description "localhost"
## class "servsockconn"
## mode "a+"
## text "text"
## opened "opened"
## can read "yes"
## can write "yes"
serverSocket(p=100000)
## Error in serverSocket(p = 1e+05) :
## creation of server socket failed: port 100000 cannot be opened
Not sure why this doesn't happen on Ubuntu, but oh well. (The error in your actions log doesn't seem related, I just see a 404 from failing to set up R.)
Anyway, try installing LTLA/basilisk.utils#5 and see if it makes a difference.
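For illustration only, here is one way a port collision like the one above could be sidestepped in base R: try a range of candidate ports and keep the first one that opens. This is a hypothetical helper, not necessarily what basilisk.utils does, and the port range is an arbitrary assumption. `serverSocket()` needs a reasonably recent R.

```r
# Hypothetical helper, not part of basilisk or basilisk.utils.
# Tries each candidate port in turn and returns the first that can be opened.
open_free_port <- function(candidates = 11000:11050) {
    for (p in candidates) {
        con <- tryCatch(serverSocket(p), error = function(e) NULL)
        if (!is.null(con)) {
            return(list(port = p, socket = con))
        }
    }
    stop("no free port found among the candidates")
}

first <- open_free_port()
second <- open_free_port()  # moves on past the port that 'first' still holds
close(first$socket)
close(second$socket)
```

Because each attempt is wrapped in `tryCatch()`, a process that loses the race for one port just moves on to the next candidate instead of erroring out.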
Ah sorry, I missed that the GHA error was something different. But yes, that seems to have fixed it! Thanks very much! Will that version be in the next Bioconductor release?
Yes, just pushed to BioC-devel.
Hi @LTLA,
As part of another R package, I am trying to use basilisk to run some Python code via reticulate. The package vignette and example were useful in setting it up. I can successfully run my task in serial. When I try to run the whole code chunk in parallel, where the Python snippet inside `basiliskRun` is run on multiple nodes in the cluster among much other functionality, I get an error.
The structure is somewhat like shown below. The same runs successfully when run serially, but throws an error when run in parallel.
Perhaps there is something straightforward that I am missing. I tried playing around with the `fork` and `shared` params in `basiliskRun`, but that hasn't helped. Any help is appreciated. Thanks in advance.