Open HenrikBengtsson opened 1 year ago
Buried in the help page ?SnowParam is this note:
NOTE: The \code{PSOCK} cluster from the \code{parallel} package does not
support cluster options \code{scriptdir} and \code{useRscript}. \code{PSOCK}
is not supported because these options are needed to re-direct to an
alternate worker script located in BiocParallel.
But naive testing suggests this no longer seems to be the case (either because of changes in parallel or BiocParallel) so I have started a 'PSOCK' branch.
Is there an easy way to generate the socket connection error?
Buried in the help page ?SnowParam is this note:
NOTE: The \code{PSOCK} cluster from the \code{parallel} package does not support cluster options \code{scriptdir} and \code{useRscript}. \code{PSOCK} is not supported because these options are needed to re-direct to an alternate worker script located in BiocParallel.
But naive testing suggests this no longer seems to be the case (either because of changes in parallel or BiocParallel) ...
I missed that note. I don't think I've ever seen argument scriptdir or useRscript in the parallel package. They don't appear if one searches https://hughjonesd.shinyapps.io/rcheology/.
Looking at snow, it looks like scriptdir is used to point to the R script that runs the parallel workers. If so, then that's handled by parallel without scripts using an internal function, e.g.
'/path/to/lib/R/bin/Rscript' --default-packages=datasets,utils,grDevices,graphics,stats,methods -e 'parallel:::.workRSOCK()' MASTER=localhost PORT=11312 OUT=/dev/null TIMEOUT=2592000 XDR=FALSE SETUPTIMEOUT=120 SETUPSTRATEGY=sequential
... so I have started a 'PSOCK' branch.
Excellent.
Is there an easy way to generate the socket connection error?
I don't think so. It's a race condition that appears when many R processes try to create a cluster using the same port. Given that the default is to pick a random port from 11000:11999, it only happens once in a while, but if you check enough things in parallel you end up with it often enough for it to add friction. Before R 4.0.0, I did see it once in a while happening to the future package on the CRAN servers, because I do tons of testing there. It disappeared at the next round of checks.
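To get a feel for how often such a clash might show up, here is a rough back-of-the-envelope sketch. It assumes, as a simplification, that each of n R processes setting up a cluster at the same moment draws its port independently and uniformly from the 1000 ports in 11000:11999, and that any two drawing the same port counts as a clash. The function name port_clash_prob is made up for illustration and is not part of parallel.

```r
# Probability that at least two of `n` processes pick the same port when
# each draws independently and uniformly from `n_ports` ports.
# This is the classic birthday-problem calculation; it is an assumption
# about the port choice, not a description of parallel's exact logic.
port_clash_prob <- function(n, n_ports = length(11000:11999)) {
  stopifnot(n >= 1, n <= n_ports)
  # P(no clash) = (N/N) * ((N-1)/N) * ... * ((N-n+1)/N)
  1 - prod((n_ports - seq_len(n) + 1) / n_ports)
}

p10 <- port_clash_prob(10)  # ten cluster setups overlapping in time
p50 <- port_clash_prob(50)  # fifty cluster setups overlapping in time
```

Under these assumptions, ten overlapping setups clash roughly 4% of the time and fifty overlapping setups roughly 70% of the time. Real checks only clash if their setup windows actually overlap, so this is an upper-end illustration rather than a measurement.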
BTW, I'm not sure, but I think the race condition could also cause one R CMD check to launch parallel workers and another one to actually connect to those workers. If the latter was fast enough, it could complete successfully, but if the original check terminated first, it would shut down those workers, breaking the check for the other package. The SOCK/PSOCK protocol does not prevent a non-owning R process from connecting, including those run by other users. This is actually a security issue on multi-user servers, but that's another story.
Yes, parallel's implementation doesn't allow customization of the worker startup script, whereas snow's (& therefore SOCK, MPI, FORK) can be customized (and is, by BiocParallel).
Looking a little more deeply makes it seem likely that BiocParallel's log = TRUE option would be affected, which you can see in the 'Log messages' and 'stdout' sections:
res <- bplapply(1:2, message, BPPARAM = SnowParam(type = "PSOCK", log = TRUE))
############### LOG OUTPUT ###############
Task: 1
Node: 6
Timestamp: 2022-11-21 17:34:29.946785
Success: TRUE
Task duration:
   user  system elapsed
  0.186   0.007   0.198
Memory used:
          used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
Ncells 1209511 64.6    2057557 109.9         NA  2057557 109.9
Vcells 2873984 22.0    8388608  64.0      32768  8388267  64.0
Log messages:
stderr and stdout:
...
versus
res <- bplapply(1:2, message, BPPARAM = SnowParam(type = "SOCK", log = TRUE))
############### LOG OUTPUT ###############
Task: 2
Node: 5
Timestamp: 2022-11-21 17:34:36.612367
Success: TRUE
Task duration:
   user  system elapsed
  0.090   0.006   0.109
Memory used:
          used (Mb) gc trigger  (Mb) limit (Mb) max used  (Mb)
Ncells 1209512 64.6    2057557 109.9         NA  2057557 109.9
Vcells 2873982 22.0    8388608  64.0      32768  8388267  64.0
Log messages:
INFO [2022-11-21 17:34:36] loading futile.logger package
stderr and stdout:
2
############### LOG OUTPUT ###############
Yes, parallel's implementation doesn't allow customization of the worker startup script, whereas snow's (& therefore SOCK, MPI, FORK) can be customized (and is, by BiocParallel).
You can probably use rscript_args to customize the startup process of each worker, e.g. rscript_args = c("-e", shQuote('setwd("/path/to")')).
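A minimal sketch of that idea, assuming the extra -e expression is evaluated in the worker's R session before parallel's own worker loop starts. The helper names worker_startup_args and demo_startup, and the option name my.worker.flag, are hypothetical and only used for illustration. Quoting via shQuote() may behave differently on Windows.

```r
# Build extra Rscript arguments that evaluate an R expression on each
# worker at startup (hypothetical helper, not part of parallel).
worker_startup_args <- function(expr) {
  c("-e", shQuote(expr))
}

# The expression avoids spaces to sidestep shell-quoting surprises.
args <- worker_startup_args('options(my.worker.flag=TRUE)')

# Launches local PSOCK workers with the extra arguments and asks each
# worker for the option set at startup. Wrapped in a function because it
# starts real R processes on the local machine.
demo_startup <- function(n = 2L) {
  cl <- parallel::makePSOCKcluster(n, rscript_args = args)
  on.exit(parallel::stopCluster(cl))
  parallel::clusterEvalQ(cl, getOption("my.worker.flag"))
}
```

Calling demo_startup() returns one value per worker, which makes it easy to check whether the startup expression took effect.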
FWIW, I've made some of these things easier and more robust in parallelly::makeClusterPSOCK().
Background
SnowParam() supports type = "SOCK" (default), type = "MPI", and type = "FORK". The former two stem from the days of the snow package and the latter was introduced with the parallel package. The type argument is passed to parallel::makeCluster() as-is.
Wish
Please add support also for type = "PSOCK", which is the default for parallel::makeCluster() [since day one back in 2014, I think]. It looks like it would be quite straightforward to do this.
Why add this? Because PSOCK clusters have undergone lots of improvements since snow was incorporated into parallel. For example, in R (>= 4.0.0), the nodes ("workers") of a PSOCK cluster are set up in parallel, instead of sequentially. This makes the setup much faster, e.g.
Source: https://www.jottr.org/2021/06/10/parallelly-1.26.0/
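One way to compare the two setup strategies locally is sketched below. It assumes R (>= 4.0.0), where parallel's PSOCK clusters accept a setup_strategy option of "parallel" or "sequential". The function name time_cluster_setup is made up for this illustration, and actual timings depend on the machine.

```r
# Measure the wall-clock time needed to launch `n` local PSOCK workers
# with a given setup strategy (hypothetical helper for illustration).
time_cluster_setup <- function(n, strategy = c("parallel", "sequential")) {
  strategy <- match.arg(strategy)
  elapsed <- system.time({
    cl <- parallel::makeCluster(n, type = "PSOCK", setup_strategy = strategy)
  })[["elapsed"]]
  # Always shut the workers down again before returning the timing.
  parallel::stopCluster(cl)
  elapsed
}
```

For example, comparing time_cluster_setup(4, "sequential") with time_cluster_setup(4, "parallel") gives a rough impression of the difference on a given machine.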
In addition, this parallel setup strategy avoids port clashes that we saw in parallel (< 4.0.0), and still in snow (since it's deprecated and not improved on), e.g.
FYI, I haven't seen those types of errors since R (< 4.0.0), except from revdep checking packages relying on snow. Most recently, I saw them while revdep checking the Bioconductor package DMCFB, which uses SnowParam in its package tests.