Open HenrikBengtsson opened 7 years ago
When setting up multiple PSOCK R worker sessions on the same machine, they will all try to connect back to the same port, e.g.
> trace(system, tracer = quote(message(command)), print = FALSE)
Tracing function "system" in package "base"
[1] "system"
> library("parallel")
> cl <- makeCluster(rep("remote.myserver.org", 2L), user=NULL, master="local.mymachine.org", homogeneous=FALSE)
ssh remote.myserver.org "Rscript --default-packages=datasets,utils,grDevices,graphics,stats,methods -e 'parallel:::.slaveRSOCK()' MASTER=local.mymachine.org PORT=11120 OUT='/dev/null' TIMEOUT=2592000 XDR=TRUE"
ssh remote.myserver.org "Rscript --default-packages=datasets,utils,grDevices,graphics,stats,methods -e 'parallel:::.slaveRSOCK()' MASTER=local.mymachine.org PORT=11120 OUT='/dev/null' TIMEOUT=2592000 XDR=TRUE"
Note how both R worker processes are set up to connect back to local.mymachine.org:11120.
Now, this will currently not work with the reverse-SSH-tunnel patch, because both workers (running on the same machine) will try to set up reverse SSH tunnels on the same local port:
> cl <- parallel::makeCluster(rep("remote.myserver.org", 2L), user=NULL, revtunnel=TRUE, master="localhost", homogeneous=FALSE)
ssh -R 11456:localhost:11456 remote.myserver.org "Rscript --default-packages=datasets,utils,grDevices,graphics,stats,methods -e 'parallel:::.slaveRSOCK()' MASTER=localhost PORT=11456 OUT='/dev/null' TIMEOUT=2592000 XDR=TRUE"
ssh -R 11456:localhost:11456 remote.myserver.org "Rscript --default-packages=datasets,utils,grDevices,graphics,stats,methods -e 'parallel:::.slaveRSOCK()' MASTER=localhost PORT=11456 OUT='/dev/null' TIMEOUT=2592000 XDR=TRUE"
Warning: remote port forwarding failed for listen port 11456
Only the first one succeeds; each of the following ones (here one) fails because the local port is already taken.
The solution is to make sure each R worker uses a unique local port local_port in the SSH option -R local_port:localhost:11456. This could be done by setting local_port = port + (rank - 1L), as in the following updated patch:
Index: src/library/parallel/R/snow.R
===================================================================
--- src/library/parallel/R/snow.R (revision 71456)
+++ src/library/parallel/R/snow.R (working copy)
@@ -97,8 +97,10 @@
outfile = "/dev/null",
rscript = rscript,
rscript_args = character(),
- user = Sys.i[["user"]],
+ user = NULL,
rshcmd = "ssh",
+ revtunnel = FALSE,
+ rshopts = NULL,
manual = FALSE,
methods = TRUE,
renice = NA_integer_,
Index: src/library/parallel/R/snowSOCK.R
===================================================================
--- src/library/parallel/R/snowSOCK.R (revision 71456)
+++ src/library/parallel/R/snowSOCK.R (working copy)
@@ -39,7 +39,7 @@
## build the local command for starting the worker
env <- paste0("MASTER=", master,
" PORT=", port,
- " OUT=", outfile,
+ " OUT=", shQuote(outfile),
" TIMEOUT=", timeout,
" XDR=", useXDR)
arg <- "parallel:::.slaveRSOCK()"
@@ -71,11 +71,25 @@
if (machine != "localhost") {
## This assumes an ssh-like command
rshcmd <- getClusterOption("rshcmd", options)
+ opts <- NULL
+
+ ## Specify '-l user'?
user <- getClusterOption("user", options)
+ if (!is.null(user)) opts <- c(opts, paste("-l", user))
+
+ ## Use SSH reverse tunneling?
+ revtunnel <- getClusterOption("revtunnel", options)
+ if (isTRUE(revtunnel))
+ opts <- c(opts, sprintf("-R %d:%s:%d", port + (rank - 1L), master, port))
+
+ ## Additional SSH options?
+ opts <- c(opts, getClusterOption("rshopts", options))
+
## this assume that rshcmd will use a shell, and that is
## the same shell as on the master.
cmd <- shQuote(cmd)
- cmd <- paste(rshcmd, "-l", user, machine, cmd)
+ opts <- paste(opts, collapse = " ")
+ cmd <- paste(rshcmd, opts, machine, cmd)
}
if (.Platform$OS.type == "windows") {
The above local port tweak solves the port clash:
> trace(system, tracer=quote(message(command)), print=FALSE)
Tracing function "system" in package "base"
[1] "system"
> library("parallel")
> cl <- makeCluster(rep("remote.myserver.org", 2), user=NULL, revtunnel=TRUE, master="localhost", homogeneous=FALSE)
ssh -R 11921:localhost:11921 remote.myserver.org "Rscript --default-packages=datasets,utils,grDevices,graphics,stats,methods -e 'parallel:::.slaveRSOCK()' MASTER=localhost PORT=11921 OUT='/dev/null' TIMEOUT=2592000 XDR=TRUE"
ssh -R 11922:localhost:11921 remote.myserver.org "Rscript --default-packages=datasets,utils,grDevices,graphics,stats,methods -e 'parallel:::.slaveRSOCK()' MASTER=localhost PORT=11921 OUT='/dev/null' TIMEOUT=2592000 XDR=TRUE"
> res <- parLapply(cl, 1:3, fun=function(x) x^2)
> unlist(res)
[1] 1 4 9
> stopCluster(cl)
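The port arithmetic can also be checked in isolation, without any SSH connection. The following is only a minimal sketch: `revtunnel_opts()` is a hypothetical helper, not part of the parallel package, and it assumes that workers on the same remote machine are ranked 1, 2, ... as in the patch.

```r
## Hypothetical helper (not part of 'parallel'): build the '-R' option
## for each worker that is started on the same remote machine.
revtunnel_opts <- function(port, n_workers, master = "localhost") {
  rank <- seq_len(n_workers)
  ## Worker 'rank' listens on port + (rank - 1) on the remote side;
  ## every tunnel forwards to the same port on the master.
  sprintf("-R %d:%s:%d", port + (rank - 1L), master, port)
}

opts <- revtunnel_opts(port = 11921L, n_workers = 2L)
print(opts)
```

With port 11921 and two workers this gives the same two `-R` options as in the traced ssh calls above.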
Quick summary
Add support for reverse SSH tunneling (-R <port>:localhost:<port>) when setting up PSOCK clusters using parallel::makeCluster(). This helps avoid firewall and port forwarding issues that appear when trying to connect to remote machines / clusters.
Basically, the proposed patch allows you to connect to remote R machines from anywhere as long as you can ssh directly to the machine.
If you have comments, suggestions, ideas and / or critique, please comment below. The plan is to collect and summarize feedback here, then to bring it up on R-devel, and eventually submit the patch to https://bugs.r-project.org/.
Background
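The makeCluster() function of the parallel package can be used to run on a remote cluster. This is typically done by passing the remote host name (remote.myserver.org), the user name (johndoe), the address of the local machine (master = local.mymachine.org) and, optionally, a port (*) to makeCluster().
(*) If port is not specified, a random port in [11000,11999] is used.
By default this results in a connection to remote.myserver.org over SSH via an internal system() call of the form ssh -l johndoe remote.myserver.org followed by the Rscript command that launches the worker.
Issue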
Now, in order for this remote connection to be set up successfully, it is not only necessary for the ssh -l johndoe remote.myserver.org connection to work, but also for remote.myserver.org to be able to open a socket in the reverse direction back to our local machine at local.mymachine.org on port 11001. The latter part is problematic because it requires us to open up any local firewalls to allow incoming connections to port 11001 (or any port in the range [11000,11999]). Even worse is when we're behind a local router, e.g. if we're on a notebook connected via a WiFi router. In such cases we also have to configure the router to forward ("port forwarding") incoming connections to port 11001 (or any port in the range [11000,11999]) to our notebook. If two or more users try to do the same, things become complicated. This not only requires you to have access privileges to configure the local router, but you most likely also have to configure DHCP to use a static IP for your notebook and for everyone else who wishes to do the same. You also have to make sure you're not trying to use the same ports.
Solution
In SSH there is a concept called reverse tunneling, which basically makes it possible to set up a reverse port-to-port connection within the outgoing connection. This way there is no need to worry about remote.myserver.org being able to connect back to your local machine. As long as you can make the outgoing SSH connection, the reverse connection should work out of the box (*).
By adding the option -R 11001:localhost:11001 to the above SSH call, the remote R worker will try to open up the reverse connection on port 11001 on localhost (== the remote machine). Since reverse tunneling is used, this will be port forwarded to port 11001 on the calling machine (= your local machine).
In addition to the above, this also has the advantage of not having to know your public IP address or have dynamic DNS set up.
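To make the difference between the two SSH calls concrete, here is a rough sketch that only assembles the command strings and never runs them. `ssh_worker_call()` is a hypothetical helper written for illustration, and the worker command is abbreviated from the traced output earlier in this thread; the real command built by parallel has more arguments.

```r
## Hypothetical helper: assemble the SSH call for a single worker.
## Nothing is executed; the function only returns the command string.
ssh_worker_call <- function(host, port, master, revtunnel = FALSE) {
  opts <- character()
  if (isTRUE(revtunnel)) {
    ## Remote port 'port' is forwarded to 'port' on the calling machine,
    ## so the worker connects to localhost instead of to 'master'.
    opts <- sprintf("-R %d:localhost:%d", port, port)
    master <- "localhost"
  }
  ## Abbreviated worker command (assumption: the remaining arguments
  ## seen in the traced output are left out for brevity)
  worker <- sprintf("Rscript -e 'parallel:::.slaveRSOCK()' MASTER=%s PORT=%d",
                    master, port)
  paste(c("ssh", opts, host, paste0('"', worker, '"')), collapse = " ")
}

call_direct <- ssh_worker_call("remote.myserver.org", 11001L,
                               master = "local.mymachine.org")
call_tunnel <- ssh_worker_call("remote.myserver.org", 11001L,
                               master = "local.mymachine.org",
                               revtunnel = TRUE)
cat(call_direct, call_tunnel, sep = "\n")
```

In the direct call the worker has to reach local.mymachine.org:11001 on its own, whereas in the tunneled call it only talks to localhost:11001 on the remote machine and SSH carries the traffic back.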
(*) An exception is when you use SSH tunneling in your outgoing connection to remote.myserver.org. In such cases, you might have to use more complex reverse SSH tunneling than proposed here.
Suggestion
Add support for reverse SSH tunneling, e.g. through a new argument revtunnel = TRUE to makeCluster().
Proposed patch
Here is a patch (svn diff src/library/parallel) that:
* adds argument revtunnel (logical) to control whether reverse SSH tunneling should be used or not (this issue).
* adds argument rshopts (a character string) for adding any command-line options of choice. This gives the user further options on how to set up a reverse SSH tunnel and / or other SSH configurations.
* skips -l <user> if user=NULL (new default). This makes it possible to specify the user name in ~/.ssh/config
(See Issue #31 for full discussion)