Open HenrikBengtsson opened 8 years ago
Is it safe to drop defaultClusterOptions$user / set it to NULL?
There are no other usages of defaultClusterOptions$user in core R that I'm aware of. I grepped all of the source code as shown below and could only find that it's used within parallel:::newPSOCKnode().
$ grep -r --include='*.R' -v -E '^[[:space:]]*#.*' | grep -E "[^#]*([\$(\"' ]+)user([)\"' ]+|$)"
utils/R/edit.R: if(factor.mode != mode(out[[i]])) next # user might have switched mode
utils/R/databrowser.R: si[c("user","nodename","sysname")]})))
utils/R/tar.R: warning(gettextf("invalid uid value replaced by that for user 'nobody'", uid),
utils/R/tar.R: warning(gettextf("invalid gid value replaced by that for user 'nobody'", uid),
utils/R/SweaveDrivers.R: leading <- 1L # How many lines get the user prompt
graphics/R/legend.R: text.width <- max(abs(strwidth(legend, units="user",
graphics/R/legend.R: ymax <- yc * max(1, strheight(legend, units="user", cex=cex)/yc)
graphics/R/legend.R: && (abs(tw <- strwidth(title, units="user", cex=cex) + 0.5*xchar)) > abs(w)) {
graphics/R/plot.R: "user", "inches", "", "", "npc")
graphics/R/plot.R:grconvertX <- function(x, from = "user", to = "user")
graphics/R/plot.R:grconvertY <- function(y, from = "user", to = "user")
graphics/R/pairs.R: l.wid <- strwidth(labels, "user")
graphics/R/strwidth.R: function(s, units = "user", cex = NULL, font = NULL, vfont = NULL,...)
graphics/R/strwidth.R: pmatch(units, c("user", "figure", "inches")),
graphics/R/strwidth.R: function(s, units = "user", cex = NULL, font = NULL, vfont = NULL, ...)
graphics/R/strwidth.R: pmatch(units, c("user", "figure", "inches")),
base/R/time.R: c(gettext("user"), gettext("system"), gettext("elapsed"))
parallel/R/snowSOCK.R: user <- getClusterOption("user", options)
parallel/R/snow.R: user = Sys.i[["user"]],
parallel/R/unix/mclapply.R: warning("all scheduled cores encountered errors in user code")
parallel/R/unix/mclapply.R: "scheduled core %s encountered error in user code, all values of the job will be affected",
parallel/R/unix/mclapply.R: "scheduled cores %s encountered errors in user code, all values of the jobs will be affected"),
tools/R/build.R: user <- Sys.info()["user"]
tools/R/build.R: if(user == "unknown") user <- Sys.getenv("LOGNAME")
tools/R/build.R: user)
methods/R/RClassUtils.R: if(!identical(default, value)) # user supplied default
methods/R/RClassUtils.R: if(!identical(default, value)) # user supplied default
stats/R/spectrum.R: if(!is.null(spans)) # allow user to mistake order of args
As of R-devel (rev 79418; 2020-11-12), parallel::makePSOCKcluster() now supports disabling the SSH user by specifying user=NULL, e.g.
> cl <- parallel::makePSOCKcluster("example.org") ## defaults to user=Sys.info()[["user"]]
> cl <- parallel::makePSOCKcluster("example.org", user=NULL) ## drops ssh option -l <user>
I'd say that what can be improved here is to have user = NULL be the default. I argue that it is a backward-compatible change.
Currently, parallel:::initDefaultClusterOptions() sets the default:
user = Sys.i[["user"]],
Now, if we don't specify the command-line option -l user, the SSH client will use $USER. I think Sys.info()[["user"]] is the same as $USER, which means that defaulting to
user = Sys.i[["user"]],
is not needed. BTW, user is only used for remote shells.
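A quick way to check the assumption above on a given machine is to compare the two values directly. This is only a sketch: it assumes that the USER environment variable is what matters on the system at hand (it may be unset, e.g. on Windows), and the variable names are made up for illustration.

```r
# Username as reported by R (the value parallel uses as its default)
info_user <- Sys.info()[["user"]]
# Username from the environment; NA if USER is unset
env_user <- Sys.getenv("USER", unset = NA_character_)
# TRUE only if both sources agree on this machine
same <- identical(info_user, env_user)
cat("Sys.info()[[\"user\"]]:", info_user, "\n")
cat("$USER:", env_user, "\n")
cat("identical:", same, "\n")
```

If the two differ, dropping the default -l option could change which username ssh ends up using, so the check is worth running before relying on the equivalence.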
I've posted https://bugs.r-project.org/bugzilla/show_bug.cgi?id=18042 to make user = NULL the default.
Quick summary
The PSOCK functionality of the parallel package currently always passes an -l <username> option in the ssh call. If the user doesn't specify a username (via argument user), then it will fall back to using the default username (= Sys.info()[["user"]]). The problem with this is that it overrides any username specifications in a ~/.ssh/config file. I propose a patch to parallel that only passes the -l <username> option if user is explicitly set. If not specified, it relies on ssh to do the correct thing.
Background
Using the parallel package, we can connect to a remote machine (or cluster) by using:
The default is that the connection is set up via an ssh call. To verify that we can connect to the remote machine from our current location, we can use:
(To verify that the remote machine in turn can connect back, which is also required, is a different topic and not relevant to this issue).
Issue
If one uses different usernames locally and remotely, it is convenient to configure the default username for a given server by editing ~/.ssh/config, e.g.
So, if my local username is hb, i.e.
with the above ~/.ssh/config file, I no longer have to specify -l henrik, but I can just do:
regardless of my local username. (Without a ~/.ssh/config file, ssh would fall back to use my local username (as in -l $USER). I will return to this in my proposal at the end.)
Unfortunately, this does not work with parallel::makeCluster(). In other words, it is not possible to do just:
Troubleshooting
The reason that the username in ~/.ssh/config is ignored is that the parallel package will override it because it always passes option -l <local username> to ssh. For example,
which matches my local username:
If we dig into the code of the parallel package, we find the following piece of code in parallel:::newPSOCKnode():
where
Further inspection shows that getClusterOption("user", options) will return the above default options$user value if not explicitly specified as an argument to makeCluster() (see initial example).
The parallel:::defaultClusterOptions object is initialized with user = Sys.info()[["user"]] as seen in parallel:::initDefaultClusterOptions() which is called when the parallel package is loaded.
Workaround
Since parallel:::newPSOCKnode() will always inject -l <username>, it is not clear how to circumvent this. For instance, trying to trick it with parallel:::defaultClusterOptions$user <- NULL will not work.
Another alternative would be to use a custom rshcmd script that discards the -l username options. However, that would also discard user when it is legitimately specified as an argument to makeCluster(). It would also be tricky to write a solution that would work out-of-the-box on all operating systems.
The only solution I see is to parse ~/.ssh/config to check whether it specifies a different remote username than the default local username. If it does, then user should be set to this remote username specified in ~/.ssh/config. That could kind of work, but it would require a robust parser.
In summary, I don't see a neat workaround for this problem.
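To make the parsing concern concrete, here is a naive sketch of such a lookup. The helper ssh_config_user() is hypothetical (it is not part of parallel or any package), the host names are made up, and it deliberately ignores Match blocks, Include directives, negated patterns, quoting and the Key=Value syntax, which is exactly the kind of detail a robust parser would have to handle.

```r
# Hypothetical helper: naive lookup of the 'User' option for a host in
# ssh_config-style lines. It handles plain 'Host' patterns only.
ssh_config_user <- function(lines, host) {
  lines <- trimws(lines)
  # Drop blank lines and comments
  lines <- lines[nzchar(lines) & !startsWith(lines, "#")]
  active <- TRUE  # options before the first Host line apply to all hosts
  for (line in lines) {
    parts <- strsplit(line, "[[:space:]]+")[[1]]
    key <- tolower(parts[1])
    if (key == "host") {
      # A Host block is active if any of its glob patterns matches the host
      patterns <- parts[-1]
      active <- any(vapply(patterns, function(p) {
        grepl(utils::glob2rx(p), host)
      }, logical(1)))
    } else if (key == "user" && active) {
      return(parts[2])  # the first matching value wins
    }
  }
  NULL  # no User configured for this host
}

cfg <- c("Host remote.example.org", "  User henrik", "Host *", "  User other")
ssh_config_user(cfg, "remote.example.org")
ssh_config_user(cfg, "elsewhere.example.org")
```

Even this small version has to reimplement pattern matching and precedence rules that ssh already applies on its own, which supports leaving the decision to ssh.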
Suggestion
If option -l <username> is not specified in the call to ssh, it will fall back to use whatever is specified in the ~/.ssh/config file and otherwise it will fall back to use the local username (basically -l $USER). In other words, there is no real reason for the parallel package to do this work instead of ssh. If the parallel package would only pass -l <username> if user is explicitly set, then any User specifications in ~/.ssh/config would also be acknowledged.
Patch
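As a standalone illustration of the proposed behavior, the call to ssh could be assembled so that -l is added only when user is non-NULL. This is a minimal sketch and not the patch itself: build_rsh_call() is a hypothetical helper that does not exist in parallel, and the host name is made up.

```r
# Hypothetical helper (not part of parallel): assemble the remote-shell call.
build_rsh_call <- function(host, user = NULL, rshcmd = "ssh") {
  # Pass -l only when the caller explicitly supplied a username
  login <- if (is.null(user)) character(0) else c("-l", user)
  paste(c(rshcmd, login, host), collapse = " ")
}

# No -l option: ssh resolves the username itself (~/.ssh/config, else local user)
build_rsh_call("remote.example.org")
# Explicit username: -l henrik is passed through
build_rsh_call("remote.example.org", user = "henrik")
```

With this shape, an explicit user argument keeps working as before, while the default case leaves the choice of username to ssh.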
Here's an SVN patch (svn diff src/library/parallel/R/snow*.R) that would achieve this:
Proof of concept
Here's a proof-of-concept hack that allows you to test the above patch without having to rebuild R from source: