futureverse / parallelly

R package: parallelly - Enhancing the 'parallel' Package
https://parallelly.futureverse.org

Automatically infer `rscript_sh` for remote OS #97

Open HenrikBengtsson opened 1 year ago

HenrikBengtsson commented 1 year ago

Triggered by #96, could we automatically detect what rscript_sh should be for remote workers? Right now it is hard-coded to rscript_sh = "sh" for remote workers based on the assumption that most clusters run on Unix-like systems.

Idea

At least when homogeneous = FALSE, we could query the remote operating system using something like:

$ '/usr/bin/ssh' pi-2021.local Rscript --vanilla -e .Platform | grep -A 1 -F OS.type
$OS.type
[1] "unix"

So, in R, something like:

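## Set rscript_sh = "cmd", if remote machine runs MS Windows
rscript_sh <- "sh"
tryCatch({
  ## Print .Platform on the remote host and capture the output lines
  res <- system2(ssh_cmd, args = c(hostname, "Rscript", "--vanilla", "-e", ".Platform"), stdout = TRUE, stderr = TRUE)
  idx <- grep("OS.type", res, fixed = TRUE)
  ## The value is printed on the line after "$OS.type"; avoid return() here,
  ## since inside a function it would exit the enclosing function
  if (length(idx) > 0 && any(grepl("windows", res[idx + 1]))) rscript_sh <- "cmd"
}, error = identity)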
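
The parsing step above can be tried without any SSH access by separating it from the remote call. Below is a minimal sketch, assuming the remote output has the same shape as the printed `.Platform` list shown earlier. The helper name `infer_rscript_sh()` and its `default` argument are hypothetical and not part of parallelly.

```r
## Hypothetical helper (not part of parallelly): map the printed output of
## `.Platform`, given as a character vector of lines, to a value for rscript_sh.
infer_rscript_sh <- function(lines, default = "sh") {
  ## Locate the "$OS.type" header line (fixed = TRUE: no regex interpretation)
  idx <- grep("$OS.type", lines, fixed = TRUE)
  if (length(idx) == 0L) return(default)

  ## The value is printed on the following line, e.g. [1] "windows"
  value <- lines[idx[1L] + 1L]
  if (is.na(value)) return(default)  # header was the last line

  if (grepl("windows", value, fixed = TRUE)) "cmd" else default
}

## Made-up sample outputs, mimicking what print(.Platform) starts with
unix_out <- c("$OS.type", "[1] \"unix\"", "", "$file.sep", "[1] \"/\"")
win_out  <- c("$OS.type", "[1] \"windows\"", "", "$file.sep", "[1] \"/\"")

infer_rscript_sh(unix_out)
infer_rscript_sh(win_out)
infer_rscript_sh(character(0))
```

Falling back to `default` whenever the output cannot be parsed keeps the current behavior (`rscript_sh = "sh"`) if the remote query fails or returns something unexpected.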

The downside is that the extra SSH call and Rscript launch add to the startup time of each parallel worker (on the order of seconds).