Connecting to existing cluster fails to load imageName in code #372

NewbieScriptWriter commented 3 years ago

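When we connect to an existing cluster, jobs fail to run, most likely because "imageName" is missing. In the task definition printed by the verbose output, `containerSettings.imageName` is an empty object (`{}`) instead of the container image name: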

```json
{"id":"1","commandLine":"/bin/bash -c \"set -e; set -o pipefail; Rscript --no-save --no-environ --no-restore --no-site-file --verbose $AZ_BATCH_JOB_PREP_WORKING_DIR/worker.R 1 10 0 stop > $AZ_BATCH_TASK_ID.txt; wait\"","userIdentity":{"autoUser":{"scope":"pool","elevationLevel":"admin"}},"environmentSettings"........removed code for privacy........{"filePattern":"../stdout.txt","destination":{"container":{"path":"stdout/1-........removed code for privacy........constraints":{"maxTaskRetryCount":3},"exitConditions":{"default":{"dependencyAction":"satisfy"}},"containerSettings":{"imageName":{},"containerRunOptions":"--rm"}}
```

The run then fails with the following error:

```
Error in curl::curl_fetch_memory(url, handle = handle) : Failure when receiving data from the peer
```

However, if we first create a new cluster and then connect to the existing one, the code runs fine, and the verbose output shows `"imageName":"rocker/tidyverse:3.6.3"`:

```json
{"id":"1","commandLine":"/bin/bash -c \"set -e; set -o pipefail; Rscript --no-save --no-environ --no-restore --no-site-file --verbose $AZ_BATCH_JOB_PREP_WORKING_DIR/worker.R 1 1 0 stop > $AZ_BATCH_TASK_ID.txt; wait\"","userIdentity":{"autoUser":{"scope":"pool","elevationLevel":"admin"}},"environmentSettings"........removed code for privacy........{"filePattern":"../stdout.txt","destination":{"container":{"path":"stdout/1-........removed code for privacy........constraints":{"maxTaskRetryCount":3},"exitConditions":{"default":{"dependencyAction":"satisfy"}},"containerSettings":{"imageName":"rocker/tidyverse:3.6.3","containerRunOptions":"--rm"}}
```
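The only difference between the two task payloads is whether `containerSettings.imageName` holds a string or an empty JSON object. A minimal sketch for checking this from R, assuming the task JSON is available as a string and the jsonlite package is installed; `hasContainerImage` is a hypothetical helper, not part of doAzureParallel, and the sample strings are trimmed-down versions of the payloads above:

```r
library(jsonlite)

# Hypothetical helper (not part of doAzureParallel): returns TRUE only when
# containerSettings.imageName is a single non-empty string. An empty JSON
# object ("imageName":{}) parses to an empty list, so it returns FALSE.
hasContainerImage <- function(taskJson) {
  task <- fromJSON(taskJson, simplifyVector = FALSE)
  image <- task$containerSettings$imageName
  is.character(image) && length(image) == 1 && nzchar(image)
}

failing <- '{"id":"1","containerSettings":{"imageName":{},"containerRunOptions":"--rm"}}'
working <- '{"id":"1","containerSettings":{"imageName":"rocker/tidyverse:3.6.3","containerRunOptions":"--rm"}}'

hasContainerImage(failing)  # FALSE
hasContainerImage(working)  # TRUE
```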

Steps to reproduce the issue (the cluster already exists and has several idle nodes):

```r
# Load the doAzureParallel library
library(doAzureParallel)

# Turn on verbose logging and HTTP traffic logging
setVerbose(TRUE)
setHttpTraffic(TRUE)

# Set your credentials

# Get existing cluster
cluster <- getCluster("TestCluster_2020", verbose = TRUE)

# Register the cluster as your parallel backend
registerDoAzureParallel(cluster)

# Test simulation inputs
mean_change <- 1.001
volatility <- 0.01
opening_price <- 100

getClosingPrice <- function() {
  days <- 1825 # ~ 5 years
  movement <- rnorm(days, mean = mean_change, sd = volatility)
  path <- cumprod(c(opening_price, movement))
  closingPrice <- path[days]
  return(closingPrice)
}

# PARALLEL test simulation
opt <- list(chunkSize = 10)
start_p <- Sys.time()
closingPrices_p <- foreach(i = 1:10, .combine = 'c', .options.azure = opt) %dopar% {
  replicate(10, getClosingPrice())
}
end_p <- Sys.time()

difftime(end_p, start_p, units = "mins")
```
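To confirm that the simulation code itself is not the problem, the same loop can be run locally with foreach's sequential backend before involving Azure. This is a minimal sketch, assuming the foreach package is installed; `closingPrices_local` is an illustrative name, and `.options.azure` is omitted because it only matters to the Azure backend:

```r
library(foreach)

# Same inputs and function as in the reproduction steps
mean_change <- 1.001
volatility <- 0.01
opening_price <- 100

getClosingPrice <- function() {
  days <- 1825 # ~ 5 years
  movement <- rnorm(days, mean = mean_change, sd = volatility)
  path <- cumprod(c(opening_price, movement))
  path[days]
}

# Sequential backend: %dopar% evaluates in the local R session, no cluster involved
registerDoSEQ()
set.seed(1)
closingPrices_local <- foreach(i = 1:10, .combine = 'c') %dopar% {
  replicate(10, getClosingPrice())
}

length(closingPrices_local)  # 100 (10 iterations x 10 replicates)
```

If this succeeds locally but the Azure run fails only when connecting to an existing cluster, the cluster/container configuration is the more likely cause.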

So the only way we have found to run the code against an existing cluster is to first create a "throw-away" cluster and then use the existing cluster for execution:

```r
# Create a throw-away cluster in Azure, passing it your cluster config file
cluster <- makeCluster("cluster.json")

# Get existing cluster
cluster <- getCluster("TestCluster_2020", verbose = TRUE)

# Register the cluster as your parallel backend
registerDoAzureParallel(cluster)
```


