ctlamb opened this issue 6 years ago
Hi @ctlamb
Are you running the doAzureParallel package installation at the cluster configuration level or inside the foreach call?
Thanks, Brian
In the foreach
rast.results <- foreach(i = 1:nrow(bp),
                        .packages = c("doParallel", "here", "dismo", "gbm", "snow"),
                        github = c("Azure/doAzureParallel"),
                        .errorhandling = "pass",
                        .options.azure = list(enableCloudCombine = FALSE,
                                              job = job_name)) %dopar% {
This is the cluster config:
clusterConfig <- list(
"name" = "LambRaster",
"vmSize" = "Standard_D12_v2",
"maxTasksPerNode" = 1,
"poolSize" = list(
"dedicatedNodes" = list(
"min" = 1,
"max" = 200
),
"lowPriorityNodes" = list(
"min" = 0,
"max" = 0
),
"autoscaleFormula" = "QUEUE"
),
"containerImage" = "rocker/geospatial:latest",
"rPackages" = list(
"cran" = list(),
"github" = list(),
"bioconductor" = list()
),
"commandLine" = list()
)
I would recommend installing the R packages at the cluster configuration level so you don't need to install them for every single job. Also, the job will not start if the cluster's start tasks have failed.
clusterConfig <- list(
"name" = "LambRaster",
"vmSize" = "Standard_D12_v2",
"maxTasksPerNode" = 1,
"poolSize" = list(
"dedicatedNodes" = list(
"min" = 1,
"max" = 200
),
"lowPriorityNodes" = list(
"min" = 0,
"max" = 0
),
"autoscaleFormula" = "QUEUE"
),
"containerImage" = "rocker/geospatial:latest",
"rPackages" = list(
"cran" = list("doParallel", "here", "dismo", "gbm", "snow"),
"github" = list("Azure/doAzureParallel"),
"bioconductor" = list()
),
"commandLine" = list()
)
Move the doAzureParallel package name into the regular .packages vector.
rast.results <- foreach(i = 1:nrow(bp),
                        .packages = c("doParallel", "here", "dismo", "gbm", "snow", "doAzureParallel"),
                        .errorhandling = "pass",
                        .options.azure = list(enableCloudCombine = FALSE,
                                              job = job_name)) %dopar% {
I'll need to see the logs from the job preparation tasks on the Batch node. However, getClusterFile does not work for job preparation tasks; I've created a separate issue for this.
If you have access to the Azure Batch portal, you can go to:
Batch Pools > (Name of your pool) > Nodes > Click on the node > in the search bar, enter "/workitems/
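If it's easier to pull node files from R, a rough sketch is below; the node id and file path are placeholders, the exact getClusterFile arguments may differ between doAzureParallel versions, and, as noted above, this won't reach the job preparation task logs:

library(doAzureParallel)

# Assumes credentials are set and the cluster already exists; the file name is a placeholder.
cluster <- makeCluster("clusterConfig.json")

# The node id is visible in the portal under Batch Pools > Nodes; this one is made up.
nodeId <- "tvm-0123456789_1-20190101t000000z"

# Download the start task stderr from that node (the path is an assumption).
getClusterFile(cluster, nodeId, "startup/stderr.txt")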
Thanks, Brian
Thanks, @brnleehng, this makes more sense.
I used the clusterConfig you made above (plus some debugging of my own afterwards), but it seems to produce an error, which I can confirm is not present when I run without loading the packages in the clusterConfig:
=======================================================================================================================================================================================
Name: LambRaster
Configuration:
Docker Image: rocker/geospatial:latest
MaxTasksPerNode: 1
Node Size: Standard_D12_v2
cranPackages:
Error in cat(list(...), file, sep, fill, labels, append) :
argument 1 (type 'list') cannot be handled by 'cat'
Hi @ctlamb
It appears that, when the cluster config is built programmatically, the rPackages parameter takes a character vector instead of a list. I'll update the docs for clarification.
clusterConfig <- list(
"name" = "LambRaster",
"vmSize" = "Standard_D12_v2",
"maxTasksPerNode" = 1,
"poolSize" = list(
"dedicatedNodes" = list(
"min" = 1,
"max" = 200
),
"lowPriorityNodes" = list(
"min" = 0,
"max" = 0
),
"autoscaleFormula" = "QUEUE"
),
"containerImage" = "rocker/geospatial:latest",
"rPackages" = list(
"cran" = c("doParallel", "here", "dismo", "gbm", "snow"),
"github" = c("Azure/doAzureParallel"),
"bioconductor" = c()
),
"commandLine" = list()
)
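For reference, wiring this config into a cluster would look roughly like the sketch below (the credentials file name is a placeholder):

library(doAzureParallel)

# Assumes a credentials file has already been generated; the name is a placeholder.
setCredentials("credentials.json")

# makeCluster also accepts a config built programmatically as an R list.
cluster <- makeCluster(clusterConfig)
registerDoAzureParallel(cluster)

# Quick check of how many workers foreach will see.
getDoParWorkers()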
Thanks, Brian
Awesome, this is solved, thanks!
I'm in the middle of running a big job: 200 VMs, 800 tasks. So far 500 tasks have completed but 120 have failed. I looked into the failures and can see that the stderr.txt files for the failed nodes indicate doAzureParallel failed to load.
stderr for failed job: running
But then hundreds of the jobs worked, and produced the following with no errors.