angusrtaylor closed 5 years ago
I'm not sure this is an issue with doAzureParallel. Azure Batch autoscale is evaluated at fixed time intervals, and by default the interval is 5 minutes.
Are your tasks running for longer than 5 minutes? If the job finishes before the next interval, the autoscale event never gets a chance to occur. We have also released a new autoscale formula, 'QUEUE_AND_RUNNING'. This formula is based on the number of tasks in the queue plus the number of currently running tasks.
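To make the two points above concrete, here is a minimal sketch in Python. It is a simplified model and not the actual Azure Batch or doAzureParallel formula: the function names, the ceiling division, and the clamping to min/max are assumptions made for illustration only.

```python
import math


def queue_and_running_target(queued, running, max_tasks_per_node,
                             min_nodes, max_nodes):
    """Hypothetical model of a 'queue plus running' target node count.

    Demand is the number of queued tasks plus the number of running tasks.
    The result is clamped to the pool's min/max (an assumption, not the
    real formula text).
    """
    demand = queued + running
    wanted = math.ceil(demand / max_tasks_per_node)
    return max(min_nodes, min(wanted, max_nodes))


def autoscale_can_react(job_minutes, interval_minutes=5.0):
    """True only if the job outlives one autoscale evaluation interval."""
    return job_minutes > interval_minutes


# 3 queued + 2 running tasks, 1 task per node, pool limits 1..5
print(queue_and_running_target(3, 2, 1, 1, 5))  # 5
# Nothing queued or running, and a minimum of 0: the pool can drain
print(queue_and_running_target(0, 0, 1, 0, 5))  # 0
# A 3-minute job ends before the default 5-minute evaluation
print(autoscale_can_react(3))                   # False
```

Under this model, a job that completes in under 5 minutes never sees a resize, whichever formula is configured.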
Ahhh ok, that will be my problem. Yes, the total time for my job to run sequentially is less than 5 minutes. I think the QUEUE_AND_RUNNING formula is what I'm looking for. I'll try it out. Thanks!
@brnleehng while we're on the subject of autoscaling, is it possible to scale a cluster down to 0 nodes when it is not in use? It seems you can do this with Azure Batch, but I can't create a cluster with doAzureParallel that has a minimum of fewer than 1 dedicated or low-priority node. It would be good to be able to scale to 0 to avoid unnecessary cost.
You should be able to autoscale to 0 by setting your min to 0.
Here's an example cluster config:
{
  "name": "test",
  "vmSize": "Standard_F4",
  "maxTasksPerNode": 1,
  "poolSize": {
    "dedicatedNodes": {
      "min": 0,
      "max": 0
    },
    "lowPriorityNodes": {
      "min": 0,
      "max": 5
    },
    "autoscaleFormula": "QUEUE"
  },
  "containerImage": "rocker/tidyverse:latest",
  "rPackages": {
    "cran": [],
    "github": [],
    "bioconductor": []
  },
  "commandLine": []
}
@brnleehng I get the following error when I try to create this cluster:
Error in waitForNodesToComplete(poolConfig$name, 60000) : Pool count needs to be greater than 0.
I have a cluster with min nodes = 1, max nodes = 5 and autoscale = 'QUEUE'. I have 500 individual tasks, each of which takes only a few seconds to execute. It therefore makes sense to set chunksize = 100 so that sets of 100 tasks run on a single node in the same R session.
The problem is that doAzureParallel seems to interpret this as there being only 5 tasks, and the cluster does not increase in size. I would expect autoscaling to increase the cluster size, since the 500 tasks take a long time to execute sequentially on one node.
Is this an issue with doAzureParallel or Azure Batch autoscaling? Do you have any advice on how to run this type of job?
Thanks
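For reference, the arithmetic behind the "only 5 tasks" observation can be sketched as follows. This is an illustrative Python model, not doAzureParallel code: `batch_task_count` and `chunk_bounds` are hypothetical helpers, and the assumption is simply that each chunk of iterations is submitted as one task.

```python
import math


def batch_task_count(n_iterations, chunk_size):
    """Number of submitted tasks when iterations are grouped into chunks."""
    return math.ceil(n_iterations / chunk_size)


def chunk_bounds(n_iterations, chunk_size):
    """1-based (first, last) iteration index covered by each chunk."""
    return [(start, min(start + chunk_size - 1, n_iterations))
            for start in range(1, n_iterations + 1, chunk_size)]


# 500 iterations in chunks of 100 are seen by the scheduler as 5 tasks
print(batch_task_count(500, 100))  # 5
print(chunk_bounds(500, 100)[0])   # (1, 100)
print(chunk_bounds(500, 100)[-1])  # (401, 500)
```

With a maximum of 5 nodes and one task per node, 5 chunked tasks would still be enough to fill the pool, so in this sketch the chunking alone does not explain the lack of scaling.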