Azure / doAzureParallel

An R package that allows users to submit parallel workloads in Azure
MIT License

Autoscaling with chunksize set not working as expected #333

Closed: angusrtaylor closed this issue 5 years ago

angusrtaylor commented 5 years ago

I have a cluster with min nodes = 1 and max nodes = 5 and autoscale='QUEUE'. I have 500 individual tasks, which each take only a few seconds to execute. Therefore it makes sense to set chunksize = 100 so that sets of 100 tasks will run on a single node in the same R session.

The problem is doAzureParallel seems to interpret this as there being only 5 tasks and the cluster does not increase in size. I would expect the autoscaling to increase the cluster size as it takes a long time for the 500 tasks to execute sequentially on one node.
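To make the arithmetic above concrete, here is a minimal sketch. It assumes chunking simply groups consecutive iterations into tasks of at most the chunk size; the function name `batch_task_count` is hypothetical and not part of doAzureParallel.

```python
import math


def batch_task_count(iterations: int, chunk_size: int) -> int:
    """Return how many tasks result when iterations are grouped into chunks.

    Assumption: each chunk of up to chunk_size iterations becomes one task.
    """
    if iterations < 0:
        raise ValueError("iterations must be non-negative")
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    return math.ceil(iterations / chunk_size)


# 500 iterations with a chunk size of 100: the scheduler sees 5 tasks.
print(batch_task_count(500, 100))
# Without chunking (chunk size 1), it sees all 500 tasks.
print(batch_task_count(500, 1))
```

Under this assumption, a queue-based autoscale rule only ever sees 5 pending tasks, which matches the behaviour described above.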

Is this an issue with doAzureParallel or Azure Batch autoscaling? Do you have any advice on how to run this type of job?

Thanks

brnleehng commented 5 years ago

I'm not sure this is an issue with doAzureParallel. Azure Batch autoscaling is evaluated at fixed time intervals; by default, the interval is 5 minutes.

Are your tasks running for longer than 5 minutes? That is needed to ensure an autoscale event occurs while the job is still running. We have released a new queue formula, 'QUEUE_AND_RUNNING'. This formula is based on the number of tasks in the queue and the number of currently running tasks.
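The timing point can be illustrated with a small sketch. It assumes autoscale evaluations happen at exact multiples of the interval (t = 0, 300, 600, ... seconds) and ignores node start-up time; the function name `evaluations_during_job` and its parameters are hypothetical and not part of Azure Batch or doAzureParallel.

```python
import math


def evaluations_during_job(start_offset_s: float,
                           duration_s: float,
                           interval_s: float = 300.0) -> int:
    """Count autoscale evaluations that occur while a job is running.

    Assumption: evaluations fire at t = 0, interval_s, 2 * interval_s, ...
    The job runs over the window (start_offset_s, start_offset_s + duration_s].
    """
    if start_offset_s < 0 or duration_s < 0:
        raise ValueError("start_offset_s and duration_s must be non-negative")
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    end_s = start_offset_s + duration_s
    return math.floor(end_s / interval_s) - math.floor(start_offset_s / interval_s)


# A 4-minute job submitted 10 s after an evaluation ends before the next one.
print(evaluations_during_job(10, 240))
# A 15-minute job submitted at the same moment spans several evaluations.
print(evaluations_during_job(10, 900))
```

In this simplified model, a job shorter than the interval can finish without a single evaluation falling inside its run, so the pool never gets a chance to grow.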

angusrtaylor commented 5 years ago

Ahhh ok that will be my problem. Yes the total time for my job to run sequentially is less than 5 minutes. The QUEUE_AND_RUNNING formula is what I'm looking for I think. I'll try it out. Thanks!

angusrtaylor commented 5 years ago

@brnleehng while we're on the subject of autoscaling, is it possible to scale a cluster down to 0 nodes when not in use? It seems you can do this with Azure Batch but I can't create a cluster with doAzureParallel with less than 1 dedicated or low-priority node as minimum. It would be good to be able to scale to 0 to avoid unnecessary cost.

brnleehng commented 5 years ago

You should be able to autoscale to 0 by setting your min to 0.

Here's an example cluster config:

{
  "name": "test",
  "vmSize": "Standard_F4",
  "maxTasksPerNode": 1,
  "poolSize": {
    "dedicatedNodes": {
      "min": 0,
      "max": 0
    },
    "lowPriorityNodes": {
      "min": 0,
      "max": 5
    },
    "autoscaleFormula": "QUEUE"
  },
  "containerImage": "rocker/tidyverse:latest",
  "rPackages": {
    "cran": [],
    "github": [],
    "bioconductor": []
  },
  "commandLine": []
}

angusrtaylor commented 5 years ago

@brnleehng I get the following error if I try and create this cluster:

Error in waitForNodesToComplete(poolConfig$name, 60000) : Pool count needs to be greater than 0.