Azure / doAzureParallel

An R package that allows users to submit parallel workloads in Azure

All dedicated cores not in use #265

Open akshaykadidal opened 6 years ago

akshaykadidal commented 6 years ago

Hi Team,

I have created a pool with 64 cores, including at least 20 dedicated workers (DS12). I have more than 1,000 jobs to finish, but the Azure pool is utilizing only 5 workers.


Here is the cluster.json file. Please let me know the reason for this and how I can fix it.

{
  "name": "#######",
  "vmSize": "standard_d12_v2",
  "maxTasksPerNode": 4,
  "poolSize": {
    "dedicatedNodes": {
      "min": 5,
      "max": 5
    },
    "lowPriorityNodes": {
      "min": 11,
      "max": 11
    },
    "autoscaleFormula": "QUEUE"
  },
  "containerImage": "rocker/tidyverse:latest",
  "rPackages": {
    "cran": ["DT","msm","doParallel","doSNOW","data.table","randomForest","stringr","raster","ranger","plyr","foreach","Cubist","stringdist","fGarch","scales","mice", "gbm", "dplyr"],
    "github": ["topepo/caret/pkg/caret", "imbs-hl/ranger"],
    "bioconductor": []
  },
  "commandLine": ["mkdir /mnt/batch/tasks/shared/data", "mount -t cifs //osmosedsbatchstorage.file.core.windows.net/data /mnt/batch/tasks/shared/data -o vers=3.0,username=<###############>,password=<#######################>,dir_mode=0777,file_mode=0777,sec=ntlmssp"]
}

Here is the command I am trying to run.

foreach(yy = iter(d, by = 'row'), .options.azure = options, .errorhandling = 'pass', .combine = "list",
        .packages = c("DT","msm","doParallel","doSNOW","data.table","randomForest","stringr","raster","ranger","plyr","foreach","Cubist","stringdist","fGarch","scales","mice", "gbm", "dplyr")) %dopar% {
  # Work inside the mounted file share so results land in the shared folder
  setwd('/mnt/batch/tasks/shared/data/')
  for (ii in 1:yy$ittr) {
    azure_test_function(yy$Age, as.character(yy$State))
  }
}

azure_test_function is a massive function that performs a lot of operations and, at the end, writes the results of each iteration to the shared folder.

brnleehng commented 6 years ago

Hi @akshaykadidal

I think the reason is that the low-priority nodes are not available at the moment. See https://github.com/Azure/doAzureParallel#low-priority-vms

Thanks, Brian
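
To make the numbers in this thread concrete, here is a minimal sketch of the capacity arithmetic implied by the posted cluster.json. It assumes each node contributes maxTasksPerNode task slots (workers); `slots_available` and its `available_low_priority` argument are hypothetical names used only for illustration, and nothing here queries Azure.

```r
# Values copied from the cluster.json posted above.
max_tasks_per_node <- 4
dedicated_nodes    <- 5
low_priority_nodes <- 11

# Assumption: each node contributes maxTasksPerNode task slots (workers).
dedicated_slots    <- dedicated_nodes * max_tasks_per_node
low_priority_slots <- low_priority_nodes * max_tasks_per_node
total_slots        <- dedicated_slots + low_priority_slots

# Hypothetical helper: slots left when only some low-priority nodes are
# actually allocated. The argument is supplied by hand, not read from Azure.
slots_available <- function(available_low_priority) {
  (dedicated_nodes + available_low_priority) * max_tasks_per_node
}

cat("dedicated slots:", dedicated_slots, "\n")        # 20
cat("low-priority slots:", low_priority_slots, "\n")  # 44
cat("total slots:", total_slots, "\n")                # 64
cat("slots with no low-priority nodes:", slots_available(0), "\n")  # 20
```

Under that assumption, the "20 dedicated workers" correspond to 5 dedicated nodes with 4 slots each, and 44 of the 64 slots depend on low-priority nodes being available.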

akshaykadidal commented 6 years ago

Hi Brian,

Thank you for the response. As the JSON shows, I should have at least 20 dedicated workers. Why are they not available?

Thanks, Akshay

brnleehng commented 6 years ago

Hi @akshaykadidal

Can you install the latest doAzureParallel package? (We needed to fix the list pool nodes functions in #267.)

Then check the number of workers:

getDoParWorkers()

This function will list the status of the cluster:

getCluster("<name of your cluster>")

Thanks, Brian
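
The checks suggested above could be wrapped up roughly as follows. This is only a sketch: `compare_workers` and `check_cluster` are hypothetical helpers, the credentials file name and the expected count of 64 are assumptions, and `check_cluster` is defined but not called because it needs the doAzureParallel package and valid Azure credentials.

```r
# One-time setup (assumes devtools is installed):
# devtools::install_github("Azure/doAzureParallel")

# Hypothetical helper: compare the worker count reported by the backend
# with the count expected from cluster.json.
compare_workers <- function(reported, expected) {
  list(
    reported = reported,
    expected = expected,
    missing  = max(expected - reported, 0)
  )
}

# Hypothetical wrapper around the calls mentioned in this thread.
# Not called here: it requires doAzureParallel and Azure credentials.
check_cluster <- function(cluster_name,
                          credentials_file = "credentials.json",  # assumed path
                          expected = 64) {                        # assumed target
  doAzureParallel::setCredentials(credentials_file)
  cluster <- doAzureParallel::getCluster(cluster_name)  # reports cluster status
  doAzureParallel::registerDoAzureParallel(cluster)
  compare_workers(foreach::getDoParWorkers(), expected)
}

# Offline illustration with a made-up reported count of 20.
result <- compare_workers(reported = 20, expected = 64)
print(result$missing)
```

Calling `check_cluster("<name of your cluster>")` in a configured session would return the same kind of list for the live cluster.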

brnleehng commented 6 years ago

@akshaykadidal

Are there any updates on your end?

Brian

akshaykadidal commented 6 years ago

Hi Brian,

I am afraid not. Sometimes I have 64 cores working, sometimes only 4-5.

On Sat, 19 May 2018, 22:09, Brian wrote:

> Are there any updates on your end?
