Azure / doAzureParallel

An R package that allows users to submit parallel workloads in Azure
MIT License

Problems loading R packages from CRAN and GitHub #157

Closed MilesAheadAlso closed 6 years ago

MilesAheadAlso commented 6 years ago

I am trying to do best-fit statistical forecasting, which requires me to load several packages, some of which are not yet available in the version of R loaded on the VM. I've tried loading these from GitHub, which does not seem to work. The only feedback I get is that the VMs failed to start, so I'm guessing.

I'm not sure if I need to load devtools in cluster.json and then use install_github, but this would be very costly because it would need to run in each foreach loop.
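As a sketch of what I'd rather do (assuming the pool has already been registered with registerDoAzureParallel), one could probe each worker for the declared packages instead of installing them per iteration; the package names below are the ones from my cluster.json:

```r
library(foreach)

# Hypothetical probe: report which packages are loadable on each worker.
# Assumes a parallel backend has already been registered via
# registerDoAzureParallel(cluster); requireNamespace() is base R and
# does not install anything.
pkgs <- c("TTR", "forecast", "seasonal", "dplyr",
          "forecastHybrid", "nnet", "padr")
availability <- foreach(i = 1:2, .combine = rbind) %dopar% {
  sapply(pkgs, requireNamespace, quietly = TRUE)
}
availability  # TRUE where that worker could load the package
```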

This is the cluster.json I am using:

{
  "name": "smallPool",
  "vmSize": "Standard_D2s_v3",
  "maxTasksPerNode": 1,
  "poolSize": {
    "dedicatedNodes": {
      "min": 10,
      "max": 10
    },
    "lowPriorityNodes": {
      "min": 0,
      "max": 10
    },
    "autoscaleFormula": "QUEUE"
  },
  "rPackages": {
    "cran": ["TTR", "forecast", "seasonal", "dplyr", "forecastHybrid", "nnet", "foreach", "doParallel"],
    "github": ["EdwinTh/padr"],
    "githubAuthenticationToken": ""
  },
  "commandLine": []
}

paselem commented 6 years ago

Hi @MilesAheadAlso - I am pretty sure that the issue is the machine size you're using. Currently our backend does not support any of the 'S' type Virtual Machines. Can you try Standard_D2_v3 instead?

MilesAheadAlso commented 6 years ago

Hi Pablo

I tried that VM, same result. Below are the RStudio console output, the cluster.json, and a screenshot from the Azure console.

RStudio Console

library(doAzureParallel)
Loading required package: foreach
foreach: simple, scalable parallel programming from Revolution Analytics
Use Revolution R for scalability, fault tolerance and more.
http://www.revolutionanalytics.com
Loading required package: iterators

# 3. Set your credentials - you need to give the R session your
#    credentials to interact with Azure
setCredentials("credentials.json")
[1] "Your azure credentials have been set."

# 4. Register the pool. This will create a new pool if your pool hasn't
#    already been provisioned.
startTime <- Sys.time()
cluster <- makeCluster("smallVM_cluster.json")
Booting compute nodes. . .
|==========================================================| 100%
Your cluster has been registered.
Dedicated Node Count: 1
Low Priority Node Count: 0
Warning message:
In waitForNodesToComplete(poolConfig$name, 60000) :
  The following 1 nodes failed while running the start task: tvm-1763885094_1-20171104t101003z

endTime <- Sys.time()
difftime(endTime, startTime)
Time difference of 10.10355 mins

# 5. Register the pool as your parallel backend
registerDoAzureParallel(cluster)

# 6. Check that your parallel backend has been registered
getDoParWorkers()
[1] 11

nItems <- 20
results <- foreach(i = 1:nItems) %dopar% {
  itemHistory <- subset(History, ForecastItem == ForecastItem[i])
  my_single_function(itemHistory, fcstList,
                     fcstOffset, fcstPeriods, fcstSeason,
                     dateMin, dateMax, weekDays,
                     bWrite, bClean, cAccuracy)
}
Job Summary:
Id: job20171104101832
Waiting for tasks to complete. . .
|                                                          | 0%

smallVM_cluster.json

{
  "name": "smallBaxterPool",
  "vmSize": "Standard_D2s_v3",
  "maxTasksPerNode": 1,
  "poolSize": {
    "dedicatedNodes": {
      "min": 1,
      "max": 1
    },
    "lowPriorityNodes": {
      "min": 0,
      "max": 10
    },
    "autoscaleFormula": "QUEUE"
  },
  "rPackages": {
    "cran": ["TTR", "forecast", "seasonal", "dplyr", "forecastHybrid", "nnet", "foreach", "doParallel"],
    "github": ["EdwinTh/padr"],
    "githubAuthenticationToken": ""
  },
  "commandLine": []
}


MilesAheadAlso commented 6 years ago

Sorry Pablo, I realized that I had not saved smallVM_cluster.json, so I used the D2s_v3 VM again. I've saved the json file and started the process again. My bad.

MilesAheadAlso commented 6 years ago

I get exactly the same result with Standard_D2_v3.

MilesAheadAlso commented 6 years ago

Now trying with a D11.

MilesAheadAlso commented 6 years ago

OK, so I've tried lots of stuff. It comes down to loading certain packages. I do not know for certain which packages cause the problem, but padr definitely does. The cluster.json that worked is below.

{
  "name": "smallBaxterPool",
  "vmSize": "Standard_D4_V3",
  "maxTasksPerNode": 1,
  "poolSize": {
    "dedicatedNodes": {
      "min": 1,
      "max": 1
    },
    "lowPriorityNodes": {
      "min": 0,
      "max": 10
    },
    "autoscaleFormula": "QUEUE"
  },
  "rPackages": {
    "cran": ["TTR", "forecast", "seasonal", "forecastHybrid", "nnet"],
    "github": [],
    "githubAuthenticationToken": ""
  },
  "commandLine": []
}

Every time I included padr, whether under cran or github, the cluster creation failed.

paselem commented 6 years ago

Hi @MilesAheadAlso, two things w.r.t. your tests. First off, I tested this on our latest release and I can't seem to reproduce the error (my cluster comes up). That said, our latest release has a major difference from the previous one: we now run the latest CRAN R by default rather than Microsoft R Open 3.3.2. Below are the config file I used and the R script to test it.

Note - I'm using Standard_F2 because they are pretty cheap, though they have low RAM; any VM size should behave the same, with the exception of the 'S' variants.

Note - In order to use this, you will need to reinstall the latest version of doAzureParallel (0.6.0) and its dependencies.

{
  "name": "padr",
  "vmSize": "Standard_F2",
  "maxTasksPerNode": 1,
  "poolSize": {
    "dedicatedNodes": {
      "min": 0,
      "max": 0
    },
    "lowPriorityNodes": {
      "min": 2,
      "max": 2
    },
    "autoscaleFormula": "QUEUE"
  },
  "rPackages": {
    "cran": ["TTR",
             "forecast",
             "seasonal",
             "dplyr",
             "forecastHybrid",
             "nnet",
             "foreach",
             "doParallel"],
    "github": ["EdwinTh/padr"],
    "githubAuthenticationToken": ""
  },
  "commandLine": []
}

Here is the sample script I ran, including the example from the padr readme:

res <-
  foreach::foreach(i = 1:2) %dopar% {
    library(padr)
    library(tidyverse)
    coffee <- data.frame(
      time_stamp = as.POSIXct(c(
        '2016-07-07 09:11:21',
        '2016-07-07 09:46:48',
        '2016-07-09 13:25:17',
        '2016-07-10 10:45:11'
      )),
      amount = c(3.14, 2.98, 4.11, 3.14)
    )

    coffee %>%
      thicken('day') %>%
      dplyr::group_by(time_stamp_day) %>%
      dplyr::summarise(day_amount = sum(amount)) %>%
      pad() %>%
      fill_by_value(day_amount, value = 0)
  }

res

And these are the results

[[1]]
  time_stamp_day day_amount
1     2016-07-07       6.12
2     2016-07-08       0.00
3     2016-07-09       4.11
4     2016-07-10       3.14

[[2]]
  time_stamp_day day_amount
1     2016-07-07       6.12
2     2016-07-08       0.00
3     2016-07-09       4.11
4     2016-07-10       3.14

That said, I understand this may be an unsatisfactory answer to your original question. If you would like to debug further, we would need some of the logs off of one of your cluster nodes. Please follow the troubleshooting steps (https://github.com/Azure/doAzureParallel/blob/master/docs/40-troubleshooting.md) to get the log files. If there is no immediately obvious cause, please send them our way and we can help take a look.

Thanks, -Pablo

MilesAheadAlso commented 6 years ago

Thx Pablo. So I should just reinstall the latest version of doAzureParallel?


paselem commented 6 years ago

Yes, that is correct. Something like this should do the trick:

library(devtools)
# Install the tagged 0.6.0 release directly from GitHub
devtools::install_github('azure/doAzureParallel', force = TRUE, ref = 'v0.6.0')
library(doAzureParallel)
packageVersion("doAzureParallel")  # should report 0.6.0

paselem commented 6 years ago

Closing this issue on the assumption that the sample above has addressed it. Feel free to re-open if that is not the case.

akshaykadidal commented 6 years ago

Hi, I face this issue too, but intermittently. I am not sure if it is region specific; when I try the same thing on my "South India" Azure account, it works just fine.