Azure / doAzureParallel

An R package that allows users to submit parallel workloads in Azure
MIT License

decoding base64 error message #297

Closed cegbuna closed 6 years ago

cegbuna commented 6 years ago

Hi,

Thanks for this great package.

I'm attempting to run a function in parallel using the package, and I keep getting the following error message:

Error message Error in base64(txt, FALSE, mode) : decoding from base64 failed

I have tried different functions but still get the same error message. For this issue, I used the example from here

credential.json

{
  "batchAccount": {
    "name": "rsim",
    "key": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    "url": "https://xxxx.centralus.batch.azure.com"
  },
  "storageAccount": {
    "name": "None",
    "key": "n/a"
  },
  "githubAuthenticationToken": ""
}

cluster.json

{
  "name": "myPoolName",
  "vmSize": "Standard_D2_v2",
  "maxTasksPerNode": 4,
  "poolSize": {
    "dedicatedNodes": {
      "min": 0,
      "max": 0
    },
    "lowPriorityNodes": {
      "min": 5,
      "max": 10
    },
    "autoscaleFormula": "QUEUE"
  },
  "containerImage": "rocker/tidyverse:latest",
  "rPackages": {
    "cran": [],
    "github": [],
    "bioconductor": []
  },
  "commandLine": []
}

Function

mean_change = 1.001 
volatility = 0.01 
opening_price = 100 

getClosingPrice <- function() { 
  days <- 1825 # ~ 5 years 
  movement <- rnorm(days, mean=mean_change, sd=volatility) 
  path <- cumprod(c(opening_price, movement)) 
  closingPrice <- path[days] 
  return(closingPrice) 
} 

# Run 10 million simulations (100 iterations x 100,000 each) in parallel
opt <- list(chunkSize = 10) 
closingPrices_p <- foreach(i = 1:100, .combine='c', .options.azure = opt) %dopar% { 
  replicate(100000, getClosingPrice()) 
} 

My credential and cluster .json files look correct, and my session info is below.

R version 3.4.4 (2018-03-15)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=C                           LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] RCurl_1.95-4.11                 bitops_1.0-6                    base64_2.0                      doAzureParallel_0.7.1           iterators_1.0.10                foreach_1.4.4                  
[7] googleAuthR_0.6.3.9000          googleComputeEngineR_0.2.0.9000

loaded via a namespace (and not attached):
 [1] devtools_1.13.6   doParallel_1.0.11 rjson_0.2.20      R6_2.2.2          httr_1.3.1        globals_0.12.1    tools_3.4.4       parallel_3.4.4    rAzureBatch_0.6.1 git2r_0.21.0      withr_2.1.1       openssl_1.0.2    
[13] yaml_2.1.17       assertthat_0.2.0  digest_0.6.15     codetools_0.2-15  curl_3.2          memoise_1.1.0     mime_0.5          compiler_3.4.4    jsonlite_1.5      future_1.9.0      listenv_0.7.0  

Any help resolving this issue is greatly appreciated.

brnleehng commented 6 years ago

Hi @cegbuna

I don't see doAzureParallel in the sessionInfo. Is the issue happening when you are creating the cluster or running the foreach job? Can you try changing the credentials file to the new format?

{
  "sharedKey": {
    "batchAccount": {
      "name": "xxxxxxxxx",
      "key": "xxxxxxxxxxxxx",
      "url": "https://xxxxxxxxxx.southcentralus.batch.azure.com"
    },
    "storageAccount": {
      "name": "xxxxxxxxxxxxx",
      "key": "xxxxxxxxxxxxxx",
      "endpointSuffix": "core.windows.net"
    }
  },
  "githubAuthenticationToken": "",
  "dockerAuthentication": {
    "username": "",
    "password": "",
    "registry": ""
  }
}

I'll try to reproduce the error.

Thanks, Brian

cegbuna commented 6 years ago

Thanks for the quick response, @brnleehng. The error message occurs when I'm running the foreach job. In your recommended format, I'm not sure which username, password and registry to use in the dockerAuthentication section.

brnleehng commented 6 years ago

Hi @cegbuna

I'm not able to repro it. Did you register the cluster with registerDoAzureParallel function? I didn't see in your sessionInfo that doAzureParallel was loaded.

You can ignore that section. The dockerAuthentication section is for private container registries.

Thanks, Brian

cegbuna commented 6 years ago

I did register the cluster by running cluster <- makeCluster("cluster.json") and then registerDoAzureParallel(cluster) (it took a while to register, not sure why). I ran the foreach code and am still getting Error in base64(txt, FALSE, mode) : decoding from base64 failed

I also looked at my sessionInfo and doAzureParallel is loaded.

R version 3.4.4 (2018-03-15)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows >= 8 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=C                           LC_CTYPE=English_United States.1252    LC_MONETARY=English_United States.1252 LC_NUMERIC=C                           LC_TIME=English_United States.1252    

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] doAzureParallel_0.7.1 highcharter_0.6.0     doSNOW_1.0.16         snow_0.4-2            iterators_1.0.10      foreach_1.4.4         MonteCarlo_1.0.5      Runuran_0.24          aRpsDCA_1.1.1        
[10] lubridate_1.7.1       purrr_0.2.5           deSolve_1.21          dplyr_0.7.6          

loaded via a namespace (and not attached):
 [1] jsonlite_1.5      magrittr_1.5      tidyr_0.8.1       data.table_1.11.4 RCurl_1.95-4.11   pillar_1.2.1      htmltools_0.3.6   stringr_1.3.1     curl_3.2          broom_0.5.0       TTR_0.23-3        lattice_0.20-35  
[13] htmlwidgets_1.0   tidyselect_0.2.4  plyr_1.8.4        zoo_1.8-1         whisker_0.3-2     igraph_1.1.2      mime_0.5          pkgconfig_2.0.1   R6_2.2.2          digest_0.6.15     reshape_0.8.7     bindrcpp_0.2.2   
[25] stringi_1.1.7     yaml_2.1.17       codetools_0.2-15  rlecuyer_0.3-4    tibble_1.4.2      abind_1.4-5       httr_1.3.1        compiler_3.4.4    bindr_0.1.1       doParallel_1.0.11 backports_1.1.2   rAzureBatch_0.6.1
[37] Rcpp_0.12.18      assertthat_0.2.0  rjson_0.2.20      snowfall_1.84-6.1 tools_3.4.4       bitops_1.0-6      quantmod_0.4-12   rlist_0.4.6.1     xts_0.10-1        glue_1.2.0        rlang_0.2.1       nlme_3.1-131.1   
[49] grid_3.4.4

Thanks again.

brnleehng commented 6 years ago

Hi @cegbuna

I was able to reproduce the issue. You need to create an Azure storage account for doAzureParallel to work. We create an Azure storage container for every foreach loop. This gives you persistent results for your foreach loop.

Thanks! Brian
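
For anyone who hits the same error: the credentials file in the original report used "None" and "n/a" as the storage account name and key. One plausible reading, not confirmed in this thread, is that the placeholder key cannot be decoded as base64. Below is a minimal sketch of a check you could run on the parsed storageAccount section before creating the cluster. The helper names looks_like_base64 and check_storage_account are hypothetical and are not part of doAzureParallel. The check only looks at the shape of the key, not whether Azure accepts it.

```r
# Hypothetical helper (not part of doAzureParallel): TRUE when `key` has the
# shape of a base64 string (valid alphabet, length a multiple of 4,
# at most two '=' padding characters at the end).
looks_like_base64 <- function(key) {
  is.character(key) && length(key) == 1 && nchar(key) > 0 &&
    nchar(key) %% 4 == 0 &&
    grepl("^[A-Za-z0-9+/]+={0,2}$", key)
}

# Hypothetical pre-flight check: takes the parsed "storageAccount" section
# as a list and returns a character vector of problems (empty if none found).
check_storage_account <- function(storage) {
  problems <- character(0)
  if (is.null(storage$name) || storage$name %in% c("", "None", "n/a")) {
    problems <- c(problems, "storageAccount name is missing or a placeholder")
  }
  if (is.null(storage$key) || !looks_like_base64(storage$key)) {
    problems <- c(problems, "storageAccount key does not look like base64")
  }
  problems
}

# Values taken from the credentials file in the original report
print(check_storage_account(list(name = "None", key = "n/a")))
```

With the values from the original report, both checks flag a problem. To run it against a real file, you could parse the file first (for example with jsonlite::fromJSON) and pass in its storageAccount element.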