Azure / doAzureParallel

An R package that allows users to submit parallel workloads in Azure
MIT License

R packages not installed on pool creation #13

Closed vapaunic closed 7 years ago

vapaunic commented 7 years ago

Hi,

I include the following R packages in the pool configuration file: "rPackages": { "cran": ["hts", "lubridate", "tidyr", "dplyr"], "github": [] }

However, they don't get installed on the VMs upon pool creation.

I confirmed this (on multiple pools) by running the following piece of code with the pool as the compute backend: result <- foreach(i=1:20) %dopar% { ip <- c("hts", "lubridate", "tidyr", "dplyr") %in% installed.packages() } which returns a list of c(FALSE, FALSE, FALSE, TRUE) vectors (dplyr seems to come with the R distribution).
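For anyone repeating this check, here is a minimal sketch of a slightly stricter version. It compares the package names against the row names of `installed.packages()` (the package names) instead of against every cell of the matrix, and it labels each result. The helper name `check_installed` is made up for illustration and is not part of doAzureParallel.

```r
# Packages listed in the pool configuration above.
required <- c("hts", "lubridate", "tidyr", "dplyr")

# Hypothetical helper (not part of doAzureParallel): returns a named
# logical vector saying which packages are installed in the current
# R session's library paths.
check_installed <- function(pkgs) {
  # Row names of installed.packages() are the package names.
  installed <- rownames(installed.packages())
  setNames(pkgs %in% installed, pkgs)
}

status <- check_installed(required)
print(status)
```

Run locally, this reports on the local library only. To check the pool nodes, the call to `check_installed(required)` would go inside the `foreach(...) %dopar% { ... }` body, as in the snippet above.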

Is there a way to make sure the packages get installed on the VMs only once? I'd rather not pass them to foreach through the .packages argument (to be installed upon each iteration).

Thanks!

brnleehng commented 7 years ago

There's an issue with installation in the start task of the pool: the default R library path is not writable. I'll look into package installation today, since as of right now this is the only way to install the packages once on a VM.
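A non-writable library path is something that can be checked directly from R. The following is only a diagnostic sketch, not the fix that was later merged: it lists each library path with a writable flag, then prepends a fallback library directory. The fallback location (`r-lib` under `tempdir()`) is an assumption chosen for illustration, and the `install.packages()` call is left commented out because it needs network access.

```r
# List every library path R searches and whether the current user
# can write to it. file.access() returns 0 on success, -1 on failure.
lib_paths <- .libPaths()
writable <- file.access(lib_paths, mode = 2) == 0
print(data.frame(path = lib_paths, writable = writable))

# Assumed fallback location, for illustration only.
user_lib <- file.path(tempdir(), "r-lib")
dir.create(user_lib, recursive = TRUE, showWarnings = FALSE)

# Put the writable directory first so installs and library() use it.
.libPaths(c(user_lib, .libPaths()))

# With a writable library in place, an install could look like this:
# install.packages("hts", lib = user_lib,
#                  repos = "https://cloud.r-project.org")
```

If every entry in the `writable` column is `FALSE` for the user running the start task, `install.packages()` has nowhere to put the packages unless a `lib` directory is given explicitly.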

brnleehng commented 7 years ago

I've merged some PRs to both packages (rAzureBatch and doAzureParallel) for R package installation at the pool level. The R installation commands were not being executed properly in /bin/bash.

I ran your test code and now get a list of c(TRUE, TRUE, TRUE, TRUE) vectors.

Thanks!

https://github.com/Azure/rAzureBatch/pull/9

https://github.com/Azure/doAzureParallel/pull/14

vapaunic commented 7 years ago

Thank you for your prompt fix!