Azure / doAzureParallel

An R package that allows users to submit parallel workloads in Azure
MIT License

doAzureParallel in windows 10 #188

Closed glalwani2 closed 6 years ago

glalwani2 commented 6 years ago

Hi All,

I am trying to run an R script from my Windows 10 machine. In the script I use the doAzureParallel library for parallelism. I have configured a Batch account in Azure with a storage account. When I call this:

cluster <- makeCluster("cluster.json")

I get the error 'Start task failed'. It is thrown while the cluster nodes are being created.

Booting compute nodes. . .
|===============================================================================================================================| 100%
Your cluster has been registered.
Dedicated Node Count: 3
Low Priority Node Count: 3
Warning messages:
1: In waitForNodesToComplete(poolConfig$name, 60000) :
  The following 5 nodes failed while running the start task:
tvm-2434664350_1-20171211t165649z
tvm-2434664350_2-20171211t165649z
tvm-2434664350_3-20171211t165649z
tvm-2434664350_5-20171211t165714z-p
tvm-2434664350_6-20171211t165714z-p

2: In waitForNodesToComplete(poolConfig$name, 60000) :
  The following 6 nodes failed while running the start task:
tvm-2434664350_1-20171211t165649z
tvm-2434664350_2-20171211t165649z
tvm-2434664350_3-20171211t165649z
tvm-2434664350_4-20171211t165714z-p
tvm-2434664350_5-20171211t165714z-p
tvm-2434664350_6-20171211t165714z-p
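For anyone who needs to look up the failing nodes one by one, the node IDs can be pulled out of the warning text with base R. This is only a minimal sketch: `extract_failed_nodes` is a hypothetical helper, not part of doAzureParallel, and the ID pattern is an assumption based on the IDs shown in the warnings above.

```r
# Hypothetical helper (not part of doAzureParallel): pull Batch node IDs
# out of the text of a "nodes failed while running the start task" warning.
extract_failed_nodes <- function(warning_text) {
  # Assumed ID shape, taken from the log above:
  # tvm-<digits>_<n>-<yyyymmdd>t<hhmmss>z, with an optional "-p" suffix.
  pattern <- "tvm-[0-9]+_[0-9]+-[0-9]{8}t[0-9]{6}z(-p)?"
  unique(regmatches(warning_text, gregexpr(pattern, warning_text))[[1]])
}

# Text of the first warning, copied from the output above.
msg <- paste(
  "The following 5 nodes failed while running the start task:",
  "tvm-2434664350_1-20171211t165649z tvm-2434664350_2-20171211t165649z",
  "tvm-2434664350_3-20171211t165649z tvm-2434664350_5-20171211t165714z-p",
  "tvm-2434664350_6-20171211t165714z-p"
)

failed <- extract_failed_nodes(msg)
print(failed)
```

Each ID can then be matched against the node list of the pool to find that node's stdout.txt and stderr.txt.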

I checked the stderr.txt file and found this:

2017-12-11 16:58:16 (141 MB/s) - 'install_bioconductor.R' saved [768/768]

debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin:
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
debconf: unable to initialize frontend: Readline
debconf: (This frontend requires a controlling tty.)
debconf: falling back to frontend: Teletype
dpkg-preconfigure: unable to re-open stdin:
Error: Cannot perform an interactive login from a non TTY device
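Most of that stderr output is debconf chatter, so it can help to filter it out and look at what is left. A minimal sketch in base R, assuming the downloaded stderr.txt has one message per line; `filter_start_task_stderr` is a hypothetical helper, not part of doAzureParallel.

```r
# Hypothetical helper (not part of doAzureParallel): hide debconf and
# dpkg-preconfigure chatter so the remaining stderr lines stand out.
filter_start_task_stderr <- function(lines) {
  lines <- trimws(lines)
  is_noise <- grepl("^(debconf|dpkg-preconfigure):", lines)
  lines[nzchar(lines) & !is_noise]
}

# In practice: stderr_lines <- readLines("stderr.txt") on a downloaded copy.
# Here a few lines from the dump above are used instead.
stderr_lines <- c(
  "debconf: unable to initialize frontend: Dialog",
  "debconf: (TERM is not set, so the dialog frontend is not usable.)",
  "debconf: falling back to frontend: Teletype",
  "dpkg-preconfigure: unable to re-open stdin:",
  "Error: Cannot perform an interactive login from a non TTY device"
)

remaining <- filter_start_task_stderr(stderr_lines)
print(remaining)
```

With the sample lines, only the final `Error:` line survives the filter, which is the line worth reporting.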

My questions:

  1. Is it even possible to use doAzureParallel from an R session on a local Windows machine to run parallel jobs on Ubuntu? (I read somewhere that doAzureParallel uses a Data Science Ubuntu VM to do its processing.)

  2. If yes, what in particular am I doing wrong? My task is very simple: I have to run an R script on a huge dataset because I don't have that much computing power on my local machine. If there is any other alternative using Azure, please let me know.

Thanks

paselem commented 6 years ago

Hi @glalwani2, First off, yes, we do most of our testing from Windows machines, so that should be quite possible. Secondly, your stderr.txt has some warnings, but that last error seems a bit curious. Can you please share your cluster.json file with us so we can help identify what is going on? It would also be useful if you could share your stdout.txt file, since it may contain more details about the issue.

glalwani2 commented 6 years ago

Hi @paselem,

Here is the cluster.json:

{
  "name": "mysalescluster",
  "vmSize": "Standard_D2_v2",
  "maxTasksPerNode": 2,
  "poolSize": {
    "dedicatedNodes": {
      "min": 3,
      "max": 3
    },
    "lowPriorityNodes": {
      "min": 3,
      "max": 3
    },
    "autoscaleFormula": "QUEUE"
  },
  "containerImage": "rocker/tidyverse:latest",
  "rPackages": {
    "cran": [],
    "github": [],
    "bioconductor": []
  },
  "commandLine": []
}
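As a quick sanity check of what this configuration asks for, the pool size and task capacity can be worked out by hand. The sketch below copies the values from the cluster.json above into an R list (so no JSON package is needed) and assumes that `maxTasksPerNode` caps the number of tasks running at once on a single node.

```r
# Values copied from the cluster.json above, written as an R list.
pool <- list(
  maxTasksPerNode = 2,
  dedicatedNodes = c(min = 3, max = 3),
  lowPriorityNodes = c(min = 3, max = 3)
)

# Upper bound on nodes in the pool: dedicated plus low-priority.
max_nodes <- pool$dedicatedNodes[["max"]] + pool$lowPriorityNodes[["max"]]

# Assumption: maxTasksPerNode limits concurrent tasks on one node.
max_concurrent_tasks <- max_nodes * pool$maxTasksPerNode

print(c(nodes = max_nodes, concurrent_tasks = max_concurrent_tasks))
```

Under that assumption the pool tops out at 6 nodes and 12 concurrent tasks, which matches the 6 node IDs listed in the second warning.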

Here is the stdout.txt file dump:

Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  crda iw libnl-3-200 libnl-genl-3-200 linux-image-4.4.0-103-generic linux-image-extra-4.4.0-103-generic linux-image-generic thermald wireless-regdb
Suggested packages:
  fdutils linux-doc-4.4.0 | linux-source-4.4.0 linux-tools linux-headers-4.4.0-103-generic
The following NEW packages will be installed:
  crda iw libnl-3-200 libnl-genl-3-200 linux-image-4.4.0-103-generic linux-image-extra-4.11.0-1016-azure linux-image-extra-4.4.0-103-generic linux-image-extra-virtual linux-image-generic thermald wireless-regdb
0 upgraded, 11 newly installed, 0 to remove and 0 not upgraded.
Need to get 67.7 MB of archives.
After this operation, 267 MB of additional disk space will be used.
Get:1 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libnl-3-200 amd64 3.2.27-1ubuntu0.16.04.1 [52.2 kB]
Get:2 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 libnl-genl-3-200 amd64 3.2.27-1ubuntu0.16.04.1 [11.2 kB]
Get:3 http://azure.archive.ubuntu.com/ubuntu xenial/main amd64 wireless-regdb all 2015.07.20-1ubuntu1 [9058 B]
Get:4 http://azure.archive.ubuntu.com/ubuntu xenial/main amd64 iw amd64 3.17-1 [63.5 kB]
Get:5 http://azure.archive.ubuntu.com/ubuntu xenial/main amd64 crda amd64 3.13-1 [60.5 kB]
Get:6 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 linux-image-4.4.0-103-generic amd64 4.4.0-103.126 [21.9 MB]
Get:7 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 linux-image-extra-4.11.0-1016-azure amd64 4.11.0-1016.16 [9301 kB]
Get:8 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 linux-image-extra-4.4.0-103-generic amd64 4.4.0-103.126 [36.0 MB]
Get:9 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 linux-image-generic amd64 4.4.0.103.108 [2314 B]
Get:10 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 linux-image-extra-virtual amd64 4.4.0.103.108 [1768 B]
Get:11 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 thermald amd64 1.5-2ubuntu4 [187 kB]
Fetched 67.7 MB in 1s (61.7 MB/s)
Selecting previously unselected package libnl-3-200:amd64.
(Reading database ... (Reading database ... 5% (Reading database ... 10% (Reading database ... 15% (Reading database ... 20% (Reading database ... 25% (Reading database ... 30% (Reading database ... 35% (Reading database ... 40% (Reading database ... 45% (Reading database ... 50% (Reading database ... 55% (Reading database ... 60% (Reading database ... 65% (Reading database ... 70% (Reading database ... 75% (Reading database ... 80% (Reading database ... 85% (Reading database ... 90% (Reading database ... 95% (Reading database ... 100% (Reading database ... 54174 files and directories currently installed.)
Preparing to unpack .../libnl-3-200_3.2.27-1ubuntu0.16.04.1_amd64.deb ...
Unpacking libnl-3-200:amd64 (3.2.27-1ubuntu0.16.04.1) ...
Selecting previously unselected package libnl-genl-3-200:amd64.
Preparing to unpack .../libnl-genl-3-200_3.2.27-1ubuntu0.16.04.1_amd64.deb ...
Unpacking libnl-genl-3-200:amd64 (3.2.27-1ubuntu0.16.04.1) ...
Selecting previously unselected package wireless-regdb.
Preparing to unpack .../wireless-regdb_2015.07.20-1ubuntu1_all.deb ...
Unpacking wireless-regdb (2015.07.20-1ubuntu1) ...
Selecting previously unselected package iw.
Preparing to unpack .../archives/iw_3.17-1_amd64.deb ...
Unpacking iw (3.17-1) ...
Selecting previously unselected package crda.
Preparing to unpack .../archives/crda_3.13-1_amd64.deb ...
Unpacking crda (3.13-1) ...
Selecting previously unselected package linux-image-4.4.0-103-generic.
Preparing to unpack .../linux-image-4.4.0-103-generic_4.4.0-103.126_amd64.deb ...
debconf: unable to initialize frontend: Dialog
debconf: (TERM is not set, so the dialog frontend is not usable.)
debconf: falling back to frontend: Readline
Done.
Unpacking linux-image-4.4.0-103-generic (4.4.0-103.126) ...
Selecting previously unselected package linux-image-extra-4.11.0-1016-azure.
Preparing to unpack .../linux-image-extra-4.11.0-1016-azure_4.11.0-1016.16_amd64.deb ...
Unpacking linux-image-extra-4.11.0-1016-azure (4.11.0-1016.16) ...
Selecting previously unselected package linux-image-extra-4.4.0-103-generic.
Preparing to unpack .../linux-image-extra-4.4.0-103-generic_4.4.0-103.126_amd64.deb ...
Unpacking linux-image-extra-4.4.0-103-generic (4.4.0-103.126) ...
Selecting previously unselected package linux-image-generic.
Preparing to unpack .../linux-image-generic_4.4.0.103.108_amd64.deb ...
Unpacking linux-image-generic (4.4.0.103.108) ...
Selecting previously unselected package linux-image-extra-virtual.
Preparing to unpack .../linux-image-extra-virtual_4.4.0.103.108_amd64.deb ...
Unpacking linux-image-extra-virtual (4.4.0.103.108) ...
Selecting previously unselected package thermald.
Preparing to unpack .../thermald_1.5-2ubuntu4_amd64.deb ...
Unpacking thermald (1.5-2ubuntu4) ...
Processing triggers for libc-bin (2.23-0ubuntu9) ...
Processing triggers for man-db (2.7.5-1) ...
Processing triggers for dbus (1.10.6-1ubuntu3.3) ...
Processing triggers for ureadahead (0.100.0-19) ...
Processing triggers for systemd (229-4ubuntu21) ...
Setting up libnl-3-200:amd64 (3.2.27-1ubuntu0.16.04.1) ...
Setting up libnl-genl-3-200:amd64 (3.2.27-1ubuntu0.16.04.1) ...
Setting up wireless-regdb (2015.07.20-1ubuntu1) ...
Setting up iw (3.17-1) ...
Setting up crda (3.13-1) ...
Setting up linux-image-4.4.0-103-generic (4.4.0-103.126) ...
Running depmod.
update-initramfs: deferring update (hook will be called later)
Examining /etc/kernel/postinst.d.
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 4.4.0-103-generic /boot/vmlinuz-4.4.0-103-generic
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 4.4.0-103-generic /boot/vmlinuz-4.4.0-103-generic
update-initramfs: Generating /boot/initrd.img-4.4.0-103-generic
W: mdadm: /etc/mdadm/mdadm.conf defines no arrays.
run-parts: executing /etc/kernel/postinst.d/unattended-upgrades 4.4.0-103-generic /boot/vmlinuz-4.4.0-103-generic
run-parts: executing /etc/kernel/postinst.d/update-notifier 4.4.0-103-generic /boot/vmlinuz-4.4.0-103-generic
run-parts: executing /etc/kernel/postinst.d/zz-update-grub 4.4.0-103-generic /boot/vmlinuz-4.4.0-103-generic
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.11.0-1016-azure
Found initrd image: /boot/initrd.img-4.11.0-1016-azure
Found linux image: /boot/vmlinuz-4.4.0-103-generic
Found initrd image: /boot/initrd.img-4.4.0-103-generic
done
Setting up linux-image-extra-4.11.0-1016-azure (4.11.0-1016.16) ...
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 4.11.0-1016-azure /boot/vmlinuz-4.11.0-1016-azure
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 4.11.0-1016-azure /boot/vmlinuz-4.11.0-1016-azure
update-initramfs: Generating /boot/initrd.img-4.11.0-1016-azure
W: mdadm: /etc/mdadm/mdadm.conf defines no arrays.
run-parts: executing /etc/kernel/postinst.d/unattended-upgrades 4.11.0-1016-azure /boot/vmlinuz-4.11.0-1016-azure
run-parts: executing /etc/kernel/postinst.d/update-notifier 4.11.0-1016-azure /boot/vmlinuz-4.11.0-1016-azure
run-parts: executing /etc/kernel/postinst.d/zz-update-grub 4.11.0-1016-azure /boot/vmlinuz-4.11.0-1016-azure
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.11.0-1016-azure
Found initrd image: /boot/initrd.img-4.11.0-1016-azure
Found linux image: /boot/vmlinuz-4.4.0-103-generic
Found initrd image: /boot/initrd.img-4.4.0-103-generic
done
Setting up linux-image-extra-4.4.0-103-generic (4.4.0-103.126) ...
run-parts: executing /etc/kernel/postinst.d/apt-auto-removal 4.4.0-103-generic /boot/vmlinuz-4.4.0-103-generic
run-parts: executing /etc/kernel/postinst.d/initramfs-tools 4.4.0-103-generic /boot/vmlinuz-4.4.0-103-generic
update-initramfs: Generating /boot/initrd.img-4.4.0-103-generic
W: mdadm: /etc/mdadm/mdadm.conf defines no arrays.
run-parts: executing /etc/kernel/postinst.d/unattended-upgrades 4.4.0-103-generic /boot/vmlinuz-4.4.0-103-generic
run-parts: executing /etc/kernel/postinst.d/update-notifier 4.4.0-103-generic /boot/vmlinuz-4.4.0-103-generic
run-parts: executing /etc/kernel/postinst.d/zz-update-grub 4.4.0-103-generic /boot/vmlinuz-4.4.0-103-generic
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-4.11.0-1016-azure
Found initrd image: /boot/initrd.img-4.11.0-1016-azure
Found linux image: /boot/vmlinuz-4.4.0-103-generic
Found initrd image: /boot/initrd.img-4.4.0-103-generic
done
Setting up linux-image-generic (4.4.0.103.108) ...
Setting up linux-image-extra-virtual (4.4.0.103.108) ...
Setting up thermald (1.5-2ubuntu4) ...
Processing triggers for libc-bin (2.23-0ubuntu9) ...
Reading package lists...
Building dependency tree...
Reading state information...
apt-transport-https is already the newest version (1.2.24).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Reading package lists...
Building dependency tree...
Reading state information...
curl is already the newest version (7.47.0-1ubuntu2.5).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Reading package lists...
Building dependency tree...
Reading state information...
ca-certificates is already the newest version (20170717~16.04.1).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
Reading package lists...
Building dependency tree...
Reading state information...
software-properties-common is already the newest version (0.96.20.7).
0 upgraded, 0 newly installed, 0 to remove and 0 not upgraded.
OK
Hit:1 http://azure.archive.ubuntu.com/ubuntu xenial InRelease
Get:2 http://azure.archive.ubuntu.com/ubuntu xenial-updates InRelease [102 kB]
Get:3 http://azure.archive.ubuntu.com/ubuntu xenial-backports InRelease [102 kB]
Get:4 https://download.docker.com/linux/ubuntu xenial InRelease [49.8 kB]
Get:5 http://azure.archive.ubuntu.com/ubuntu xenial/main Sources [868 kB]
Get:6 http://azure.archive.ubuntu.com/ubuntu xenial/restricted Sources [4808 B]
Get:7 http://azure.archive.ubuntu.com/ubuntu xenial/universe Sources [7728 kB]
Get:8 http://azure.archive.ubuntu.com/ubuntu xenial/multiverse Sources [179 kB]
Get:9 http://security.ubuntu.com/ubuntu xenial-security InRelease [102 kB]
Get:10 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main Sources [286 kB]
Get:11 http://azure.archive.ubuntu.com/ubuntu xenial-updates/restricted Sources [3404 B]
Get:12 http://azure.archive.ubuntu.com/ubuntu xenial-updates/universe Sources [184 kB]
Get:13 http://azure.archive.ubuntu.com/ubuntu xenial-updates/multiverse Sources [7960 B]
Get:14 http://azure.archive.ubuntu.com/ubuntu xenial-updates/main amd64 Packages [678 kB]
Get:15 http://azure.archive.ubuntu.com/ubuntu xenial-updates/universe amd64 Packages [565 kB]
Get:16 http://azure.archive.ubuntu.com/ubuntu xenial-updates/universe Translation-en [229 kB]
Get:17 http://azure.archive.ubuntu.com/ubuntu xenial-updates/multiverse amd64 Packages [16.2 kB]
Get:18 http://azure.archive.ubuntu.com/ubuntu xenial-backports/main Sources [3428 B]
Get:19 http://azure.archive.ubuntu.com/ubuntu xenial-backports/universe Sources [4904 B]
Get:20 https://download.docker.com/linux/ubuntu xenial/stable amd64 Packages [2756 B]
Get:21 http://security.ubuntu.com/ubuntu xenial-security/main Sources [104 kB]
Get:22 http://security.ubuntu.com/ubuntu xenial-security/restricted Sources [2600 B]
Get:23 http://security.ubuntu.com/ubuntu xenial-security/universe Sources [48.0 kB]
Get:24 http://security.ubuntu.com/ubuntu xenial-security/multiverse Sources [1516 B]
Fetched 11.3 MB in 2s (5464 kB/s)
Reading package lists...
Reading package lists...
Building dependency tree...
Reading state information...
The following additional packages will be installed:
  aufs-tools cgroupfs-mount libltdl7
Suggested packages:
  mountall
The following NEW packages will be installed:
  aufs-tools cgroupfs-mount docker-ce libltdl7
0 upgraded, 4 newly installed, 0 to remove and 0 not upgraded.
Need to get 21.1 MB/21.2 MB of archives.
After this operation, 100 MB of additional disk space will be used.
Get:1 http://azure.archive.ubuntu.com/ubuntu xenial/main amd64 libltdl7 amd64 2.4.6-0.1 [38.3 kB]
Get:2 https://download.docker.com/linux/ubuntu xenial/stable amd64 docker-ce amd64 17.09.1~ce-0~ubuntu [21.0 MB]
Fetched 21.1 MB in 0s (38.8 MB/s)
Selecting previously unselected package aufs-tools.
(Reading database ... (Reading database ... 5% (Reading database ... 10% (Reading database ... 15% (Reading database ... 20% (Reading database ... 25% (Reading database ... 30% (Reading database ... 35% (Reading database ... 40% (Reading database ... 45% (Reading database ... 50% (Reading database ... 55% (Reading database ... 60% (Reading database ... 65% (Reading database ... 70% (Reading database ... 75% (Reading database ... 80% (Reading database ... 85% (Reading database ... 90% (Reading database ... 95% (Reading database ... 100% (Reading database ... 60889 files and directories currently installed.)
Preparing to unpack .../aufs-tools_1%3a3.2+20130722-1.1ubuntu1_amd64.deb ...
Unpacking aufs-tools (1:3.2+20130722-1.1ubuntu1) ...
Selecting previously unselected package cgroupfs-mount.
Preparing to unpack .../cgroupfs-mount_1.2_all.deb ...
Unpacking cgroupfs-mount (1.2) ...
Selecting previously unselected package libltdl7:amd64.
Preparing to unpack .../libltdl7_2.4.6-0.1_amd64.deb ...
Unpacking libltdl7:amd64 (2.4.6-0.1) ...
Selecting previously unselected package docker-ce.
Preparing to unpack .../docker-ce_17.09.1~ce-0~ubuntu_amd64.deb ...
Unpacking docker-ce (17.09.1~ce-0~ubuntu) ...
Processing triggers for libc-bin (2.23-0ubuntu9) ...
Processing triggers for man-db (2.7.5-1) ...
Processing triggers for ureadahead (0.100.0-19) ...
Processing triggers for systemd (229-4ubuntu21) ...
Setting up aufs-tools (1:3.2+20130722-1.1ubuntu1) ...
Setting up cgroupfs-mount (1.2) ...
Setting up libltdl7:amd64 (2.4.6-0.1) ...
Setting up docker-ce (17.09.1~ce-0~ubuntu) ...
sent invalidate(passwd) request, exiting
sent invalidate(group) request, exiting
sent invalidate(group) request, exiting

Processing triggers for libc-bin (2.23-0ubuntu9) ...
Processing triggers for systemd (229-4ubuntu21) ...
Processing triggers for ureadahead (0.100.0-19) ...

glalwani2 commented 6 years ago

@paselem No matter what I do, I get the same error, even with the simplest cluster.json. I am not sure what the problem is with setting up these clusters. Is there any chance I am doing something wrong in the credentials.json file?

paselem commented 6 years ago

@glalwani2 - This seems to be an issue w/ the cluster trying to download the docker image. I believe it is a bug in the toolset. While we investigate can you please try downloading a previous version of the toolkit:

devtools::install_github("azure/doAzureParallel", ref="v0.6.1")
library(doAzureParallel)
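After reinstalling, it may be worth confirming which version R actually picked up before recreating the cluster. A minimal sketch using base R only; it assumes the GitHub tag `v0.6.1` installs a package that reports version 0.6.1, which is not confirmed anywhere in this thread.

```r
# Assumption: the GitHub tag "v0.6.1" installs a package reporting version 0.6.1.
expected <- package_version("0.6.1")

# packageVersion() raises an error when the package is not installed,
# so the error is turned into NULL here.
installed <- tryCatch(
  utils::packageVersion("doAzureParallel"),
  error = function(e) NULL
)

if (is.null(installed)) {
  message("doAzureParallel is not installed in this library")
} else if (installed == expected) {
  message("doAzureParallel ", as.character(installed), " is installed as expected")
} else {
  message("doAzureParallel ", as.character(installed),
          " is installed, expected ", as.character(expected))
}
```

If the old version is still reported, restarting the R session before calling `library(doAzureParallel)` again is the usual first thing to try.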

paselem commented 6 years ago

Issue found. #189

glalwani2 commented 6 years ago

Thanks @paselem, that did move my cluster creation code a bit further, but I am still getting 'start task failed'. I think this time it is because I added the CRAN packages 'tidyverse' and 'prophet' (from Facebook) to the cluster configuration: the 'stderr.txt' files now show errors about packages failing to install. I am not sure why, since I assumed all the dependencies of 'tidyverse' and 'prophet' would be installed automatically on the VM.

Here is the 'cluster.json' file:

{
  "name": "mysalescluster",
  "vmSize": "Standard_D2_v2",
  "maxTasksPerNode": 1,
  "poolSize": {
    "dedicatedNodes": {
      "min": 3,
      "max": 3
    },
    "lowPriorityNodes": {
      "min": 3,
      "max": 3
    },
    "autoscaleFormula": "QUEUE"
  },
  "containerImage": "rocker/tidyverse:latest",
  "rPackages": {
    "cran": ["tidyverse", "prophet"],
    "github": [],
    "bioconductor": []
  },
  "commandLine": []
}

The 'stderr.txt' file is attached. From the dump it can be inferred that there is a problem installing 'rstan', which is required for 'prophet'. Can you please help me with that?

brnleehng commented 6 years ago

@glalwani2

Tidyverse is already installed in the Docker image, so you don't need to add 'tidyverse' to the cluster configuration file. The 'rstan' package installation failed due to compiler settings issues. However, changing the compiler settings requires creating your own Dockerfile. We have some documentation on building your own Dockerfile (https://github.com/Azure/doAzureParallel/blob/master/docs/32-building-containers.md).
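One way to see which packages really need to go under "cran" is to start an R session inside the container image and compare the wanted packages against what is already installed. A minimal sketch in base R; `missing_packages` is a hypothetical helper, and the result depends entirely on the R library of the session it runs in.

```r
# Hypothetical helper: return the packages from `wanted` that are not
# installed in the current R library paths.
missing_packages <- function(wanted) {
  setdiff(wanted, rownames(utils::installed.packages()))
}

# Run inside the container image to see what still has to be installed.
wanted <- c("tidyverse", "prophet")
print(missing_packages(wanted))
```

Only the names printed here would need to be listed in the cluster configuration file.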

I found a Docker image with 'rstan' preinstalled that builds on top of the rocker/tidyverse image (https://hub.docker.com/r/jrnold/rstan/). We can replace 'rocker/tidyverse:latest' in the cluster configuration with 'jrnold/rstan:latest'.

{
  "name": "prophet",
  "vmSize": "Standard_D2_v2",
  "maxTasksPerNode": 1,
  "poolSize": {
    "dedicatedNodes": {
      "min": 0,
      "max": 0
    },
    "lowPriorityNodes": {
      "min": 2,
      "max": 2
    },
    "autoscaleFormula": "QUEUE"
  },
  "containerImage": "jrnold/rstan:latest",
  "rPackages": {
    "cran": ["prophet"],
    "github": [],
    "bioconductor": []
  },
  "commandLine": []
}

Here's the R script that I ran to verify that the 'prophet' package works.

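library(doAzureParallel)

# Load the credentials, create the cluster from its config file,
# and register it as the foreach backend
setCredentials("credentials.json")
cluster <- makeCluster("cluster.json")
registerDoAzureParallel(cluster)

# Each iteration runs on the cluster with the listed packages loaded
foreach(i = 1:2, .packages = c("prophet", "dplyr")) %dopar% {
  # Read the example series and log-transform y
  df <- read.csv('https://raw.githubusercontent.com/facebook/prophet/master/examples/example_wp_peyton_manning.csv') %>%
    mutate(y = log(y))

  # Fit the model and return it as this iteration's result
  m <- prophet(df)

  m
}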

Let me know if you run into any other issues.

Thanks! Brian

paselem commented 6 years ago

@glalwani2 - it looks like this issue has been resolved, so I will close it. Please re-open if you feel there is still an issue.