Azure / doAzureParallel

An R package that allows users to submit parallel workloads in Azure
MIT License

Fail-fast opportunity for broken installations #81

Closed: grayskripko closed this issue 7 years ago

grayskripko commented 7 years ago

I am trying to install 'ranger' on the nodes. My cluster.json file looks like this:

"rPackages": {
    "cran": ["ranger"],

I got a bunch of C++ compilation errors on the node side:

* installing *source* package 'ranger' ...
** package 'ranger' successfully unpacked and MD5 sums checked
** libs
sh: I/usr/lib64/microsoft-r/3.3/lib64/R/include: No such file or directory
make: [AAA_check_cpp11.o] Error 127 (ignored)
sh: I/usr/lib64/microsoft-r/3.3/lib64/R/include: No such file or directory
make: [Data.o] Error 127 (ignored)
sh: I/usr/lib64/microsoft-r/3.3/lib64/R/include: No such file or directory
...
make: [rangerCpp.o] Error 127 (ignored)
sh: I/usr/lib64/microsoft-r/3.3/lib64/R/include: No such file or directory
make: [utility.o] Error 127 (ignored)
sh: line 2: -shared: command not found
make: *** [ranger.so] Error 127
ERROR: compilation failed for package 'ranger'
* removing '/usr/lib64/microsoft-r/3.3/lib64/R/library/ranger'

The downloaded source packages are in
    '/tmp/Rtmpe0Q9qX/downloaded_packages'
Updating HTML index of packages in '.Library'
Making 'packages.html' ... done
Warning message:
In install.packages(args[1]) :
  installation of package 'ranger' had non-zero exit status
...

These installation problems do not affect the state of the nodes: all of my nodes end up 'Idle', and doAzureParallel::makeCluster does not notify me that package installation failed. I only get a Cannot find 'ranger' error later, after running the %dopar% function, so it is not obvious where the real error is. I suggest raising a warning or an error when a package installation fails on a node.

brnleehng commented 7 years ago

This has to do with how we build the command line for R package installation. We currently chain the commands with ; instead of &&, so a failing step does not stop the steps after it. Could we add a fail-fast option for package installation? @paselem, any input?
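To illustrate the difference between the two chaining operators, here is a minimal shell sketch. The commands `false` and `echo` are stand-ins for the real installation steps, which are not shown in this thread; this is not the actual doAzureParallel start task.

```shell
#!/bin/sh
# Stand-in commands: `false` plays the role of a failing install step,
# `echo` plays the role of whatever step comes after it (both are assumptions).

# Chained with ';': the later step still runs, and the chain reports the
# exit status of the last command only, so the failure is hidden.
semicolon_status=0
sh -c 'false ; echo "later step still ran"' || semicolon_status=$?

# Chained with '&&': the chain stops at the first failing command and
# reports that command's non-zero exit status.
and_status=0
sh -c 'false && echo "later step ran"' || and_status=$?

echo "';'  chain exit status: $semicolon_status"
echo "'&&' chain exit status: $and_status"
```

Note that && only helps when each step actually exits non-zero on failure. In the log above, install.packages reports the failed installation as a warning, so the installation step would presumably also need to turn that warning into an error.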

paselem commented 7 years ago

I think this makes sense: change our chaining to && so that the whole chain fails when any command fails. I think this should also feed into our work to list all VMs that failed to run the start task correctly. @brnleehng We can push this into our upcoming debug/troubleshooting milestone.

grayskripko commented 7 years ago

By the way, the reason 'ranger' does not install properly is the custom Microsoft R configuration. Please consider helping users with package installation by adding the following lines to the cluster.json file:

"commandLine": [
    "r_conf=/usr/lib64/microsoft-r/3.3/lib64/R/etc/Makeconf",
    "sed -i 's/CXX1X = /CXX1X = gcc/g' $r_conf",
    "sed -i 's/CXX1XFLAGS = /CXX1XFLAGS = -fpic/g' $r_conf",
    "sed -i 's/CXX1XSTD =/CXX1XSTD = -std=c++11/g' $r_conf"
]
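As a rough sketch of what these three sed commands do, the script below runs them against a throwaway file instead of the real Makeconf. The three empty settings in the sample file are an assumption about what the affected Makeconf contains, consistent with the log above, where the shell appears to run a compiler flag as if it were a command. It also assumes GNU sed, whose -i option takes no argument.

```shell
#!/bin/sh
# Throwaway file standing in for .../R/etc/Makeconf. Its contents are an
# assumed excerpt (empty C++11 settings), not a verbatim copy.
r_conf=$(mktemp)
printf 'CXX1X = \nCXX1XFLAGS = \nCXX1XSTD =\n' > "$r_conf"

# The three substitutions from the comment above, applied in place.
sed -i 's/CXX1X = /CXX1X = gcc/g' "$r_conf"
sed -i 's/CXX1XFLAGS = /CXX1XFLAGS = -fpic/g' "$r_conf"
sed -i 's/CXX1XSTD =/CXX1XSTD = -std=c++11/g' "$r_conf"

# Show the patched settings, then clean up.
result=$(cat "$r_conf")
echo "$result"
rm -f "$r_conf"
```

Under that assumption, each setting ends up with a value: a compiler, the -fpic flag, and the -std=c++11 standard flag.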

I lost a day to this problem.

paselem commented 7 years ago

@grayskripko - You nailed a known issue head on... Thanks for bringing this up. We are working closely with the Microsoft R Open team on this, and the fix is coming in the 3.5 release which is slated to ship very soon! Once that happens we hope to pull it in and resolve this issue.

paselem commented 7 years ago

Fixed with #91