Azure / doAzureParallel

An R package that allows users to submit parallel workloads in Azure
MIT License

doAzureParallel against Batch Pool residing in virtual network VNet #218

Closed omartin2010 closed 6 years ago

omartin2010 commented 6 years ago


Description

Hi, when I run a doAzureParallel job against a Batch pool that is defined as part of a VNet (see https://docs.microsoft.com/en-us/azure/batch/batch-virtual-network), the cluster <- makeCluster("cluster.json") step fails. I turned on setVerbose(TRUE) before executing, but the output is no more useful:

> setVerbose(value = TRUE)
> # Create your cluster if not exist
> cluster <- makeCluster("cluster.json")
Error in makeCluster("cluster.json") : 
  Check your credentials and try again.

Instructions to repro the problem, if applicable

Steps to repro:
1. Create a Batch account.
2. Manually create the pool, in the portal or otherwise, then extract the account keys with az batch account keys list -g RG_NAME -n _BATCH_ACCNT_NAME and put that value in credentials.json.
3. Execute the code:

# install packages
library(devtools)
install_github("azure/razurebatch")
install_github("azure/doazureparallel")

# import the doAzureParallel library and its dependencies
library(doAzureParallel)

# set your credentials
setCredentials("credentials.json")

setVerbose(value = TRUE)
# Create your cluster if not exist
cluster <- makeCluster("cluster.json")

This is where it fails.

Adding sessionInfo() output :

> sessionInfo()
R version 3.4.1 (2017-06-30)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows Server >= 2012 x64 (build 9200)

Matrix products: default

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252   
[3] LC_MONETARY=English_United States.1252 LC_NUMERIC=C                          
[5] LC_TIME=English_United States.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] doAzureParallel_0.6.2 iterators_1.0.8       foreach_1.4.4         devtools_1.13.3      
 [5] RevoUtilsMath_10.0.0  RevoUtils_10.0.5      RevoMods_11.0.0       MicrosoftML_1.5.0    
 [9] mrsdeploy_1.1.2       RevoScaleR_9.2.1      lattice_0.20-35       rpart_4.1-11         

loaded via a namespace (and not attached):
 [1] codetools_0.2-15       CompatibilityAPI_1.1.0 rAzureBatch_0.5.5      withr_2.0.0           
 [5] digest_0.6.12          bitops_1.0-6           grid_3.4.1             R6_2.2.2              
 [9] jsonlite_1.5           git2r_0.19.0           httr_1.3.1             curl_2.6              
[13] rjson_0.2.15           tools_3.4.1            RCurl_1.95-4.8         yaml_2.1.14           
[17] compiler_3.4.1         memoise_1.1.0          mrupdate_1.0.1        

paselem commented 6 years ago

Hi @omartin2010, doAzureParallel currently doesn't support Virtual Machines within a VNET because we do not have support for AAD Authentication yet. Batch VNET access requires AAD Auth, and doAzureParallel uses SharedKey auth.

Did you set up your cluster directly with the Batch APIs and then try to point doAzureParallel at it? Unless you followed some very specific steps, it may not work as expected, since this is not a supported workflow.

nkaul commented 6 years ago

@paselem : Is there a path forward for running Azure Batch jobs from R against a VNet pool in user subscription mode? It is not clear whether there is a way to register an app and use the app's API keys for authentication: https://docs.microsoft.com/en-us/azure/batch/batch-aad-auth

paselem commented 6 years ago

@nkaul we are currently working on the plumbing to get doAzureParallel to support AAD Auth, which will enable putting the clusters inside a VNET. We have never tested UserSubscription mode, and it will not be officially supported for the time being, but it should work once we have AAD working. Is there any reason you're using UserSubscription mode instead of BatchService mode?

nkaul commented 6 years ago

@paselem : I tried to use UserSubscription with hope of using AAD. Is there a timeline for AAD Auth support?

paselem commented 6 years ago

@nkaul As mentioned before we are actively developing the feature now. It is a major rewrite of some base packages though so it will take some more time. We are planning on having an initial branch "beta" release of this in about 2 weeks, and a proper supported release about 2 weeks after that.

Are you interested in testing out the "beta" release once it's available?

Also, just to reiterate my previous statement, we will not support UserSubscription Batch accounts, only BatchService (Batch-managed) accounts. Today, both account types support VNETs, so that should hopefully not be a blocker for your workloads.

nkaul commented 6 years ago

@paselem : I am definitely interested in testing out the beta release. Please inform me when it is available. UserSubscription Batch accounts should not be a blocker at this point in time.

nkaul commented 6 years ago

@paselem : Do you have the beta version available to test?

paselem commented 6 years ago

@brnleehng - is the AAD branch stable enough for @nkaul to start testing on?

brnleehng commented 6 years ago

Hi @nkaul

The AAD branch is stable enough for usage. To install these branches, run the commands below:

devtools::install_github("Azure/rAzureBatch", ref = "feature/aad")
devtools::install_github("Azure/doAzureParallel", ref = "feature/aad")

Credentials File: The credentials file changes in several ways for AAD. Provide the tenantId, clientId, and credential of your AAD service principal. You will also need to supply the ARM resource IDs of both the Batch account and the storage account.

{
  "servicePrincipal": {
    "tenantId": "Your tenant id",
    "clientId": "Your client id",
    "credential": "credential for AAD",
    "batchAccountResourceId": "/subscriptions/SUBSCRIPTION_ID/resourceGroups/RESOURCE_GROUP_ID/providers/Microsoft.Batch/batchAccounts/BATCH_ACCOUNT_NAME",
    "storageAccountResourceId": "/subscriptions/SUBSCRIPTION_ID/resourceGroups/RESOURCE_GROUP_ID/providers/Microsoft.Storage/storageAccounts/STORAGE_ACCOUNT_NAME"
  },
  "githubAuthenticationToken": "",
  "dockerAuthentication": {
    "username": "",
    "password": "",
    "registry": ""
  }
}
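
Before pointing setCredentials at a file like the one above, it can help to confirm that none of the servicePrincipal fields were left out. The following is only a minimal sketch: missing_aad_fields is a hypothetical helper, not part of doAzureParallel or rAzureBatch, and it assumes the field names shown in the example above plus the jsonlite package being installed.

```r
library(jsonlite)

# Hypothetical helper (not a doAzureParallel function): return the names of the
# servicePrincipal fields from the example above that are absent or empty.
missing_aad_fields <- function(path) {
  required <- c("tenantId", "clientId", "credential",
                "batchAccountResourceId", "storageAccountResourceId")
  creds <- fromJSON(path)
  sp <- creds$servicePrincipal
  if (is.null(sp)) {
    # No servicePrincipal section at all: every field is missing.
    return(required)
  }
  is_missing <- vapply(required, function(field) {
    value <- sp[[field]]
    is.null(value) || !is.character(value) ||
      length(value) != 1 || !nzchar(value)
  }, logical(1))
  required[is_missing]
}

# Usage: write a deliberately incomplete file and list what is missing.
example_path <- tempfile(fileext = ".json")
writeLines(
  '{"servicePrincipal": {"tenantId": "t", "clientId": "c", "credential": ""}}',
  example_path
)
print(missing_aad_fields(example_path))
```

An empty result means every field is present and non-empty; it says nothing about whether the values are valid for your tenant.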

Cluster Configuration File: To place the Batch pool in a virtual network, set the subnetId property in the cluster configuration file to the resource ID of the subnet.

{
  "name": "vnet",
  "vmSize": "Standard_D2_v2",
  "maxTasksPerNode": 1,
  "poolSize": {
    "dedicatedNodes": {
      "min": 2,
      "max": 2
    },
    "lowPriorityNodes": {
      "min": 0,
      "max": 0
    },
    "autoscaleFormula": "QUEUE"
  },
  "containerImage": "rocker/tidyverse:latest",
  "rPackages": {
    "cran": [],
    "github": [],
    "bioconductor": []
  },
  "commandLine": [],
  "subnetId": "/subscriptions/****************/resourceGroups/*****/providers/Microsoft.Network/virtualNetworks/dazp/subnets/*********"
}
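
A malformed subnetId is easy to produce when copying from the portal, so a quick shape check before calling makeCluster may save a round trip. This is a sketch only: looks_like_subnet_id is a hypothetical helper, not something doAzureParallel provides, and it assumes the resource ID layout shown in the example above and that the path segments can be matched case-insensitively. It does not verify that the subnet actually exists.

```r
# Hypothetical helper: TRUE if `id` has the general shape
# /subscriptions/<id>/resourceGroups/<rg>/providers/Microsoft.Network/
#   virtualNetworks/<vnet>/subnets/<subnet>
looks_like_subnet_id <- function(id) {
  pattern <- paste0(
    "^/subscriptions/[^/]+",
    "/resourceGroups/[^/]+",
    "/providers/Microsoft\\.Network",
    "/virtualNetworks/[^/]+",
    "/subnets/[^/]+$"
  )
  is.character(id) && length(id) == 1 &&
    grepl(pattern, id, ignore.case = TRUE)
}

# Usage with placeholder values (SUBSCRIPTION_ID, RESOURCE_GROUP_ID, and
# SUBNET_NAME are stand-ins, not real identifiers).
candidate <- paste0(
  "/subscriptions/SUBSCRIPTION_ID/resourceGroups/RESOURCE_GROUP_ID",
  "/providers/Microsoft.Network/virtualNetworks/dazp/subnets/SUBNET_NAME"
)
print(looks_like_subnet_id(candidate))
```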

Let me know if you run into any issues.

Thanks! Brian

brnleehng commented 6 years ago

Hi @nkaul,

I was wondering if you are running into any issues. We are planning on merging this branch as soon as we are done testing.

Thanks! Brian

brnleehng commented 6 years ago

Closing issue: VNet and AAD branch has been merged #252