Azure / doAzureParallel

An R package that allows users to submit parallel workloads in Azure
MIT License

Git PAT token not used when installing packages? #359

Open p-smirnov opened 5 years ago

p-smirnov commented 5 years ago

I am experiencing the known issue with autoscale and github package installation, where the error message is:

Error: HTTP error 403.
  API rate limit exceeded for 52.*******. (But here's the good news: Authenticated requests get a higher rate limit. Check out the documentation for more details.)

  Rate limit remaining: 0/60
  Rate limit reset at: 2019-06-24 22:49:36 UTC

  To increase your GitHub API rate limit
  - Use `usethis::browse_github_pat()` to create a Personal Access Token.
  - Use `usethis::edit_r_environ()` and add the token as `GITHUB_PAT`.
Execution halted

However, I have set the githubAuthenticationToken in the credentials.json file. Is the environment variable not yet set when the GitHub install runs for packages specified in the cluster.json file?
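One way to test this directly is to check, from R running on a node, whether `GITHUB_PAT` is visible to the session. A minimal sketch in base R; `pat_is_set` is a hypothetical helper name, not part of doAzureParallel:

```r
# Hypothetical helper (not part of doAzureParallel): reports whether a
# GitHub token is visible to the current R session.
pat_is_set <- function(var = "GITHUB_PAT") {
  # Sys.getenv() returns the `unset` value ("") when the variable is missing
  nzchar(Sys.getenv(var, unset = ""))
}

# Print only TRUE/FALSE so the token itself never reaches the logs
cat("GITHUB_PAT visible to R:", pat_is_set(), "\n")
```

If this prints FALSE inside the environment that performs the package installation, the GitHub requests are going out unauthenticated.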

Possibly relevant: I am using a custom docker image (but I want to install the packages from git as I am iterating on package implementation).

I am not sure how to make a reproducible example, but it occurs when scaling up from 1 to ~400 nodes. Here is my cluster.json in case it helps to reproduce:

{
  "name": "psmirnov",
  "vmSize": "Standard_D2_v3",
  "maxTasksPerNode": 4,
  "poolSize": {
    "dedicatedNodes": {
      "min": 1,
      "max": 1
    },
    "lowPriorityNodes": {
      "min": 0,
      "max": 5000
    },
    "autoscaleFormula": "QUEUE"
  },
  "containerImage": "bhklab/pharmacogx:v3",
  "rPackages": {
    "cran": ["MASS", "tictoc", "mvtnorm", "abind", "polynom", "memoise", "purrr", "matrixStats"],
    "github": ["bhklab/mCI", "bhklab/fastCI"],
    "bioconductor": []
  },
  "commandLine": [],
  "subnetId": ""
}

minister3000 commented 5 years ago

I experience a similar behavior when not using a docker image. It appears that the github 'Personal Access Token' (PAT) is completely ignored even though it is set up correctly in the credentials file. Therefore I am not able to scale the project up without running into the 'API rate limit exceeded' issue described by p-smirnov above. I confirmed my suspicion that the PAT entry in the credentials file is ignored by setting my github repository to 'private', after which the repo can no longer be installed on the Azure nodes even though the personal access token should allow precisely this. Any help on this issue is appreciated...

brnleehng commented 5 years ago

@p-smirnov @minister3000 I'm taking a look at this

brnleehng commented 5 years ago

When we migrated to docker containers, it looks like the PAT environment variable is not being passed to the container. Since we use the R in the container image, the container requires the environment variable to exist.

https://github.com/Azure/doAzureParallel/blob/master/R/utility-commands.R#L100-L138
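The general mechanism can be sketched as follows. A Docker container does not inherit the host's environment, so a variable has to be forwarded explicitly, for example with `docker run -e NAME`. This is an illustration only: `build_docker_command` is a hypothetical helper, and it is not the code doAzureParallel uses to launch containers.

```r
# Hypothetical helper: builds a `docker run` command line that forwards
# selected host environment variables into the container.
build_docker_command <- function(image, r_expr, env_vars = "GITHUB_PAT") {
  # "-e NAME" without a value forwards the host's value of NAME, which keeps
  # the token itself out of the command string
  env_flags <- paste(sprintf("-e %s", env_vars), collapse = " ")
  paste("docker run --rm", env_flags, image,
        "Rscript -e", shQuote(r_expr, type = "sh"))
}

cmd <- build_docker_command(
  image  = "rocker/tidyverse:latest",
  r_expr = "cat(nzchar(Sys.getenv(\"GITHUB_PAT\")))"
)
cat(cmd, "\n")
```

Without the `-e GITHUB_PAT` flag, the R process inside the container would see an empty `GITHUB_PAT` even when the variable is set on the host.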

minister3000 commented 5 years ago

Thanks for looking into this. I should have been more specific: I am not using a custom docker image but 'rocker/tidyverse:latest'. If I read your answer correctly, the PAT variable is not passed to this container either? Is there another way to set the required environment variable, maybe through the cluster.json file?

brnleehng commented 5 years ago

Yes, that is correct. The PAT variable is not being passed to the container either. I will add a fix that adds the PAT variable to the current environment variables.

I will discuss with others the possibility of supporting environment variables in the cluster file.

minister3000 commented 5 years ago

Thank you for confirming the issue and working on it. I assume private GitHub repositories cannot be installed until this is fixed, and the maximum number of nodes is limited to 40 when using public repositories. (GitHub allows 60 unauthenticated requests per hour, and I reach the limit with 40 nodes for whatever reason.) Is there an estimated timeline to get the fix in place?
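A rough back-of-the-envelope estimate shows how such a cap could arise. It assumes that all nodes share one outbound IP address (and therefore one unauthenticated quota) and that each node makes a fixed number of GitHub API requests during installation. Both are assumptions, not confirmed behavior, and `max_nodes` is a hypothetical helper. The figure of 5,000 requests per hour is GitHub's documented limit for authenticated requests.

```r
# Hypothetical helper: how many nodes can install within one hourly quota,
# given an assumed number of GitHub API requests per node.
max_nodes <- function(limit_per_hour, requests_per_node) {
  floor(limit_per_hour / requests_per_node)
}

max_nodes(60, 1)    # unauthenticated, one request per node
max_nodes(60, 2)    # unauthenticated, two requests per node
max_nodes(5000, 2)  # authenticated with a PAT
```

Under these assumptions, exhausting the quota at about 40 nodes would correspond to an average of 60 / 40 = 1.5 requests per node.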

brnleehng commented 5 years ago

I have a working fix branch that you can use. My plan is to merge it on Monday to do further testing.

devtools::install_github("Azure/doAzureParallel", ref="fix/github-pat-token")

Solfood commented 5 years ago

I am seeing another issue with this. Fetching the private repository works, but the package build returns a node failure error.

─ building ‘demoRcpp_1.0.tar.gz’

g++ -std=gnu++11 -I"/usr/local/lib/R/include" -DNDEBUG -I"/mnt/batch/tasks/shared/R/packages/Rcpp/include" -I/usr/local/include -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c RcppExports.cpp -o RcppExports.o
g++ -std=gnu++11 -I"/usr/local/lib/R/include" -DNDEBUG -I"/mnt/batch/tasks/shared/R/packages/Rcpp/include" -I/usr/local/include -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c rcpp_hello_world.cpp -o rcpp_hello_world.o
g++ -std=gnu++11 -I"/usr/local/lib/R/include" -DNDEBUG -I"/mnt/batch/tasks/shared/R/packages/Rcpp/include" -I/usr/local/include -fpic -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -Wdate-time -D_FORTIFY_SOURCE=2 -g -c script.cpp -o script.o
g++ -std=gnu++11 -shared -L/usr/local/lib/R/lib -L/usr/local/lib -o demoRcpp.so RcppExports.o rcpp_hello_world.o script.o -L/usr/local/lib/R/lib -lR
Error getting parent environment: there is no package called ‘BiocInstaller’

minister3000 commented 5 years ago

I can confirm that the fix you provided is working and that the PAT is being passed to, and accepted by GitHub. I no longer hit GitHub's 60 unauthenticated requests threshold and am able to fetch from private repositories and install and run packages that rely on Rcpp. Thank you very much for providing a solution to this problem.

p-smirnov commented 5 years ago

@brnleehng Thank you very much for the fix!

englianhu commented 3 years ago

Refer to https://gist.github.com/Z3tt/3dab3535007acf108391649766409421#gistcomment-3746021 (simple and awesome!)