orta opened this issue 6 years ago
I think another improvement would be the GitHub utils caching results from the API. We run the `no-console` and `flow` plugins, and both of them use the `github.utils.fileContents` method to get the contents of all modified and created files, which duplicates the work at the moment. I guess maybe we're already hitting the rate limits on that PR by even fetching them once?
Hrm, yeah, good point, I didn't consider that it might be coming from your plugins
Yeah, it's triggering ~100 x 2 simultaneous API requests to GH for file contents, so I'm not too surprised that it's hitting the rate limits
Maybe I should provide a rate-limited `fileContents` API for that sort of thing in the DSL
It'd be neat if `fileContents` just did that out of the box?
We might be able to just wrap it in `p-limit` with a limit of something like 25 so it's not doing 200 calls at the same time. I'm not sure how GitHub rate limits work, but that might be a quick fix, and we can always turn the knob further down to 10 or even only 1 at a time (although that'd be much slower 😅).
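Something like this would be the plugin/dangerfile side of that quick fix, as a rough sketch only (it assumes `p-limit` and the standard danger DSL; 25 is just the number floated above):

```ts
import { danger, schedule } from "danger"
import pLimit from "p-limit"

// Cap how many fileContents requests are in flight at once (25 is a guess; tune as needed)
const limit = pLimit(25)

schedule(async () => {
  const paths = [...danger.git.modified_files, ...danger.git.created_files]

  // Each call is queued behind the limiter, so at most 25 requests run at the same time
  const contents = await Promise.all(
    paths.map(path => limit(() => danger.github.utils.fileContents(path)))
  )

  // ...run the no-console / flow style checks against `contents` here
})
```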
As it returns a promise, what happens under the hood can be whatever we like, so having it be queued makes sense.
Yeah, agreed. I'd recommend doing that in your plugins right now, and I'll integrate it into danger properly
I think it's probably actually easier to just do it in Danger. It should just be a matter of wrapping the `fetch` call in `this.api` in `GitHubAPI.ts` with `limit(() => fetch())`, right?
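i.e. roughly this shape, as an illustrative sketch only (the real `GitHubAPI.ts` is laid out differently, and this assumes a `fetch` is in scope):

```ts
import pLimit from "p-limit"

// One limiter shared by every GitHub API call made during a danger run
const limit = pLimit(25)

// Drop-in replacement for direct fetch calls inside the API class;
// at most 25 requests are ever in flight at the same time.
const limitedFetch = (url: string, init?: RequestInit): Promise<Response> =>
  limit(() => fetch(url, init))
```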
👉 #582
This is still happening on really big PRs with the flow plugin. IMO, `github.utils` needs a function specifically for handling a very large number of file reads and running code against them.
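Something along these lines, as a purely hypothetical sketch (no such helper exists in danger today), is what I have in mind: a utility that reads files through a small concurrency window and hands each result to a callback as it arrives.

```ts
import { danger } from "danger"
import pLimit from "p-limit"

// Hypothetical helper: process many file reads without fetching everything at once
async function forEachFileContents(
  paths: string[],
  handler: (path: string, contents: string) => void,
  concurrency = 5
): Promise<void> {
  const limit = pLimit(concurrency)
  await Promise.all(
    paths.map(path =>
      limit(async () => handler(path, await danger.github.utils.fileContents(path)))
    )
  )
}
```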
The reason it is still happening is that concurrency is not the issue; it's the number of requests in a time interval. We should be reading the `x-ratelimit-remaining` header from the GitHub responses to decide how many requests it is safe to make, and `x-ratelimit-reset` to figure out how long we need to wait before making the next request after the limit has been hit.
Here are the GitHub docs on rate limiting: https://developer.github.com/v3/#rate-limiting
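Roughly like this, assuming a fetch-compatible client (only the header names come from the docs above; the rest is illustrative):

```ts
async function rateLimitAwareFetch(url: string, init?: RequestInit): Promise<Response> {
  const response = await fetch(url, init)

  const remaining = Number(response.headers.get("x-ratelimit-remaining"))
  const resetAt = Number(response.headers.get("x-ratelimit-reset")) // epoch seconds

  if (remaining === 0) {
    // Sleep until the window resets before letting the next request through
    const waitMs = Math.max(0, resetAt * 1000 - Date.now())
    await new Promise(resolve => setTimeout(resolve, waitMs))
  }

  return response
}
```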
The problem here is not the regular rate limits but GitHub's abuse rate limits. Apart from the GitHub search API, the interval for `x-ratelimit-reset` is once per hour, and it certainly won't make sense to wait that long in the context of a PR. The abuse limits don't provide headers, so it is probably a matter of turning down the concurrency further.
Honestly, I'd suggest a concurrency limit of 1. That makes it more likely to work reliably, especially given the possibility of concurrent PRs. Or make it configurable via an environment variable so developers can tune it themselves.
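For example something like this (the variable name here is made up, not an existing danger option):

```ts
import pLimit from "p-limit"

// Defaults to fully serial requests; users can raise it if their PRs are small enough
const concurrency = Number(process.env.DANGER_GITHUB_CONCURRENCY) || 1
const limit = pLimit(concurrency)
```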
We had to remove danger from our pipeline because it used around 1,000 requests per run (we have very big PRs right now). Our limit of 10k requests was used up within 5 to 10 minutes. We also had a couple of abuse messages in the past.
Danger makes at least 12 requests to the GitHub API by default. This action is running an empty dangerfile: https://github.com/gpressutto5/danger-api-limit-example/runs/6071421115?check_suite_focus=true
And, as we know:
> When using `GITHUB_TOKEN`, the rate limit is 1,000 requests per hour per repository.
It is a problem on big monorepo projects because we usually have other steps using the same token, and 1,000 requests are too few when you have to spend 12 just to get unused data from the API. Can we lazy load this info so it only makes the requests we actually need?
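Something like this is what I have in mind, as a rough sketch that assumes nothing about danger's internals: defer each API call until the corresponding piece of the DSL is first read, then cache it.

```ts
// Generic lazy wrapper: the loader only runs on first access, then the promise is reused
function lazy<T>(load: () => Promise<T>): () => Promise<T> {
  let cached: Promise<T> | undefined
  return () => (cached ??= load())
}

// e.g. reviews would only be fetched when a dangerfile actually asks for them
// (fetchReviews is a placeholder, not a real danger function)
declare function fetchReviews(prNumber: number): Promise<unknown[]>
const getReviews = lazy(() => fetchReviews(1))
```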
@gpressutto5 it's clear that there are a few duplicate requests, notably `/pulls/1` and `/user`, so we could probably save about 5 API calls. Maybe those could be cached; interested contributors might be able to do that.
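A minimal sketch of that caching idea (purely illustrative, not how danger's request layer is actually structured):

```ts
// Memoize identical GETs so duplicates like /pulls/1 and /user only hit the network once.
// The parsed JSON is cached rather than the Response, since a body can only be read once.
const cache = new Map<string, Promise<any>>()

function cachedGetJSON(url: string, headers: Record<string, string> = {}): Promise<any> {
  if (!cache.has(url)) {
    cache.set(url, fetch(url, { headers }).then(res => res.json()))
  }
  return cache.get(url)!
}
```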
Alternatively it’s possible we could rewrite things to use GraphQL and fold a bunch of requests into a single network request, but that feels like it could be a big change due to error handling.
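For a sense of what that would look like, a single GraphQL query could cover the user plus most of the PR metadata in one round trip (the field names follow GitHub's GraphQL schema, but the exact shape danger needs isn't worked out here):

```ts
// One GraphQL query standing in for several REST calls (viewer, PR, reviews, review requests)
const query = `
  query {
    viewer { login }
    repository(owner: "danger", name: "danger-js") {
      pullRequest(number: 1) {
        title
        body
        reviews(first: 50) { totalCount }
        reviewRequests(first: 50) { totalCount }
      }
    }
  }
`
```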
We have the same problem with a monorepo.
We need to validate commit messages + PR title.
I checked, and dangerjs performs 12 requests in total, including for example the `/reviews` and `/requested_reviewers` requests. But we don't need those requests when we only want to check the PR title + commit messages. Is it possible to completely avoid them?
We grab enough info from github/bitbucket/gitlab/etc to make a full DSL which you can use synchronously in dangerfiles, and I think that's still the right route to go. If you'd like to take a look at reducing the number of requests mentioned above, give it a shot, but I don't think we'd accept dangerfile-breaking changes for this 👍🏻
@orta I understand. Right now it seems https://github.com/danger/danger-js/pull/991 is the best way to solve this problem. Am I right?
Perhaps, if you're doing a lot of manual file reads in your Dangerfile, it could be; that's what the OP was doing.
Our `dangerfile.js` is very simple:

```js
const { schedule } = require('danger')
const { conventionalCommits } = require('./commits')
const { conventionalPRTitle } = require('./pr-title')

schedule(conventionalCommits)
schedule(conventionalPRTitle)
```
And each of these files just uses `danger.github.pr.title` and `danger.git.commits` to pass values to third-party validation packages. If validation fails, we use the `fail()` method from dangerjs.
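For example, the PR title check boils down to something like this (a simplified TypeScript sketch; the real validation uses third-party packages rather than this placeholder regex):

```ts
import { danger, fail } from "danger"

// Validates the PR title against a simplified conventional-commit pattern
export const conventionalPRTitle = async () => {
  const title = danger.github.pr.title
  if (!/^(feat|fix|chore|docs|refactor|test)(\(.+\))?!?: .+/.test(title)) {
    fail(`PR title "${title}" does not follow the conventional commits format`)
  }
}
```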
https://github.com/withspectrum/spectrum/issues/2632
Looks like when a PR gets big enough, danger and peril can hit the rate limits. Will need to slow down diff/comment pagination probably