githubexporter / github-exporter

:octocat: Prometheus exporter for github metrics
MIT License

Incorrect `github_repo_pull_request_count` when more than one page of pull requests exists #107

Open steinbrueckri opened 8 months ago

steinbrueckri commented 8 months ago

Description

I've discovered an issue with how github_repo_pull_request_count is calculated in the exporter. When GitHub's API paginates the results, the count only reflects the last page of pull requests. As a result, the metric is inaccurate for repositories with a high number of open pull requests.

Steps to Reproduce

  1. Use the exporter on a repository with fewer than 30 open pull requests and note the correct count.
  2. Use the exporter on a repository with more than 30 open pull requests (e.g., 36).
  3. Observe that github_repo_pull_request_count only reflects the number of pull requests on the last page (e.g., 6 instead of 36).

Expected Behavior

github_repo_pull_request_count should accurately report the total number of open pull requests in a repository, regardless of how many pages of pull requests there are.

Actual Behavior

github_repo_pull_request_count only reports the number of pull requests on the last page due to not accounting for GitHub's API pagination.
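As an aside, one way to get a total without fetching every page is to request `per_page=1` and read the page number from the `rel="last"` entry of GitHub's `Link` response header; that page number equals the total item count. This is only a sketch of the header parsing, not code from the exporter, and the helper name is made up:

```go
package main

import (
	"fmt"
	"regexp"
)

// lastPageFromLink extracts the page number from the rel="last" entry of a
// GitHub-style Link header. Combined with per_page=1, that number equals the
// total number of items, so a single request suffices. Hypothetical helper,
// not part of the exporter.
var lastRe = regexp.MustCompile(`[?&]page=(\d+)>; rel="last"`)

func lastPageFromLink(link string) string {
	m := lastRe.FindStringSubmatch(link)
	if m == nil {
		return "" // no Link header or only one page
	}
	return m[1]
}

func main() {
	// Example Link header as returned for a repo with 36 open PRs at per_page=1.
	link := `<https://api.github.com/repos/o/r/pulls?per_page=1&page=2>; rel="next", <https://api.github.com/repos/o/r/pulls?per_page=1&page=36>; rel="last"`
	fmt.Println(lastPageFromLink(link)) // prints 36
}
```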

Possible Solution / Workaround

I've applied a temporary workaround that requests pull requests with a higher per_page limit, which reduces the impact of pagination. However, this does not scale: repositories with more open pull requests than the per_page limit still get an incorrect count. Here's the patch:

diff --git a/exporter/gather.go b/exporter/gather.go
index 7a72db2..de73e5c 100644
--- a/exporter/gather.go
+++ b/exporter/gather.go
@@ -111,7 +111,7 @@ func getReleases(e *Exporter, url string, data *[]Release) {
 func getPRs(e *Exporter, url string, data *[]Pull) {
  i := strings.Index(url, "?")
  baseURL := url[:i]
- pullsURL := baseURL + "/pulls"
+ pullsURL := baseURL + "/pulls?per_page=100"
  pullsResponse, err := asyncHTTPGets([]string{pullsURL}, e.APIToken())

  if err != nil {

This workaround increases the per_page parameter to 100, but a more robust solution that iterates through all pages to calculate the total count is needed.
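A page-iterating fix could look roughly like this. The helper name, the `fetchPage` callback, and the trimmed-down `Pull` struct are hypothetical placeholders for the exporter's actual HTTP plumbing; the sketch only shows the loop, which sums pages until a page shorter than per_page signals the end:

```go
package main

import "fmt"

// Pull loosely mirrors the exporter's Pull struct; fields omitted for brevity.
type Pull struct{ Number int }

// countAllPulls calls fetchPage for successive page numbers (GitHub pages are
// 1-indexed) and sums the results. A page with fewer than perPage items is the
// last page, so the loop stops there. Hypothetical helper, not the exporter's API.
func countAllPulls(fetchPage func(page int) ([]Pull, error), perPage int) (int, error) {
	total := 0
	for page := 1; ; page++ {
		pulls, err := fetchPage(page)
		if err != nil {
			return 0, err
		}
		total += len(pulls)
		if len(pulls) < perPage {
			return total, nil
		}
	}
}

func main() {
	// Simulate the repro case: 36 open PRs at per_page=30 (page 1: 30, page 2: 6).
	fake := func(page int) ([]Pull, error) {
		sizes := map[int]int{1: 30, 2: 6}
		return make([]Pull, sizes[page]), nil
	}
	n, _ := countAllPulls(fake, 30)
	fmt.Println(n) // prints 36
}
```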

Additional Context

Another thing: the test that checks github_repo_pull_request_count also fails to catch this, presumably because it does not make a real HTTP request.

steinbrueckri commented 6 months ago

@henrymcconville Can you maybe take a look at this issue?

steinbrueckri commented 5 months ago

bump 😬

henrymcconville commented 5 months ago

Hey @steinbrueckri, I don't have much capacity over the next few weeks to implement a proper fix, but if you want to raise a PR to increase the default per_page limit, I'll happily approve it. Thanks!

steinbrueckri commented 5 months ago

@henrymcconville First of all, thank you for your feedback. I really appreciate the work you do on maintaining this project. For me, this issue is not urgent at the moment because I am currently running a version with the workaround described above. Please address it whenever you have the time.

I understand that maintaining an open-source project can be very demanding, and I truly value the effort you put into it. However, I believe that spending quality time with your family and taking care of personal matters is far more important. Your well-being and family should always come first.

I tried to fix the issue myself, but my Go skills are not sufficient. Therefore, I'm glad that you have this on your radar and are willing to look into it when you can. Thanks again for all your hard work. 🤗