change-metrics / monocle

Monocle helps teams and individual to better organize daily duties and to detect anomalies in the way changes are produced and reviewed.
https://changemetrics.io
GNU Affero General Public License v3.0
376 stars 58 forks source link

GitHub Enterprise: ratelimit is optional (can be unlimited) #958

Closed jumanjiman closed 2 years ago

jumanjiman commented 2 years ago

Thank you for this project!

I used the Docker image to test it against a private GHE (GitHub Enterprise) instance and discovered a bug and opportunity for enhancement.

Expected

https://github.com/change-metrics/monocle/blob/master/src/Lentille/GitHub/RateLimit.hs should handle the case where GHE admins have disabled rate limits on a GHE instance.

Actual

When GHE ratelimit is disabled (i.e., unlimited ratelimit), GitHub crawler treats null ratelimit as an error and therefore fails to crawl org data.

log output

This was the relevant line in docker-compose logs, and I inserted line breaks here for readability:

crawler_1  | 2022-09-28 14:11:30 WARNING Macroscope.Main:283: 
Skipping due to an unexpected exception 
{"index":"monocle-test","crawler":"GHE","err":"Invalid response: GetRateLimit {rateLimit = Nothing}
CallStack (from HasCallStack):
  error, called at src/Relude/Debug.hs:289:11 in relude-1.1.0.0-IXqsgVzL4PV6JByrFzskHM:Relude.Debug
  error, called at src/Lentille/GitHub/RateLimit.hs:43:16 in monocle-0.1.7.0-2VW3DhZvn5r33orcK7VHBU:Lentille.GitHub.RateLimit"}

config

---
# https://github.com/change-metrics/monocle#configuration-file
workspaces:
    - name: monocle-test
      crawlers:
          - name: GHE
            update_since: '2021-01-01'
            provider:
                github_url: https://ghe.example.com/api/graphql  # redacted
                github_organization: internalorg  # redacted

Reproducer

I wrote a trivial bash script to demonstrate actual responses from our GHE instance.

reproducer script output

Get ratelimit with REST API v3
{
  "message": "Rate limiting is not enabled.",
  "documentation_url": "https://docs.github.com/enterprise/3.6/rest/reference/rate-limit#get-rate-limit-status-for-the-authenticated-user"
}
HTTP response code 404

Get ratelimit with GraphQL API
{"data":{"rateLimit":null}}
HTTP response code 200

reproducer script

#!/usr/bin/env bash
set -eEu
set -o pipefail

# Use env var to hide corporate namespace.
GITHUB_BASE_URL="${GITHUB_BASE_URL:-https://ghe.example.com}"

echo 'Get ratelimit with REST API v3'
curl \
    -H 'Accept: application/vnd.github+json' \
    -H "Authorization: Bearer ${GITHUB_TOKEN}" \
    --write-out 'HTTP response code %{http_code}' \
    "${GITHUB_BASE_URL}/api/v3/rate_limit"

echo
echo

echo 'Get ratelimit with GraphQL API'
curl \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer ${GITHUB_TOKEN}" \
    -X POST \
    -d '{"query": "query GetRateLimit  {rateLimit { used remaining resetAt } }"' \
     --write-out 'HTTP response code %{http_code}' \
   "${GITHUB_BASE_URL}/api/graphql"

GraphiQL screenshot

as a confirmation of the reproducer script and output:

image

Possible related issues

morucci commented 2 years ago

Hi,

Thanks for the feedback. #959 is an attempt to mitigate the issue you raised. However I don't have such Github appliance to validate it.

jumanjiman commented 2 years ago

However I don't have such Github appliance to validate it.

Thank you! i'm attempting to build and test locally, and commented on the PR, as well.

jumanjiman commented 2 years ago

959 appears to support GHE where GHE has unlimited ratelimit. Thank you again!

Running the local build exposes what I believe is a secondary bug. Should I open a separate issue for it?

Specifically, the last couple lines of crawler log output shown in https://github.com/change-metrics/monocle/pull/959#issuecomment-1262471178 include this:

crawler_1  | 2022-09-29 15:02:36 INFO    Macroscope.Worker:156: Processing {"index":"monocle-test","crawler":"GHE","stream":"TaskDatas","entity":{"contents":"internalorg/IMPORTANT_REPO","tag":"TaskDataEntity"},"age":"2021-01-01T00:00:00Z"}
crawler_1  | 2022-09-29 15:02:38 WARNING Macroscope.Main:300: Skipping due to an unexpected exception {"index":"monocle-test","crawler":"GHE","err":"Invalid response: GetLinkedIssues {rateLimit = Nothing, search = SearchSearchResultItemConnection {issueCount = 0, pageInfo = SearchPageInfoPageInfo {hasNextPage = False, endCursor = Nothing}, nodes = Just []}}\nCallStack (from HasCallStack):\n  error, called at src/Relude/Debug.hs:289:11 in relude-1.1.0.0-5JycKJkJeebDcuDsd66BY7:Relude.Debug\n  error, called at src/Lentille/GitHub/Issues.hs:116:18 in monocle-0.1.7.0-5C5VzWDrxK4JM5qtcn0kfv:Lentille.GitHub.Issues"}

For the repo internalorg/IMPORTANT_REPO (as well as others), the repo does not have GH issues. GH issues are disabled for the repo.

TristanCacqueray commented 2 years ago

@jumanjiman thank you for the feedback. Glad to hear you were able to test #959 . No need to open another issue, it's the same bug that needs to be fixed for the two other requests (GetProjects and GetIssues). I'll add a new commit for that on top of #959 shortly.

jumanjiman commented 2 years ago

@TristanCacqueray TY!