gittrends-app / github-proxy-server

A tool for massive data collection from GitHub APIs (REST and GraphQL)
MIT License
17 stars, 4 forks

PyGithub Integration cannot fetch Repositories' paginated fields from proxy #18

Open victorgveloso opened 1 year ago

victorgveloso commented 1 year ago


I am assuming that PyGithub integration is intended, given that you provided sample code for it.

The sample code works because it only calls the MainClass.get_repo method and reads the Repository's non-paginated fields.


Even though the proxy tries to handle pagination, it does so by relying on the response's header fields. However, PyGithub does not take the first page's URL from the header cursors of the GitHub v3 API response. Instead, it builds the PaginatedList from the Repository's url attribute, and the PaginatedList then initializes self.__nextUrl with that URL.
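To make the distinction concrete, here is a minimal sketch of what header-based pagination covers. It assumes, hypothetically, that the proxy rewrites the host inside the Link header; `parse_link_header` and the `owner/repo` URLs are illustrative and not taken from either project. Header cursors only exist once a first response has arrived, so the URL of the very first page request (built by the client) is never touched by a header rewrite.

```python
import re

# Matches one `<url>; rel="name"` entry of an HTTP Link header.
_LINK_RE = re.compile(r'<([^>]+)>\s*;\s*rel="([^"]+)"')


def parse_link_header(value: str) -> dict:
    """Map each rel name (next, last, ...) to its URL."""
    return {rel: url for url, rel in _LINK_RE.findall(value)}


# Hypothetical Link header as a proxy might return it after rewriting the host.
link = (
    '<http://localhost:3000/repos/owner/repo/pulls?page=2>; rel="next", '
    '<http://localhost:3000/repos/owner/repo/pulls?page=5>; rel="last"'
)
cursors = parse_link_header(link)
print(cursors["next"])  # http://localhost:3000/repos/owner/repo/pulls?page=2
```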

Expected behavior

PyGithub should behave the same whether or not base_url is set to http://localhost:3000. So the code below should list the pull requests of the "hsborges/github-proxy-server" repository.

gh = Github(base_url="http://localhost:3000")
r = gh.get_repo("hsborges/github-proxy-server")
for pr in r.get_pulls():

Furthermore, the code below should list the labels of the same repository:

for label in r.get_labels():
    print(label.name)

Current behavior

Both code examples raise an AssertionError (see the full stack trace below).

AssertionError                            Traceback (most recent call last)
Cell In [112], line 1
----> 1 r.get_labels()[0]

File ~/.local/lib/python3.10/site-packages/github/, in PaginatedListBase.__getitem__(self, index)
     46 assert isinstance(index, (int, slice))
     47 if isinstance(index, int):
---> 48     self.__fetchToIndex(index)
     49     return self.__elements[index]
     50 else:

File ~/.local/lib/python3.10/site-packages/github/, in PaginatedListBase.__fetchToIndex(self, index)
     62 def __fetchToIndex(self, index):
     63     while len(self.__elements) <= index and self._couldGrow():
---> 64         self._grow()

File ~/.local/lib/python3.10/site-packages/github/, in PaginatedListBase._grow(self)
     66 def _grow(self):
---> 67     newElements = self._fetchNextPage()
     68     self.__elements += newElements
     69     return newElements

File ~/.local/lib/python3.10/site-packages/github/, in PaginatedList._fetchNextPage(self)
    200 def _fetchNextPage(self):
--> 201     headers, data = self.__requester.requestJsonAndCheck(
    202         "GET", self.__nextUrl, parameters=self.__nextParams, headers=self.__headers
    203     )
    204     data = data if data else []
    206     self.__nextUrl = None

File ~/.local/lib/python3.10/site-packages/github/, in Requester.requestJsonAndCheck(self, verb, url, parameters, headers, input)
    352 def requestJsonAndCheck(self, verb, url, parameters=None, headers=None, input=None):
    353     return self.__check(
--> 354         *self.requestJson(
    355             verb, url, parameters, headers, input, self.__customConnection(url)
    356         )
    357     )

File ~/.local/lib/python3.10/site-packages/github/, in Requester.requestJson(self, verb, url, parameters, headers, input, cnx)
    451 def encode(input):
    452     return "application/json", json.dumps(input)
--> 454 return self.__requestEncode(cnx, verb, url, parameters, headers, input, encode)

File ~/.local/lib/python3.10/site-packages/github/, in Requester.__requestEncode(self, cnx, verb, url, parameters, requestHeaders, input, encode)
    516 self.__authenticate(url, requestHeaders, parameters)
    517 requestHeaders["User-Agent"] = self.__userAgent
--> 519 url = self.__makeAbsoluteUrl(url)
    520 url = self.__addParametersToUrl(url, parameters)
    522 encoded_input = None

File ~/.local/lib/python3.10/site-packages/github/, in Requester.__makeAbsoluteUrl(self, url)
    589 else:
    590     o = urllib.parse.urlparse(url)
--> 591     assert o.hostname in [
    592         self.__hostname,
    593         "",
    594         "",
    595         "",
    596     ], o.hostname
    597     assert o.path.startswith((self.__prefix, "/api/"))
    598     assert o.port == self.__port



I believe PyGithub's AssertionError is not the problem itself; it only shows that the requested hostname differs from the expected one (localhost:3000), which is set in the MainClass constructor (i.e., the Github class).
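The comparison that fails can be approximated as follows. This is a simplified, hypothetical helper (`passes_host_check` is not a PyGithub function): it keeps only the hostname and port equality from the assertions in the traceback, while the real check also accepts a few other hosts and verifies the path prefix.

```python
from urllib.parse import urlparse


def passes_host_check(url: str, base_url: str) -> bool:
    """Simplified sketch: an absolute URL is accepted only if its hostname
    and port match those of the configured base_url."""
    target = urlparse(url)
    base = urlparse(base_url)
    return target.hostname == base.hostname and target.port == base.port


base_url = "http://localhost:3000"
print(passes_host_check("http://localhost:3000/repos/owner/repo/labels", base_url))   # True
print(passes_host_check("https://api.github.com/repos/owner/repo/labels", base_url))  # False
```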

Where does PyGithub take that hostname from?

When PyGithub builds a Repository object, it populates its attributes from the API response data (JSON) (see where self._useAttributes is called and where it is declared).
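The chain from response body to request URL can be sketched with a hypothetical stand-in class. `MiniRepository` and the `owner/repo` payload are illustrative only; the point is that attributes are copied verbatim from the JSON, so a listing built from `url` inherits whatever host the proxy left in the body.

```python
from dataclasses import dataclass


@dataclass
class MiniRepository:
    """Hypothetical stand-in for a Repository object: fields are copied
    verbatim from the JSON payload, including the host inside `url`."""
    full_name: str
    url: str

    @classmethod
    def from_json(cls, data: dict) -> "MiniRepository":
        return cls(full_name=data["full_name"], url=data["url"])

    def pulls_url(self) -> str:
        # The paginated listing starts from the repository's own `url`.
        return f"{self.url}/pulls"


payload = {
    "full_name": "owner/repo",
    "url": "https://api.github.com/repos/owner/repo",
}
repo = MiniRepository.from_json(payload)
print(repo.pulls_url())  # https://api.github.com/repos/owner/repo/pulls
```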

Suggested solution

github-proxy-server should also replace the GitHub host with http://localhost:3000 in the following data fields: archive_url, assignees_url, blobs_url, branches_url, clone_url, collaborators_url, comments_url, commits_url, compare_url, contents_url, contributors_url, deployments_url, downloads_url, events_url, forks_url, git_commits_url, git_refs_url, git_tags_url, git_url, hooks_url, html_url, issue_comment_url, issue_events_url, issues_url, keys_url, labels_url, languages_url, merges_url, milestones_url, mirror_url, notifications_url, pulls_url, releases_url, ssh_url, stargazers_url, statuses_url, subscribers_url, subscription_url, svn_url, tags_url, teams_url, trees_url, url.
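A minimal sketch of that rewrite, written in Python to match the examples in this issue rather than taken from github-proxy-server's code. It assumes the upstream API prefix is `https://api.github.com` and the proxy listens on `http://localhost:3000`; `rewrite_urls` and the sample payload are hypothetical. Walking the whole decoded JSON also covers nested objects such as `owner`, `base.repo`, and `head.repo`.

```python
from typing import Any

# Assumptions for illustration: upstream REST API prefix and proxy address.
UPSTREAM = "https://api.github.com"
PROXY = "http://localhost:3000"


def rewrite_urls(obj: Any, upstream: str = UPSTREAM, proxy: str = PROXY) -> Any:
    """Return a copy of a decoded JSON value in which every string equal to
    `upstream` or starting with `upstream + "/"` has that prefix replaced."""
    if isinstance(obj, dict):
        return {key: rewrite_urls(value, upstream, proxy) for key, value in obj.items()}
    if isinstance(obj, list):
        return [rewrite_urls(item, upstream, proxy) for item in obj]
    if isinstance(obj, str) and (obj == upstream or obj.startswith(upstream + "/")):
        # URI-template suffixes such as "{/name}" are preserved as-is.
        return proxy + obj[len(upstream):]
    return obj


repo = {
    "url": "https://api.github.com/repos/owner/repo",
    "labels_url": "https://api.github.com/repos/owner/repo/labels{/name}",
    "html_url": "https://github.com/owner/repo",
    "owner": {"url": "https://api.github.com/users/owner"},
}
print(rewrite_urls(repo)["labels_url"])  # http://localhost:3000/repos/owner/repo/labels{/name}
```

Because this sketch only matches the API prefix, web and git URLs (html_url, clone_url, git_url, ssh_url, svn_url) are left untouched; rewriting those as well, as the list above suggests, would need a separate mapping.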


The following code should work:

gh = Github(base_url="http://localhost:3000")
r = gh.get_repo("hsborges/github-proxy-server")
proxy_url = r.url.replace("","http://localhost:3000")
for pr in r.get_pulls():
for label in r.get_labels():

However, rewriting the hostname of these pagination URLs should be github-proxy-server's responsibility.

victorgveloso commented 1 year ago

Example response for get_pulls through github-proxy-server:

{'_links': {'comments': {'href': ''},
             'commits': {'href': ''},
             'html': {'href': ''},
             'issue': {'href': ''},
             'review_comment': {'href': '{/number}'},
             'review_comments': {'href': ''},
             'self': {'href': ''},
             'statuses': {'href': ''}},
  'active_lock_reason': None,
  'assignee': None,
  'assignees': [],
  'author_association': 'NONE',
  'auto_merge': None,
  'base': {'label': 'django:main',
           'ref': 'main',
           'repo': {'allow_forking': True,
                    'archive_url': '{archive_format}{/ref}',
                    'archived': False,
                    'assignees_url': '{/user}',
                    'blobs_url': '{/sha}',
                    'branches_url': '{/branch}',
                    'clone_url': '',
                    'collaborators_url': '{/collaborator}',
                    'comments_url': '{/number}',
                    'commits_url': '{/sha}',
                    'compare_url': '{base}...{head}',
                    'contents_url': '{+path}',
                    'contributors_url': '',
                    'created_at': '2012-04-28T02:47:18Z',
                    'default_branch': 'main',
                    'deployments_url': '',
                    'description': 'The Web framework for perfectionists with '
                    'disabled': False,
                    'downloads_url': '',
                    'events_url': '',
                    'fork': False,
                    'forks': 28138,
                    'forks_count': 28138,
                    'forks_url': '',
                    'full_name': 'django/django',
                    'git_commits_url': '{/sha}',
                    'git_refs_url': '{/sha}',
                    'git_tags_url': '{/sha}',
                    'git_url': 'git://',
                    'has_downloads': True,
                    'has_issues': False,
                    'has_pages': False,
                    'has_projects': False,
                    'has_wiki': False,
                    'homepage': '',
                    'hooks_url': '',
                    'html_url': '',
                    'id': 4164482,
                    'is_template': False,
                    'issue_comment_url': '{/number}',
                    'issue_events_url': '{/number}',
                    'issues_url': '{/number}',
                    'keys_url': '{/key_id}',
                    'labels_url': '{/name}',
                    'language': 'Python',
                    'languages_url': '',
                    'license': {'key': 'bsd-3-clause',
                                'name': 'BSD 3-Clause "New" or "Revised" '
                                'node_id': 'MDc6TGljZW5zZTU=',
                                'spdx_id': 'BSD-3-Clause',
                                'url': ''},
                    'merges_url': '',
                    'milestones_url': '{/number}',
                    'mirror_url': None,
                    'name': 'django',
                    'node_id': 'MDEwOlJlcG9zaXRvcnk0MTY0NDgy',
                    'notifications_url': '{?since,all,participating}',
                    'open_issues': 182,
                    'open_issues_count': 182,
                    'owner': {'avatar_url': '',
                              'events_url': '{/privacy}',
                              'followers_url': '',
                              'following_url': '{/other_user}',
                              'gists_url': '{/gist_id}',
                              'gravatar_id': '',
                              'html_url': '',
                              'id': 27804,
                              'login': 'django',
                              'node_id': 'MDEyOk9yZ2FuaXphdGlvbjI3ODA0',
                              'organizations_url': '',
                              'received_events_url': '',
                              'repos_url': '',
                              'site_admin': False,
                              'starred_url': '{/owner}{/repo}',
                              'subscriptions_url': '',
                              'type': 'Organization',
                              'url': ''},
                    'private': False,
                    'pulls_url': '{/number}',
                    'pushed_at': '2022-11-07T03:45:35Z',
                    'releases_url': '{/id}',
                    'size': 230261,
                    'ssh_url': '',
                    'stargazers_count': 67161,
                    'stargazers_url': '',
                    'statuses_url': '{sha}',
                    'subscribers_url': '',
                    'subscription_url': '',
                    'svn_url': '',
                    'tags_url': '',
                    'teams_url': '',
                    'topics': ['apps',
                    'trees_url': '{/sha}',
                    'updated_at': '2022-11-07T04:38:28Z',
                    'url': '',
                    'visibility': 'public',
                    'watchers': 67161,
                    'watchers_count': 67161,
                    'web_commit_signoff_required': False},
           'sha': 'eb6cc01d0f62c73441a3610193ba210176d0935f',
           'user': {'avatar_url': '',
                    'events_url': '{/privacy}',
                    'followers_url': '',
                    'following_url': '{/other_user}',
                    'gists_url': '{/gist_id}',
                    'gravatar_id': '',
                    'html_url': '',
                    'id': 27804,
                    'login': 'django',
                    'node_id': 'MDEyOk9yZ2FuaXphdGlvbjI3ODA0',
                    'organizations_url': '',
                    'received_events_url': '',
                    'repos_url': '',
                    'site_admin': False,
                    'starred_url': '{/owner}{/repo}',
                    'subscriptions_url': '',
                    'type': 'Organization',
                    'url': ''}},
  'body': '\r\n'
          "Ensure querystring persists when 'save and add another' is clicked. "
          'Add a test case.\r\n'
          'Co-authored-by: Grady Yu <>\r\n',
  'closed_at': None,
  'comments_url': '',
  'commits_url': '',
  'created_at': '2022-10-20T21:42:05Z',
  'diff_url': '',
  'draft': False,
  'head': {'label': 'matthewn:ticket_12241',
           'ref': 'ticket_12241',
           'repo': {'allow_forking': True,
                    'archive_url': '{archive_format}{/ref}',
                    'archived': False,
                    'assignees_url': '{/user}',
                    'blobs_url': '{/sha}',
                    'branches_url': '{/branch}',
                    'clone_url': '',
                    'collaborators_url': '{/collaborator}',
                    'comments_url': '{/number}',
                    'commits_url': '{/sha}',
                    'compare_url': '{base}...{head}',
                    'contents_url': '{+path}',
                    'contributors_url': '',
                    'created_at': '2022-10-20T16:15:41Z',
                    'default_branch': 'main',
                    'deployments_url': '',
                    'description': 'The Web framework for perfectionists with '
                    'disabled': False,
                    'downloads_url': '',
                    'events_url': '',
                    'fork': True,
                    'forks': 0,
                    'forks_count': 0,
                    'forks_url': '',
                    'full_name': 'matthewn/django',
                    'git_commits_url': '{/sha}',
                    'git_refs_url': '{/sha}',
                    'git_tags_url': '{/sha}',
                    'git_url': 'git://',
                    'has_downloads': True,
                    'has_issues': False,
                    'has_pages': False,
                    'has_projects': True,
                    'has_wiki': False,
                    'homepage': '',
                    'hooks_url': '',
                    'html_url': '',
                    'id': 554918143,
                    'is_template': False,
                    'issue_comment_url': '{/number}',
                    'issue_events_url': '{/number}',
                    'issues_url': '{/number}',
                    'keys_url': '{/key_id}',
                    'labels_url': '{/name}',
                    'language': None,
                    'languages_url': '',
                    'license': {'key': 'bsd-3-clause',
                                'name': 'BSD 3-Clause "New" or "Revised" '
                                'node_id': 'MDc6TGljZW5zZTU=',
                                'spdx_id': 'BSD-3-Clause',
                                'url': ''},
                    'merges_url': '',
                    'milestones_url': '{/number}',
                    'mirror_url': None,
                    'name': 'django',
                    'node_id': 'R_kgDOIRNg_w',
                    'notifications_url': '{?since,all,participating}',
                    'open_issues': 0,
                    'open_issues_count': 0,
                    'owner': {'avatar_url': '',
                              'events_url': '{/privacy}',
                              'followers_url': '',
                              'following_url': '{/other_user}',
                              'gists_url': '{/gist_id}',
                              'gravatar_id': '',
                              'html_url': '',
                              'id': 782716,
                              'login': 'matthewn',
                              'node_id': 'MDQ6VXNlcjc4MjcxNg==',
                              'organizations_url': '',
                              'received_events_url': '',
                              'repos_url': '',
                              'site_admin': False,
                              'starred_url': '{/owner}{/repo}',
                              'subscriptions_url': '',
                              'type': 'User',
                              'url': ''},
                    'private': False,
                    'pulls_url': '{/number}',
                    'pushed_at': '2022-10-26T23:03:36Z',
                    'releases_url': '{/id}',
                    'size': 187498,
                    'ssh_url': '',
                    'stargazers_count': 0,
                    'stargazers_url': '',
                    'statuses_url': '{sha}',
                    'subscribers_url': '',
                    'subscription_url': '',
                    'svn_url': '',
                    'tags_url': '',
                    'teams_url': '',
                    'topics': [],
                    'trees_url': '{/sha}',
                    'updated_at': '2022-10-20T15:32:53Z',
                    'url': '',
                    'visibility': 'public',
                    'watchers': 0,
                    'watchers_count': 0,
                    'web_commit_signoff_required': False},
           'sha': '660580daa3d7ee49b351b47d5a55e78b3ef77065',
           'user': {'avatar_url': '',
                    'events_url': '{/privacy}',
                    'followers_url': '',
                    'following_url': '{/other_user}',
                    'gists_url': '{/gist_id}',
                    'gravatar_id': '',
                    'html_url': '',
                    'id': 782716,
                    'login': 'matthewn',
                    'node_id': 'MDQ6VXNlcjc4MjcxNg==',
                    'organizations_url': '',
                    'received_events_url': '',
                    'repos_url': '',
                    'site_admin': False,
                    'starred_url': '{/owner}{/repo}',
                    'subscriptions_url': '',
                    'type': 'User',
                    'url': ''}},
  'html_url': '',
  'id': 1094441915,
  'issue_url': '',
  'labels': [{'color': 'D4C5F9',
              'default': False,
              'description': '',
              'id': 4700196009,
              'name': 'DjangoCon 🦄',
              'node_id': 'LA_kwDOAD-Lgs8AAAABGCdMqQ',
              'url': ''}],
  'locked': False,
  'merge_commit_sha': 'd78dc77a0b3974662be72d79966d830326889d14',
  'merged_at': None,
  'milestone': None,
  'node_id': 'PR_kwDOAD-Lgs5BO9u7',
  'number': 16206,
  'patch_url': '',
  'requested_reviewers': [],
  'requested_teams': [],
  'review_comment_url': '{/number}',
  'review_comments_url': '',
  'state': 'open',
  'statuses_url': '',
  'title': 'Fixed #12241 Admin forgets URL used for prefilling forms when '
           'hitting Save and add another',
  'updated_at': '2022-10-27T11:35:07Z',
  'url': '',
  'user': {'avatar_url': '',
           'events_url': '{/privacy}',
           'followers_url': '',
           'following_url': '{/other_user}',
           'gists_url': '{/gist_id}',
           'gravatar_id': '',
           'html_url': '',
           'id': 782716,
           'login': 'matthewn',
           'node_id': 'MDQ6VXNlcjc4MjcxNg==',
           'organizations_url': '',
           'received_events_url': '',
           'repos_url': '',
           'site_admin': False,
           'starred_url': '{/owner}{/repo}',
           'subscriptions_url': '',
           'type': 'User',
           'url': ''}},
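To check a response like this for URLs that would still send PyGithub past the proxy, a small diagnostic can list every http(s) string whose host differs from the proxy's. `find_foreign_urls` and the sample below are hypothetical illustrations, not part of either project.

```python
from typing import Any, Iterator, Tuple
from urllib.parse import urlsplit


def find_foreign_urls(obj: Any, proxy_host: str, path: str = "$") -> Iterator[Tuple[str, str]]:
    """Yield (json_path, value) for every http(s) URL string whose
    host:port differs from `proxy_host`."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from find_foreign_urls(value, proxy_host, f"{path}.{key}")
    elif isinstance(obj, list):
        for index, item in enumerate(obj):
            yield from find_foreign_urls(item, proxy_host, f"{path}[{index}]")
    elif isinstance(obj, str) and obj.startswith(("http://", "https://")):
        if urlsplit(obj).netloc != proxy_host:
            yield path, obj


sample = [{
    "url": "https://api.github.com/repos/owner/repo/pulls/1",
    "number": 1,
    "user": {"url": "http://localhost:3000/users/someone"},
}]
for json_path, value in find_foreign_urls(sample, "localhost:3000"):
    print(json_path, value)  # $[0].url https://api.github.com/repos/owner/repo/pulls/1
```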

Code to reproduce:

import requests
from pprint import pprint