gittrends-app / github-proxy-server

A tool for massive data collection from GitHub APIs (Rest and GraphQL)
MIT License

PyGithub Integration cannot fetch Repositories' paginated fields from proxy #18

Open victorgveloso opened 1 year ago

victorgveloso commented 1 year ago

Background

I am assuming that PyGithub integration is intended, given that you provided sample code for it.

The sample code works because it only calls the MainClass.get_repo method and reads the Repository's non-paginated fields.

Context

Even though the proxy tries to handle pagination, it does so by relying on the Response's header fields. However, PyGithub doesn't rely on the header cursors of the GitHub v3 API's response. Instead, it uses the Repository attribute url to create the PaginatedList, which then sets self.__nextUrl from the provided URL.

Expected behavior

PyGithub should behave the same whether or not I set base_url to http://localhost:3000. So, the code below should list the pull requests of the "hsborges/github-proxy-server" repository.

gh = Github(base_url="http://localhost:3000")
r = gh.get_repo("hsborges/github-proxy-server")
for pr in r.get_pulls():
    print(pr)

Furthermore, the code below should list the labels of the same repository:

for label in r.get_labels():
    print(label)

Current behavior

Both code examples raise an AssertionError (see the full stack trace below).

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In [112], line 1
----> 1 r.get_labels()[0]

File ~/.local/lib/python3.10/site-packages/github/PaginatedList.py:48, in PaginatedListBase.__getitem__(self, index)
     46 assert isinstance(index, (int, slice))
     47 if isinstance(index, int):
---> 48     self.__fetchToIndex(index)
     49     return self.__elements[index]
     50 else:

File ~/.local/lib/python3.10/site-packages/github/PaginatedList.py:64, in PaginatedListBase.__fetchToIndex(self, index)
     62 def __fetchToIndex(self, index):
     63     while len(self.__elements) <= index and self._couldGrow():
---> 64         self._grow()

File ~/.local/lib/python3.10/site-packages/github/PaginatedList.py:67, in PaginatedListBase._grow(self)
     66 def _grow(self):
---> 67     newElements = self._fetchNextPage()
     68     self.__elements += newElements
     69     return newElements

File ~/.local/lib/python3.10/site-packages/github/PaginatedList.py:201, in PaginatedList._fetchNextPage(self)
    200 def _fetchNextPage(self):
--> 201     headers, data = self.__requester.requestJsonAndCheck(
    202         "GET", self.__nextUrl, parameters=self.__nextParams, headers=self.__headers
    203     )
    204     data = data if data else []
    206     self.__nextUrl = None

File ~/.local/lib/python3.10/site-packages/github/Requester.py:354, in Requester.requestJsonAndCheck(self, verb, url, parameters, headers, input)
    352 def requestJsonAndCheck(self, verb, url, parameters=None, headers=None, input=None):
    353     return self.__check(
--> 354         *self.requestJson(
    355             verb, url, parameters, headers, input, self.__customConnection(url)
    356         )
    357     )

File ~/.local/lib/python3.10/site-packages/github/Requester.py:454, in Requester.requestJson(self, verb, url, parameters, headers, input, cnx)
    451 def encode(input):
    452     return "application/json", json.dumps(input)
--> 454 return self.__requestEncode(cnx, verb, url, parameters, headers, input, encode)

File ~/.local/lib/python3.10/site-packages/github/Requester.py:519, in Requester.__requestEncode(self, cnx, verb, url, parameters, requestHeaders, input, encode)
    516 self.__authenticate(url, requestHeaders, parameters)
    517 requestHeaders["User-Agent"] = self.__userAgent
--> 519 url = self.__makeAbsoluteUrl(url)
    520 url = self.__addParametersToUrl(url, parameters)
    522 encoded_input = None

File ~/.local/lib/python3.10/site-packages/github/Requester.py:591, in Requester.__makeAbsoluteUrl(self, url)
    589 else:
    590     o = urllib.parse.urlparse(url)
--> 591     assert o.hostname in [
    592         self.__hostname,
    593         "uploads.github.com",
    594         "status.github.com",
    595         "github.com",
    596     ], o.hostname
    597     assert o.path.startswith((self.__prefix, "/api/"))
    598     assert o.port == self.__port

AssertionError: api.github.com

Explanation

I believe PyGithub's AssertionError is not the problem itself; it only highlights that the requested hostname (api.github.com) differs from the expected one (localhost, port 3000), which is set in the MainClass constructor (i.e., the Github class).
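To make the mismatch concrete, here is a minimal sketch of the kind of comparison that fails in the stack trace above. It is not PyGithub's actual code (the real check also accepts a few extra hostnames, as the traceback shows); host_matches is a hypothetical helper written only for illustration, using the standard urllib.parse module.

```python
from urllib.parse import urlparse


def host_matches(base_url: str, resource_url: str) -> bool:
    """Hypothetical helper: True when both URLs share the same hostname and port."""
    base = urlparse(base_url)
    target = urlparse(resource_url)
    return (base.hostname, base.port) == (target.hostname, target.port)


# base_url as passed to Github(...) in this issue.
base_url = "http://localhost:3000"
# Repository URL as it appears in the JSON returned through the proxy.
repo_url = "https://api.github.com/repos/django/django"

# The upstream URL does not match the configured proxy host.
print(host_matches(base_url, repo_url))

# The same URL re-rooted at the proxy does match.
proxied_url = repo_url.replace("https://api.github.com", base_url)
print(host_matches(base_url, proxied_url))
```

The first call reports a mismatch and the second a match, which is why rewriting the url attribute (as in the Demo below) lets the paginated calls go through.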

Where does PyGithub take that hostname from?

When PyGithub's Repository object is built, it leverages the API Response data (JSON) to feed its attributes (see where self._useAttributes is called and where it is declared).

Suggested solution

github-proxy-server should also replace every occurrence of https://api.github.com with http://localhost:3000 (or whatever address the proxy is served from) in the following data fields: archive_url, assignees_url, blobs_url, branches_url, clone_url, collaborators_url, comments_url, commits_url, compare_url, contents_url, contributors_url, deployments_url, downloads_url, events_url, forks_url, git_commits_url, git_refs_url, git_tags_url, git_url, hooks_url, html_url, issue_comment_url, issue_events_url, issues_url, keys_url, labels_url, languages_url, merges_url, milestones_url, mirror_url, notifications_url, pulls_url, releases_url, ssh_url, stargazers_url, statuses_url, subscribers_url, subscription_url, svn_url, tags_url, teams_url, trees_url, url.
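As a rough illustration of the suggested rewrite, the sketch below walks a decoded JSON payload and re-roots every string that starts with the upstream API address. It is written in Python only to match the other snippets in this issue; rewrite_urls and UPSTREAM are hypothetical names, and the proxy's real implementation may look quite different. Rewriting by prefix rather than by field name is an assumption made here so that nested objects (owner, base, head, _links) are covered as well.

```python
from typing import Any

# Assumed upstream address, taken from the URLs shown in this issue.
UPSTREAM = "https://api.github.com"


def rewrite_urls(payload: Any, proxy_base: str, upstream: str = UPSTREAM) -> Any:
    """Return a copy of payload where strings rooted at upstream are re-rooted at proxy_base."""
    if isinstance(payload, dict):
        return {key: rewrite_urls(value, proxy_base, upstream) for key, value in payload.items()}
    if isinstance(payload, list):
        return [rewrite_urls(value, proxy_base, upstream) for value in payload]
    if isinstance(payload, str) and (payload == upstream or payload.startswith(upstream + "/")):
        return proxy_base + payload[len(upstream):]
    # Numbers, booleans, None and unrelated strings are returned unchanged.
    return payload


# Small sample shaped like the repository payload quoted in this issue.
sample = {
    "url": "https://api.github.com/repos/django/django",
    "html_url": "https://github.com/django/django",
    "labels_url": "https://api.github.com/repos/django/django/labels{/name}",
    "owner": {"url": "https://api.github.com/users/django"},
    "mirror_url": None,
}

rewritten = rewrite_urls(sample, "http://localhost:3000")
print(rewritten["url"])
print(rewritten["html_url"])
```

Fields that point at github.com rather than api.github.com (html_url, clone_url and similar) are left untouched by this sketch, since they are not API endpoints.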

Demo

The following code should work:

gh = Github(base_url="http://localhost:3000")
r = gh.get_repo("hsborges/github-proxy-server")
proxy_url = r.url.replace("https://api.github.com","http://localhost:3000")
r._useAttributes({"url":proxy_url})
for pr in r.get_pulls():
    print(pr)
for label in r.get_labels():
    print(label)

However, replacing the cursors' hostname is github-proxy-server's responsibility.

victorgveloso commented 1 year ago

Example response for get_pulls using github-proxy-server:

[...,
{'_links': {'comments': {'href': 'https://api.github.com/repos/django/django/issues/16206/comments'},
             'commits': {'href': 'https://api.github.com/repos/django/django/pulls/16206/commits'},
             'html': {'href': 'https://github.com/django/django/pull/16206'},
             'issue': {'href': 'https://api.github.com/repos/django/django/issues/16206'},
             'review_comment': {'href': 'https://api.github.com/repos/django/django/pulls/comments{/number}'},
             'review_comments': {'href': 'https://api.github.com/repos/django/django/pulls/16206/comments'},
             'self': {'href': 'https://api.github.com/repos/django/django/pulls/16206'},
             'statuses': {'href': 'https://api.github.com/repos/django/django/statuses/660580daa3d7ee49b351b47d5a55e78b3ef77065'}},
  'active_lock_reason': None,
  'assignee': None,
  'assignees': [],
  'author_association': 'NONE',
  'auto_merge': None,
  'base': {'label': 'django:main',
           'ref': 'main',
           'repo': {'allow_forking': True,
                    'archive_url': 'https://api.github.com/repos/django/django/{archive_format}{/ref}',
                    'archived': False,
                    'assignees_url': 'https://api.github.com/repos/django/django/assignees{/user}',
                    'blobs_url': 'https://api.github.com/repos/django/django/git/blobs{/sha}',
                    'branches_url': 'https://api.github.com/repos/django/django/branches{/branch}',
                    'clone_url': 'https://github.com/django/django.git',
                    'collaborators_url': 'https://api.github.com/repos/django/django/collaborators{/collaborator}',
                    'comments_url': 'https://api.github.com/repos/django/django/comments{/number}',
                    'commits_url': 'https://api.github.com/repos/django/django/commits{/sha}',
                    'compare_url': 'https://api.github.com/repos/django/django/compare/{base}...{head}',
                    'contents_url': 'https://api.github.com/repos/django/django/contents/{+path}',
                    'contributors_url': 'https://api.github.com/repos/django/django/contributors',
                    'created_at': '2012-04-28T02:47:18Z',
                    'default_branch': 'main',
                    'deployments_url': 'https://api.github.com/repos/django/django/deployments',
                    'description': 'The Web framework for perfectionists with '
                                   'deadlines.',
                    'disabled': False,
                    'downloads_url': 'https://api.github.com/repos/django/django/downloads',
                    'events_url': 'https://api.github.com/repos/django/django/events',
                    'fork': False,
                    'forks': 28138,
                    'forks_count': 28138,
                    'forks_url': 'https://api.github.com/repos/django/django/forks',
                    'full_name': 'django/django',
                    'git_commits_url': 'https://api.github.com/repos/django/django/git/commits{/sha}',
                    'git_refs_url': 'https://api.github.com/repos/django/django/git/refs{/sha}',
                    'git_tags_url': 'https://api.github.com/repos/django/django/git/tags{/sha}',
                    'git_url': 'git://github.com/django/django.git',
                    'has_downloads': True,
                    'has_issues': False,
                    'has_pages': False,
                    'has_projects': False,
                    'has_wiki': False,
                    'homepage': 'https://www.djangoproject.com/',
                    'hooks_url': 'https://api.github.com/repos/django/django/hooks',
                    'html_url': 'https://github.com/django/django',
                    'id': 4164482,
                    'is_template': False,
                    'issue_comment_url': 'https://api.github.com/repos/django/django/issues/comments{/number}',
                    'issue_events_url': 'https://api.github.com/repos/django/django/issues/events{/number}',
                    'issues_url': 'https://api.github.com/repos/django/django/issues{/number}',
                    'keys_url': 'https://api.github.com/repos/django/django/keys{/key_id}',
                    'labels_url': 'https://api.github.com/repos/django/django/labels{/name}',
                    'language': 'Python',
                    'languages_url': 'https://api.github.com/repos/django/django/languages',
                    'license': {'key': 'bsd-3-clause',
                                'name': 'BSD 3-Clause "New" or "Revised" '
                                        'License',
                                'node_id': 'MDc6TGljZW5zZTU=',
                                'spdx_id': 'BSD-3-Clause',
                                'url': 'https://api.github.com/licenses/bsd-3-clause'},
                    'merges_url': 'https://api.github.com/repos/django/django/merges',
                    'milestones_url': 'https://api.github.com/repos/django/django/milestones{/number}',
                    'mirror_url': None,
                    'name': 'django',
                    'node_id': 'MDEwOlJlcG9zaXRvcnk0MTY0NDgy',
                    'notifications_url': 'https://api.github.com/repos/django/django/notifications{?since,all,participating}',
                    'open_issues': 182,
                    'open_issues_count': 182,
                    'owner': {'avatar_url': 'https://avatars.githubusercontent.com/u/27804?v=4',
                              'events_url': 'https://api.github.com/users/django/events{/privacy}',
                              'followers_url': 'https://api.github.com/users/django/followers',
                              'following_url': 'https://api.github.com/users/django/following{/other_user}',
                              'gists_url': 'https://api.github.com/users/django/gists{/gist_id}',
                              'gravatar_id': '',
                              'html_url': 'https://github.com/django',
                              'id': 27804,
                              'login': 'django',
                              'node_id': 'MDEyOk9yZ2FuaXphdGlvbjI3ODA0',
                              'organizations_url': 'https://api.github.com/users/django/orgs',
                              'received_events_url': 'https://api.github.com/users/django/received_events',
                              'repos_url': 'https://api.github.com/users/django/repos',
                              'site_admin': False,
                              'starred_url': 'https://api.github.com/users/django/starred{/owner}{/repo}',
                              'subscriptions_url': 'https://api.github.com/users/django/subscriptions',
                              'type': 'Organization',
                              'url': 'https://api.github.com/users/django'},
                    'private': False,
                    'pulls_url': 'https://api.github.com/repos/django/django/pulls{/number}',
                    'pushed_at': '2022-11-07T03:45:35Z',
                    'releases_url': 'https://api.github.com/repos/django/django/releases{/id}',
                    'size': 230261,
                    'ssh_url': 'git@github.com:django/django.git',
                    'stargazers_count': 67161,
                    'stargazers_url': 'https://api.github.com/repos/django/django/stargazers',
                    'statuses_url': 'https://api.github.com/repos/django/django/statuses/{sha}',
                    'subscribers_url': 'https://api.github.com/repos/django/django/subscribers',
                    'subscription_url': 'https://api.github.com/repos/django/django/subscription',
                    'svn_url': 'https://github.com/django/django',
                    'tags_url': 'https://api.github.com/repos/django/django/tags',
                    'teams_url': 'https://api.github.com/repos/django/django/teams',
                    'topics': ['apps',
                               'django',
                               'framework',
                               'models',
                               'orm',
                               'python',
                               'templates',
                               'views',
                               'web'],
                    'trees_url': 'https://api.github.com/repos/django/django/git/trees{/sha}',
                    'updated_at': '2022-11-07T04:38:28Z',
                    'url': 'https://api.github.com/repos/django/django',
                    'visibility': 'public',
                    'watchers': 67161,
                    'watchers_count': 67161,
                    'web_commit_signoff_required': False},
           'sha': 'eb6cc01d0f62c73441a3610193ba210176d0935f',
           'user': {'avatar_url': 'https://avatars.githubusercontent.com/u/27804?v=4',
                    'events_url': 'https://api.github.com/users/django/events{/privacy}',
                    'followers_url': 'https://api.github.com/users/django/followers',
                    'following_url': 'https://api.github.com/users/django/following{/other_user}',
                    'gists_url': 'https://api.github.com/users/django/gists{/gist_id}',
                    'gravatar_id': '',
                    'html_url': 'https://github.com/django',
                    'id': 27804,
                    'login': 'django',
                    'node_id': 'MDEyOk9yZ2FuaXphdGlvbjI3ODA0',
                    'organizations_url': 'https://api.github.com/users/django/orgs',
                    'received_events_url': 'https://api.github.com/users/django/received_events',
                    'repos_url': 'https://api.github.com/users/django/repos',
                    'site_admin': False,
                    'starred_url': 'https://api.github.com/users/django/starred{/owner}{/repo}',
                    'subscriptions_url': 'https://api.github.com/users/django/subscriptions',
                    'type': 'Organization',
                    'url': 'https://api.github.com/users/django'}},
  'body': 'https://code.djangoproject.com/ticket/12241\r\n'
          '\r\n'
          "Ensure querystring persists when 'save and add another' is clicked. "
          'Add a test case.\r\n'
          '\r\n'
          'Co-authored-by: Grady Yu <gradyy@users.noreply.github.com>\r\n',
  'closed_at': None,
  'comments_url': 'https://api.github.com/repos/django/django/issues/16206/comments',
  'commits_url': 'https://api.github.com/repos/django/django/pulls/16206/commits',
  'created_at': '2022-10-20T21:42:05Z',
  'diff_url': 'https://github.com/django/django/pull/16206.diff',
  'draft': False,
  'head': {'label': 'matthewn:ticket_12241',
           'ref': 'ticket_12241',
           'repo': {'allow_forking': True,
                    'archive_url': 'https://api.github.com/repos/matthewn/django/{archive_format}{/ref}',
                    'archived': False,
                    'assignees_url': 'https://api.github.com/repos/matthewn/django/assignees{/user}',
                    'blobs_url': 'https://api.github.com/repos/matthewn/django/git/blobs{/sha}',
                    'branches_url': 'https://api.github.com/repos/matthewn/django/branches{/branch}',
                    'clone_url': 'https://github.com/matthewn/django.git',
                    'collaborators_url': 'https://api.github.com/repos/matthewn/django/collaborators{/collaborator}',
                    'comments_url': 'https://api.github.com/repos/matthewn/django/comments{/number}',
                    'commits_url': 'https://api.github.com/repos/matthewn/django/commits{/sha}',
                    'compare_url': 'https://api.github.com/repos/matthewn/django/compare/{base}...{head}',
                    'contents_url': 'https://api.github.com/repos/matthewn/django/contents/{+path}',
                    'contributors_url': 'https://api.github.com/repos/matthewn/django/contributors',
                    'created_at': '2022-10-20T16:15:41Z',
                    'default_branch': 'main',
                    'deployments_url': 'https://api.github.com/repos/matthewn/django/deployments',
                    'description': 'The Web framework for perfectionists with '
                                   'deadlines.',
                    'disabled': False,
                    'downloads_url': 'https://api.github.com/repos/matthewn/django/downloads',
                    'events_url': 'https://api.github.com/repos/matthewn/django/events',
                    'fork': True,
                    'forks': 0,
                    'forks_count': 0,
                    'forks_url': 'https://api.github.com/repos/matthewn/django/forks',
                    'full_name': 'matthewn/django',
                    'git_commits_url': 'https://api.github.com/repos/matthewn/django/git/commits{/sha}',
                    'git_refs_url': 'https://api.github.com/repos/matthewn/django/git/refs{/sha}',
                    'git_tags_url': 'https://api.github.com/repos/matthewn/django/git/tags{/sha}',
                    'git_url': 'git://github.com/matthewn/django.git',
                    'has_downloads': True,
                    'has_issues': False,
                    'has_pages': False,
                    'has_projects': True,
                    'has_wiki': False,
                    'homepage': 'https://www.djangoproject.com/',
                    'hooks_url': 'https://api.github.com/repos/matthewn/django/hooks',
                    'html_url': 'https://github.com/matthewn/django',
                    'id': 554918143,
                    'is_template': False,
                    'issue_comment_url': 'https://api.github.com/repos/matthewn/django/issues/comments{/number}',
                    'issue_events_url': 'https://api.github.com/repos/matthewn/django/issues/events{/number}',
                    'issues_url': 'https://api.github.com/repos/matthewn/django/issues{/number}',
                    'keys_url': 'https://api.github.com/repos/matthewn/django/keys{/key_id}',
                    'labels_url': 'https://api.github.com/repos/matthewn/django/labels{/name}',
                    'language': None,
                    'languages_url': 'https://api.github.com/repos/matthewn/django/languages',
                    'license': {'key': 'bsd-3-clause',
                                'name': 'BSD 3-Clause "New" or "Revised" '
                                        'License',
                                'node_id': 'MDc6TGljZW5zZTU=',
                                'spdx_id': 'BSD-3-Clause',
                                'url': 'https://api.github.com/licenses/bsd-3-clause'},
                    'merges_url': 'https://api.github.com/repos/matthewn/django/merges',
                    'milestones_url': 'https://api.github.com/repos/matthewn/django/milestones{/number}',
                    'mirror_url': None,
                    'name': 'django',
                    'node_id': 'R_kgDOIRNg_w',
                    'notifications_url': 'https://api.github.com/repos/matthewn/django/notifications{?since,all,participating}',
                    'open_issues': 0,
                    'open_issues_count': 0,
                    'owner': {'avatar_url': 'https://avatars.githubusercontent.com/u/782716?v=4',
                              'events_url': 'https://api.github.com/users/matthewn/events{/privacy}',
                              'followers_url': 'https://api.github.com/users/matthewn/followers',
                              'following_url': 'https://api.github.com/users/matthewn/following{/other_user}',
                              'gists_url': 'https://api.github.com/users/matthewn/gists{/gist_id}',
                              'gravatar_id': '',
                              'html_url': 'https://github.com/matthewn',
                              'id': 782716,
                              'login': 'matthewn',
                              'node_id': 'MDQ6VXNlcjc4MjcxNg==',
                              'organizations_url': 'https://api.github.com/users/matthewn/orgs',
                              'received_events_url': 'https://api.github.com/users/matthewn/received_events',
                              'repos_url': 'https://api.github.com/users/matthewn/repos',
                              'site_admin': False,
                              'starred_url': 'https://api.github.com/users/matthewn/starred{/owner}{/repo}',
                              'subscriptions_url': 'https://api.github.com/users/matthewn/subscriptions',
                              'type': 'User',
                              'url': 'https://api.github.com/users/matthewn'},
                    'private': False,
                    'pulls_url': 'https://api.github.com/repos/matthewn/django/pulls{/number}',
                    'pushed_at': '2022-10-26T23:03:36Z',
                    'releases_url': 'https://api.github.com/repos/matthewn/django/releases{/id}',
                    'size': 187498,
                    'ssh_url': 'git@github.com:matthewn/django.git',
                    'stargazers_count': 0,
                    'stargazers_url': 'https://api.github.com/repos/matthewn/django/stargazers',
                    'statuses_url': 'https://api.github.com/repos/matthewn/django/statuses/{sha}',
                    'subscribers_url': 'https://api.github.com/repos/matthewn/django/subscribers',
                    'subscription_url': 'https://api.github.com/repos/matthewn/django/subscription',
                    'svn_url': 'https://github.com/matthewn/django',
                    'tags_url': 'https://api.github.com/repos/matthewn/django/tags',
                    'teams_url': 'https://api.github.com/repos/matthewn/django/teams',
                    'topics': [],
                    'trees_url': 'https://api.github.com/repos/matthewn/django/git/trees{/sha}',
                    'updated_at': '2022-10-20T15:32:53Z',
                    'url': 'https://api.github.com/repos/matthewn/django',
                    'visibility': 'public',
                    'watchers': 0,
                    'watchers_count': 0,
                    'web_commit_signoff_required': False},
           'sha': '660580daa3d7ee49b351b47d5a55e78b3ef77065',
           'user': {'avatar_url': 'https://avatars.githubusercontent.com/u/782716?v=4',
                    'events_url': 'https://api.github.com/users/matthewn/events{/privacy}',
                    'followers_url': 'https://api.github.com/users/matthewn/followers',
                    'following_url': 'https://api.github.com/users/matthewn/following{/other_user}',
                    'gists_url': 'https://api.github.com/users/matthewn/gists{/gist_id}',
                    'gravatar_id': '',
                    'html_url': 'https://github.com/matthewn',
                    'id': 782716,
                    'login': 'matthewn',
                    'node_id': 'MDQ6VXNlcjc4MjcxNg==',
                    'organizations_url': 'https://api.github.com/users/matthewn/orgs',
                    'received_events_url': 'https://api.github.com/users/matthewn/received_events',
                    'repos_url': 'https://api.github.com/users/matthewn/repos',
                    'site_admin': False,
                    'starred_url': 'https://api.github.com/users/matthewn/starred{/owner}{/repo}',
                    'subscriptions_url': 'https://api.github.com/users/matthewn/subscriptions',
                    'type': 'User',
                    'url': 'https://api.github.com/users/matthewn'}},
  'html_url': 'https://github.com/django/django/pull/16206',
  'id': 1094441915,
  'issue_url': 'https://api.github.com/repos/django/django/issues/16206',
  'labels': [{'color': 'D4C5F9',
              'default': False,
              'description': '',
              'id': 4700196009,
              'name': 'DjangoCon 🦄',
              'node_id': 'LA_kwDOAD-Lgs8AAAABGCdMqQ',
              'url': 'https://api.github.com/repos/django/django/labels/DjangoCon%20%F0%9F%A6%84'}],
  'locked': False,
  'merge_commit_sha': 'd78dc77a0b3974662be72d79966d830326889d14',
  'merged_at': None,
  'milestone': None,
  'node_id': 'PR_kwDOAD-Lgs5BO9u7',
  'number': 16206,
  'patch_url': 'https://github.com/django/django/pull/16206.patch',
  'requested_reviewers': [],
  'requested_teams': [],
  'review_comment_url': 'https://api.github.com/repos/django/django/pulls/comments{/number}',
  'review_comments_url': 'https://api.github.com/repos/django/django/pulls/16206/comments',
  'state': 'open',
  'statuses_url': 'https://api.github.com/repos/django/django/statuses/660580daa3d7ee49b351b47d5a55e78b3ef77065',
  'title': 'Fixed #12241 Admin forgets URL used for prefilling forms when '
           'hitting Save and add another',
  'updated_at': '2022-10-27T11:35:07Z',
  'url': 'https://api.github.com/repos/django/django/pulls/16206',
  'user': {'avatar_url': 'https://avatars.githubusercontent.com/u/782716?v=4',
           'events_url': 'https://api.github.com/users/matthewn/events{/privacy}',
           'followers_url': 'https://api.github.com/users/matthewn/followers',
           'following_url': 'https://api.github.com/users/matthewn/following{/other_user}',
           'gists_url': 'https://api.github.com/users/matthewn/gists{/gist_id}',
           'gravatar_id': '',
           'html_url': 'https://github.com/matthewn',
           'id': 782716,
           'login': 'matthewn',
           'node_id': 'MDQ6VXNlcjc4MjcxNg==',
           'organizations_url': 'https://api.github.com/users/matthewn/orgs',
           'received_events_url': 'https://api.github.com/users/matthewn/received_events',
           'repos_url': 'https://api.github.com/users/matthewn/repos',
           'site_admin': False,
           'starred_url': 'https://api.github.com/users/matthewn/starred{/owner}{/repo}',
           'subscriptions_url': 'https://api.github.com/users/matthewn/subscriptions',
           'type': 'User',
           'url': 'https://api.github.com/users/matthewn'}},
...]

Code to reproduce:

import requests
from pprint import pprint
pprint(requests.get("http://localhost:3000/repos/django/django/pulls").json())
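To check a response like the one above without reading the whole dump, a small helper can list which fields still point at the upstream API. This is only a sketch: upstream_paths is a hypothetical name, and the sample below is a trimmed stand-in for the real payload. The same function could be applied to the result of .json() from the request above.

```python
from typing import Any, Iterator


def upstream_paths(payload: Any, upstream: str = "https://api.github.com", path: str = "") -> Iterator[str]:
    """Yield the path of every string value in payload that starts with upstream."""
    if isinstance(payload, dict):
        for key, value in payload.items():
            child = f"{path}.{key}" if path else str(key)
            yield from upstream_paths(value, upstream, child)
    elif isinstance(payload, list):
        for index, value in enumerate(payload):
            yield from upstream_paths(value, upstream, f"{path}[{index}]")
    elif isinstance(payload, str) and payload.startswith(upstream):
        yield path


# Trimmed stand-in for the pull request list shown in this issue.
sample = [
    {
        "url": "https://api.github.com/repos/django/django/pulls/16206",
        "html_url": "https://github.com/django/django/pull/16206",
        "user": {"url": "https://api.github.com/users/matthewn"},
    }
]

for field in upstream_paths(sample):
    print(field)
```

An empty result would indicate that no value in the payload starts with the upstream address any more.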