fastai / ghapi

A delightful and complete interface to GitHub's amazing API
https://ghapi.fast.ai/
Apache License 2.0

HTTP Error 422: Unprocessable Entity when using paged with client.search.code #96

Open dtaivpp opened 2 years ago

dtaivpp commented 2 years ago

Hey all, I am getting the following error when running code like the sample I have provided:

import os
from ghapi.all import GhApi
from ghapi.page import paged
from dotenv import load_dotenv

load_dotenv()  # pull GH_TOKEN from a .env file, if one is present
GH_TOKEN = os.getenv('GH_TOKEN', None)

def get_client() -> GhApi:
  """Return the GitHub client"""
  return GhApi(token=GH_TOKEN)

def search_results(query: str, client: GhApi):
  """Yields result pages"""
  search_gen = paged(client.search.code, per_page=100, q=query)

  for results in search_gen:
    yield results

for result in search_results("filename:.dockercfg auth repo:dtaivpp/NewsTicker", get_client()):
    print(result)

Running the above produces the following traceback:

Traceback (most recent call last):
  File "gh-dorker/dorker.py", line 83, in <module>
    main(dorks=args.dorks, search=args.search, scope=args.scope)
  File "gh-dorker/dorker.py", line 43, in main
    for result in search_results(query, client):
  File "gh-dorker/dorker.py", line 24, in search_results
    for results in search_gen:
  File "/Users/dtippett/Desktop/Software_Projects/gh-dorker/venv/lib/python3.8/site-packages/ghapi/page.py", line 16, in paged
    yield from itertools.takewhile(noop, (oper(*args, per_page=per_page, page=i, **kwargs) for i in range(1,max_pages+1)))
  File "/Users/dtippett/Desktop/Software_Projects/gh-dorker/venv/lib/python3.8/site-packages/ghapi/page.py", line 16, in <genexpr>
    yield from itertools.takewhile(noop, (oper(*args, per_page=per_page, page=i, **kwargs) for i in range(1,max_pages+1)))
  File "/Users/dtippett/Desktop/Software_Projects/gh-dorker/venv/lib/python3.8/site-packages/ghapi/core.py", line 63, in __call__
    return self.client(self.path, self.verb, headers=headers, route=route_p, query=query_p, data=data_p)
  File "/Users/dtippett/Desktop/Software_Projects/gh-dorker/venv/lib/python3.8/site-packages/ghapi/core.py", line 108, in __call__
    res,self.recv_hdrs = urlsend(path, verb, headers=headers or None, debug=self.debug, return_headers=True,
  File "/Users/dtippett/Desktop/Software_Projects/gh-dorker/venv/lib/python3.8/site-packages/fastcore/net.py", line 212, in urlsend
    return urlread(req, return_json=return_json, return_headers=return_headers)
  File "/Users/dtippett/Desktop/Software_Projects/gh-dorker/venv/lib/python3.8/site-packages/fastcore/net.py", line 113, in urlread
    if 400 <= e.code < 500: raise ExceptionsHTTP[e.code](e.url, e.hdrs, e.fp) from None
fastcore.basics.HTTP422UnprocessableEntityError: HTTP Error 422: Unprocessable Entity

dtaivpp commented 2 years ago

I think part of the issue has to do with the paged function. The client.search.code endpoint returns the following as a body:

{
    "total_count": 0,
    "incomplete_results": false,
    "items": []
}

Paged seems to treat this body as a page of data when it really contains no results. Because the response is still truthy, paged keeps iterating through several "pages" of nothing, and that is where GitHub raises the exception: there wasn't a page one, and there certainly isn't a page 8.

I am not sure what the resolution for this would be, as I am not very familiar with how the library works relative to GitHub's API.
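
To illustrate the diagnosis, here is a minimal sketch paraphrasing the page.py line 16 shown in the traceback above (noop comes from fastcore and simply returns its argument):

import itertools
from fastcore.basics import noop  # returns its argument unchanged

# Paraphrase of ghapi's paged() from the traceback above:
# takewhile(noop, ...) only stops once a page evaluates falsy.
def paged(oper, *args, per_page=30, max_pages=9999, **kwargs):
    yield from itertools.takewhile(
        noop,
        (oper(*args, per_page=per_page, page=i, **kwargs)
         for i in range(1, max_pages + 1)))

# An empty search response is still a non-empty (truthy) mapping, so
# takewhile never stops and paged keeps requesting pages that don't exist:
empty_body = {"total_count": 0, "incomplete_results": False, "items": []}
print(bool(empty_body))  # True -- pagination does not stop here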

dtaivpp commented 2 years ago

For anyone who runs into this, here is a working pagination function that can replace the current one.

def paginator(operation, per_page=30, page=1, **kwargs):
    """Helper function for paginating requests

    Parameters:
    operation (GhApi function): The function you would like to paginate requests from
    per_page (int): Number of results per page (GitHub may limit some APIs to a certain amount)
    page (int): Page to start on
    kwargs: Any other arguments you would like to pass to the function (e.g. q=query)

    Yields:
    AttrDict: A dictionary-like object containing one page of results
    """
    incomplete = True
    while incomplete:
        result = operation(**kwargs, per_page=per_page, page=page)
        incomplete = result['incomplete_results']
        yield result
        page += 1
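
As a usage sketch, it can be dropped into the repro script from the top of this issue (this assumes GH_TOKEN is set in the environment as before; the path field is part of GitHub's code-search item schema):

import os
from ghapi.all import GhApi

# Assumes GH_TOKEN is set in the environment, as in the repro script above.
client = GhApi(token=os.getenv('GH_TOKEN'))

for page in paginator(client.search.code, per_page=100,
                      q="filename:.dockercfg auth repo:dtaivpp/NewsTicker"):
    for item in page['items']:
        print(item['path'])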