fastai / ghapi

A delightful and complete interface to GitHub's amazing API
https://ghapi.fast.ai/
Apache License 2.0
609 stars, 62 forks

UnicodeDecodeError on repos.download_tarball_archive #157

Closed. ddobrinskiy closed this issue 1 month ago

ddobrinskiy commented 1 year ago

Same error on repos.download_tarball_archive and repos.download_zipball_archive

Possibly related to https://github.com/fastai/ghapi/issues/22

packages installed

ghapi                    1.0.3
fastcore                 1.5.28

steps to reproduce

from ghapi.core import GhApi
GhApi(gh_host='https://api.github.com', authenticate=False).repos.download_tarball_archive('fastai', 'nbdev-template', 'main')

full logs

---------------------------------------------------------------------------
UnicodeDecodeError                        Traceback (most recent call last)
Cell In[21], line 3
      1 gh_repos = GhApi(gh_host='https://api.github.com/', authenticate=False).repos
----> 3 gh_repos.download_tarball_archive('fastai', 'nbdev-template', 'master')
      4 # gh_repos.download_zipball_archive('fastai', 'nbdev', 'master')

File ~/.local/share/virtualenvs/nbdev-mvrfY8_9/lib/python3.10/site-packages/ghapi/core.py:61, in _GhVerb.__call__(self, headers, *args, **kwargs)
     58 kwargs = {k:v for k,v in kwargs.items() if v is not None}
     59 route_p,query_p,data_p = [{p:kwargs[p] for p in o if p in kwargs}
     60                          for o in (self.route_ps,self.params,d)]
---> 61 return self.client(self.path, self.verb, headers=headers, route=route_p, query=query_p, data=data_p)

File ~/.local/share/virtualenvs/nbdev-mvrfY8_9/lib/python3.10/site-packages/ghapi/core.py:120, in GhApi.__call__(self, path, verb, headers, route, query, data)
    118 return_json = ('json' in headers['Accept'])
    119 debug = self.debug if self.debug else print_summary if os.getenv('GHAPI_DEBUG') else None
--> 120 res,self.recv_hdrs = urlsend(path, verb, headers=headers or None, debug=debug, return_headers=True,
    121                              route=route or None, query=query or None, data=data or None, return_json=return_json)
    122 if 'X-RateLimit-Remaining' in self.recv_hdrs:
    123     newlim = self.recv_hdrs['X-RateLimit-Remaining']

File ~/.local/share/virtualenvs/nbdev-mvrfY8_9/lib/python3.10/site-packages/fastcore/net.py:218, in urlsend(url, verb, headers, route, query, data, json_data, return_json, return_headers, debug)
    215 if route and route.get('archive_format', None):
    216     return urlread(req, decode=False, return_json=False, return_headers=return_headers)
--> 218 return urlread(req, return_json=return_json, return_headers=return_headers)

File ~/.local/share/virtualenvs/nbdev-mvrfY8_9/lib/python3.10/site-packages/fastcore/net.py:122, in urlread(url, data, headers, decode, return_json, return_headers, timeout, **kwargs)
    119     if 400 <= e.code < 500: raise ExceptionsHTTP[e.code](e.url, e.hdrs, e.fp, msg=e.msg) from None
    120     else: raise
--> 122 if decode: res = res.decode()
    123 if return_json: res = loads(res)
    124 return (res,dict(hdrs)) if return_headers else res

fy commented 9 months ago

fastcore/net.py only skips decoding and JSON parsing of the response when archive_format is present in the route parameters passed to urlsend.

So to trick ghapi/core.py into adding archive_format to the routes, one can patch the method:

api.repos.download_tarball_archive.route_ps.append("archive_format")

then call it with any non-empty value

api.repos.download_tarball_archive(..., archive_format="1")

Until there's an official fix, this works for me.
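
For anyone wondering why the patch helps, here is a minimal standard-library sketch of the branch visible at net.py lines 215-218 in the traceback. `read_body` is a hypothetical stand-in written for illustration, not fastcore's actual function, and the gzip payload only imitates a tarball response.

```python
import gzip
import json
from typing import Optional

def read_body(raw: bytes, route: Optional[dict] = None, return_json: bool = True):
    """Hypothetical, simplified stand-in for the urlsend/urlread branch in the traceback."""
    if route and route.get("archive_format"):
        return raw                      # archive route: return the bytes untouched
    text = raw.decode()                 # UTF-8 decode; fails on binary payloads
    return json.loads(text) if return_json else text

# A gzip stream starts with bytes 0x1f 0x8b, and 0x8b cannot start a UTF-8 sequence.
archive = gzip.compress(b"pretend this is a tarball")

try:
    read_body(archive, route={"owner": "fastai", "repo": "nbdev-template", "ref": "main"})
    failed = False
except UnicodeDecodeError:
    failed = True                       # same error class as in the report

# With archive_format present in the route dict, the decode step is skipped.
bypassed = read_body(archive, route={"owner": "fastai", "archive_format": "1"})
```

The value must be truthy because the check uses `route.get(...)`, so an empty string would not trigger the bypass.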

radam9 commented 1 month ago

@jph00 I would like to tackle this issue and solve it for all related GitHub endpoints. To fix the problem, we need to conditionally pass decode=False to urlsend when we expect a non-JSON response (zip, tar, etc.) from an endpoint.

I understand that metadata.py is generated from the GitHub OpenAPI spec and contains all the endpoints. I have checked the spec, but could not find anything that would help to automatically identify that an endpoint returns a non-JSON payload.

Do you have any suggestions for solving this other than hard-coding the endpoints that return a non-JSON payload and setting decode=False when calling urlsend?

jph00 commented 1 month ago

Thanks for checking. No, I'm not aware of any special approach. Just making a list of the route names seems fine, except that it's inconvenient to actually create that list.
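
A rough sketch of what such a list could look like, assuming the lookup is keyed on the route name. `BINARY_ROUTES` and `binary_overrides` are hypothetical names, only the two routes mentioned in this thread are included, and the `decode` flag assumes urlsend would be extended to forward it as proposed above.

```python
# Hypothetical allow-list of routes that return archives rather than JSON.
# Only the two names from this thread are listed; others would have to be added by hand.
BINARY_ROUTES = frozenset({
    "download_tarball_archive",
    "download_zipball_archive",
})

def binary_overrides(route_name: str) -> dict:
    """Return keyword overrides for the request call (hypothetical helper)."""
    if route_name in BINARY_ROUTES:
        # Skip both UTF-8 decoding and JSON parsing for archive payloads.
        return {"decode": False, "return_json": False}
    return {}                           # leave the defaults untouched for everything else

# Example: merge the overrides into the keyword arguments of a call.
kwargs = {"return_json": True, "return_headers": True}
kwargs.update(binary_overrides("download_tarball_archive"))
```

Returning an empty dict for ordinary routes keeps the existing behaviour unchanged, so the list only has to name the exceptions.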
