JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0

twitter: KeyError: 'content-type' #21

Closed ivan closed 5 years ago

ivan commented 5 years ago

This happens (infrequently) when scraping with twitter-user; it looks spurious and unrelated to any specific user:

Traceback (most recent call last):
  File "/home/grab/sns-venv/bin/snscrape", line 11, in <module>
    load_entry_point('snscrape==0.1.3', 'console_scripts', 'snscrape')()
  File "/home/grab/sns-venv/lib/python3.7/site-packages/snscrape/cli.py", line 59, in main
    for i, item in enumerate(scraper.get_items(), start = 1):
  File "/home/grab/sns-venv/lib/python3.7/site-packages/snscrape/modules/twitter.py", line 63, in get_items
    responseOkCallback = self._check_json_callback)
  File "/home/grab/sns-venv/lib/python3.7/site-packages/snscrape/base.py", line 99, in _get
    return self._request('GET', *args, **kwargs)
  File "/home/grab/sns-venv/lib/python3.7/site-packages/snscrape/base.py", line 72, in _request
    success, msg = responseOkCallback(r)
  File "/home/grab/sns-venv/lib/python3.7/site-packages/snscrape/modules/twitter.py", line 29, in _check_json_callback
    if r.headers['content-type'] != 'application/json;charset=utf-8':
  File "/home/grab/sns-venv/lib/python3.7/site-packages/requests/structures.py", line 52, in __getitem__
    return self._store[key.lower()][1]
KeyError: 'content-type'

JustAnotherArchivist commented 5 years ago

Sounds like Twitter sometimes doesn't send a Content-Type header. So I guess the solution is to treat any such response as invalid.
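
A minimal sketch of what "treat any such response as invalid" could look like, assuming the callback returns a `(success, msg)` tuple as the traceback suggests. The function name `check_json_callback`, the constant name, and the error messages are illustrative and not snscrape's actual code; the stand-in response objects only mimic the `headers` attribute of a `requests` response, whose header mapping also supports `.get()`.

```python
from types import SimpleNamespace

# The content type compared against in the traceback above.
EXPECTED_CONTENT_TYPE = 'application/json;charset=utf-8'

def check_json_callback(r):
    """Hypothetical response check: returns (success, msg).

    Uses .get() so a missing header yields None instead of raising KeyError.
    """
    content_type = r.headers.get('content-type')
    if content_type is None:
        return False, 'missing content-type header'
    if content_type != EXPECTED_CONTENT_TYPE:
        return False, f'unexpected content-type: {content_type!r}'
    return True, None

# Stand-in responses for illustration only (plain dicts, lowercase keys).
good = SimpleNamespace(headers={'content-type': EXPECTED_CONTENT_TYPE})
bare = SimpleNamespace(headers={})

print(check_json_callback(good))
print(check_json_callback(bare))
```

With this shape, a response without the header is reported as a failed check and goes through whatever handling the caller already applies to invalid responses, rather than crashing the scrape.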

I do wonder though what Twitter does send in those cases. It might be a good idea to dump the raw data into the log or to a temporary file in case of an uncaught exception.
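
One way the "dump the raw data" idea could be sketched, under the assumption that the status code, headers, and body bytes of the offending response are available at the point of failure. The helper name `dump_response` and the file prefix are hypothetical, not part of snscrape.

```python
import logging
import tempfile

logger = logging.getLogger(__name__)

def dump_response(status_code, headers, body):
    """Hypothetical helper: write a raw response to a temp file, return its path.

    headers is any mapping of header names to values; body is bytes.
    """
    # delete=False keeps the file around for later inspection.
    with tempfile.NamedTemporaryFile('wb', prefix='snscrape_dump_',
                                     suffix='.txt', delete=False) as fp:
        fp.write(f'status: {status_code}\n'.encode('utf-8'))
        for name, value in headers.items():
            fp.write(f'{name}: {value}\n'.encode('utf-8'))
        fp.write(b'\n')  # blank line separates headers from the body
        fp.write(body)
        path = fp.name
    logger.error('Dumped raw response to %s', path)
    return path
```

A caller could wrap the scraping loop in a `try`/`except`, call a helper like this with the last response seen, and then re-raise, so the unexpected payload is preserved without changing how the error surfaces.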