JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0

Getting KeyError: 'name' #1014

Closed
leockl closed this issue 1 year ago

leockl commented 1 year ago

Describe the bug

Running the search scraper raises a KeyError: 'name'.

How to reproduce

import snscrape.modules.twitter as sntwitter
import pandas as pd

keyword = '(Burger OR #Burger)'
max_tweets = 20000

# Creating list to append tweet data to
tweets = []

# Using TwitterSearchScraper to scrape data and append tweets to list
for i,tweet in enumerate(sntwitter.TwitterSearchScraper(keyword + ' since:2022-12-01 until:2022-12-31 lang:"en"').get_items()):
    if i > max_tweets:
        break
    tweets.append([tweet.date, tweet.content, tweet.likeCount, tweet.retweetCount])

# Creating a dataframe to load the list
tweets_df = pd.DataFrame(tweets, columns=["Date Created", "Tweet", "Number of Likes", "Number of Retweets"])
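Note: the crash reported below happens inside snscrape itself, so it cannot be avoided by changing the query. As a stopgap, the loop can be wrapped so that whatever was scraped before the crash is kept. A minimal sketch (assumptions: snscrape.base.ScraperException is the library's base exception; tweet.rawContent replaces the deprecated tweet.content in snscrape 0.7; the break condition is also changed to >= so that exactly max_tweets tweets are collected):

import snscrape.base
import snscrape.modules.twitter as sntwitter
import pandas as pd

keyword = '(Burger OR #Burger)'
max_tweets = 20000
tweets = []

try:
    for i, tweet in enumerate(sntwitter.TwitterSearchScraper(keyword + ' since:2022-12-01 until:2022-12-31 lang:"en"').get_items()):
        if i >= max_tweets:  # >= so exactly max_tweets tweets are kept
            break
        tweets.append([tweet.date, tweet.rawContent, tweet.likeCount, tweet.retweetCount])
except (KeyError, snscrape.base.ScraperException) as e:
    # The scraper may crash mid-iteration (this issue's KeyError: 'name');
    # keep the tweets collected so far instead of losing them all.
    print(f'Scraping aborted early after {len(tweets)} tweets: {e!r}')

tweets_df = pd.DataFrame(tweets, columns=["Date Created", "Tweet", "Number of Likes", "Number of Retweets"])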

Expected behaviour

The code should run without errors and produce the tweets_df dataframe.

Screenshots and recordings

Not applicable.

Operating system

Windows 11

Python version: output of python3 --version

Python 3.10.9

snscrape version: output of snscrape --version

snscrape 0.7.0.20230622

Scraper

TwitterSearchScraper

How are you using snscrape?

Module (import snscrape.modules.something in Python code)

Backtrace

---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
Cell In[10], line 11
      8 tweets = []
     10 # Using TwitterSearchScraper to scrape data and append tweets to list
---> 11 for i,tweet in enumerate(sntwitter.TwitterSearchScraper(keyword + ' since:2022-12-01 until:2022-12-31 lang:"en"').get_items()):
     12     if i > max_tweets:
     13         break

File [c:\Users\leo_c\anaconda3\lib\site-packages\snscrape\modules\twitter.py:1763](file:///C:/Users/leo_c/anaconda3/lib/site-packages/snscrape/modules/twitter.py:1763), in TwitterSearchScraper.get_items(self)
   1760 params = {'variables': variables, 'features': features}
   1761 paginationParams = {'variables': paginationVariables, 'features': features}
-> 1763 for obj in self._iter_api_data('https://twitter.com/i/api/graphql/7jT5GT59P8IFjgxwqnEdQw/SearchTimeline', _TwitterAPIType.GRAPHQL, params, paginationParams, cursor = self._cursor, instructionsPath = ['data', 'search_by_raw_query', 'search_timeline', 'timeline', 'instructions']):
   1764     yield from self._graphql_timeline_instructions_to_tweets(obj['data']['search_by_raw_query']['search_timeline']['timeline']['instructions'])

File [c:\Users\leo_c\anaconda3\lib\site-packages\snscrape\modules\twitter.py:915](file:///C:/Users/leo_c/anaconda3/lib/site-packages/snscrape/modules/twitter.py:915), in _TwitterAPIScraper._iter_api_data(self, endpoint, apiType, params, paginationParams, cursor, direction, instructionsPath)
    913 while True:
    914     _logger.info(f'Retrieving scroll page {cursor}')
--> 915     obj = self._get_api_data(endpoint, apiType, reqParams, instructionsPath = instructionsPath)
    916     yield obj
    918     # No data format test, just a hard and loud crash if anything's wrong :-)

File [c:\Users\leo_c\anaconda3\lib\site-packages\snscrape\modules\twitter.py:886](file:///C:/Users/leo_c/anaconda3/lib/site-packages/snscrape/modules/twitter.py:886), in _TwitterAPIScraper._get_api_data(self, endpoint, apiType, params, instructionsPath)
    884 if apiType is _TwitterAPIType.GRAPHQL:
    885     params = urllib.parse.urlencode({k: json.dumps(v, separators = (',', ':')) for k, v in params.items()}, quote_via = urllib.parse.quote)
--> 886 r = self._get(endpoint, params = params, headers = self._apiHeaders, responseOkCallback = functools.partial(self._check_api_response, apiType = apiType, instructionsPath = instructionsPath))
    887 return r._snscrapeObj

File [c:\Users\leo_c\anaconda3\lib\site-packages\snscrape\base.py:275](file:///C:/Users/leo_c/anaconda3/lib/site-packages/snscrape/base.py:275), in Scraper._get(self, *args, **kwargs)
    274 def _get(self, *args, **kwargs):
--> 275     return self._request('GET', *args, **kwargs)

File [c:\Users\leo_c\anaconda3\lib\site-packages\snscrape\base.py:246](file:///C:/Users/leo_c/anaconda3/lib/site-packages/snscrape/base.py:246), in Scraper._request(self, method, url, params, data, headers, timeout, responseOkCallback, allowRedirects, proxies)
    244         _logger.debug(f'... ... with response headers: {redirect.headers!r}')
    245 if responseOkCallback is not None:
--> 246     success, msg = responseOkCallback(r)
    247     errors.append(msg)
    248 else:

File [c:\Users\leo_c\anaconda3\lib\site-packages\snscrape\modules\twitter.py:870](file:///C:/Users/leo_c/anaconda3/lib/site-packages/snscrape/modules/twitter.py:870), in _TwitterAPIScraper._check_api_response(self, r, apiType, instructionsPath)
    868 r._snscrapeObj = obj
    869 if apiType is _TwitterAPIType.GRAPHQL and 'errors' in obj:
--> 870     msg = 'Twitter responded with an error: ' + ', '.join(f'{e["name"]}: {e["message"]}' for e in obj['errors'])
    871     instructions = obj
    872     for k in instructionsPath:

File [c:\Users\leo_c\anaconda3\lib\site-packages\snscrape\modules\twitter.py:870](file:///C:/Users/leo_c/anaconda3/lib/site-packages/snscrape/modules/twitter.py:870), in <genexpr>(.0)
    868 r._snscrapeObj = obj
    869 if apiType is _TwitterAPIType.GRAPHQL and 'errors' in obj:
--> 870     msg = 'Twitter responded with an error: ' + ', '.join(f'{e["name"]}: {e["message"]}' for e in obj['errors'])
    871     instructions = obj
    872     for k in instructionsPath:

KeyError: 'name'
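For reference, the KeyError comes from the f-string at twitter.py line 870, which assumes every entry in the GraphQL errors array has name and message keys; the response triggering this crash evidently omits name. A more defensive formulation of that line might look like the sketch below (illustrative only, not the project's actual fix):

# Illustrative sketch: tolerate error entries missing 'name' or 'message'
# instead of raising KeyError while formatting the error message.
msg = 'Twitter responded with an error: ' + ', '.join(
    f"{e.get('name', '<unknown>')}: {e.get('message', '<no message>')}"
    for e in obj['errors']
)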

Log output

No response

Dump of locals

No response

Additional context

No response

leockl commented 1 year ago

This issue is currently being discussed here: https://github.com/JustAnotherArchivist/snscrape/issues/996