MichaelCurrin / twitterverse

Store and report on Twitter conversations, from tweets to trending topics 🌍 🐦 🐍
https://michaelcurrin.github.io/twitterverse/
MIT License

Make fetchTweetsPaging more robust #135

Open MichaelCurrin opened 4 years ago

MichaelCurrin commented 4 years ago

Note on the enumerate(cursor) line used in fetchTweetsPaging.

This line is prone to occasional errors that abort fetching of the next page, unless there is a way to retry that page. Retrying the entire script can recover, but if the cause is bad data from Twitter or a problem in Tweepy's handling of it, debugging and fixing it would require a low-level look at Tweepy (and a monkey patch on Tweepy).
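A minimal sketch of retrying a single page instead of the whole script. It assumes the cursor is a class-based iterator (such as the page iterator from Tweepy's `Cursor(...).pages()`) that does not advance its position when `next()` raises, so calling `next()` again re-requests the same page; that assumption is not verified here. `iter_with_retry` and `FlakyPages` are hypothetical names, and `FlakyPages` only simulates a page that fails once.

```python
import time
from typing import Callable, Iterator, Tuple, Type, TypeVar

T = TypeVar("T")


def iter_with_retry(
    page_iter: Iterator[T],
    max_retries: int = 3,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Iterator[T]:
    """Yield pages from page_iter, retrying next() on the same iterator.

    Assumption: page_iter keeps its position when next() raises, so a
    retry re-requests the page that failed rather than skipping it.
    """
    while True:
        attempt = 0
        while True:
            try:
                page = next(page_iter)
                break
            except StopIteration:
                # Listed before retry_on because StopIteration is an Exception.
                return
            except retry_on:
                attempt += 1
                if attempt > max_retries:
                    raise  # give up and surface the original error
                sleep(delay * attempt)  # linear backoff between attempts
        yield page


class FlakyPages:
    """Hypothetical stand-in for a page iterator: page 2 fails once, then recovers."""

    def __init__(self, n_pages: int) -> None:
        self.n_pages = n_pages
        self.current = 0
        self.failed_once = False

    def __iter__(self) -> "FlakyPages":
        return self

    def __next__(self) -> int:
        if self.current >= self.n_pages:
            raise StopIteration
        if self.current == 1 and not self.failed_once:
            self.failed_once = True
            # Position is not advanced, mimicking a page that failed to parse.
            raise ValueError("Failed to parse JSON payload")
        self.current += 1
        return self.current


if __name__ == "__main__":
    print(list(iter_with_retry(FlakyPages(3), retry_on=(ValueError,), delay=0)))
```

Inside fetchTweetsPaging this could wrap the existing loop as `for i, page in enumerate(iter_with_retry(iter(cursor), retry_on=(tweepy.error.TweepError,)))`, keeping the pages already stored instead of restarting. Narrowing `retry_on` to the parse error avoids retrying on unrelated failures such as bad credentials.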

Sample:

    tweepy.error.TweepError: Failed to parse JSON payload: Unterminated string starting at: line 1 column 913686 (char 913685)

    site-packages/tweepy/parsers.py", line 91, in parse
        json = JSONParser.parse(self, method, payload)
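The "Unterminated string" message means a JSON string opened at the reported position and never closed before the end of the input, so everything from that position to the end of the payload sat inside the unclosed string. A truncated response body is one plausible cause, though the sample alone cannot confirm it. Since the string starts at char 913685, the payload was at least that long. The sketch below reproduces the same error with a small, artificially truncated payload; `unterminated_string_tail` is a hypothetical helper.

```python
import json
from typing import Optional, Tuple


def unterminated_string_tail(payload: str) -> Optional[Tuple[int, int]]:
    """Return (error position, payload length) if parsing fails because a
    string is never closed; return None if the payload parses cleanly.
    Other JSON errors are re-raised unchanged."""
    try:
        json.loads(payload)
    except json.JSONDecodeError as err:
        if err.msg.startswith("Unterminated string"):
            return err.pos, len(payload)
        raise
    return None


if __name__ == "__main__":
    complete = '{"statuses": [{"text": "hello"}]}'
    truncated = complete[:25]  # cut off inside the "hello" string
    print(unterminated_string_tail(complete))
    print(unterminated_string_tail(truncated))
```

Comparing the error position with the payload length shows how much of the body was swallowed by the unclosed string.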

MichaelCurrin commented 4 years ago

Detailed log

    [START] searchStoreAndLabel 
    Starting Search. Expected pages: 100,000. Expected tweets: 10,000,000.
    Stored so far: 100
    Stored so far: 200
    Stored so far: 300
    Stored so far: 400
    Stored so far: 500
    Stored so far: 600
    Stored so far: 700
    Stored so far: 800
    Stored so far: 900
    Traceback (most recent call last):
      File "/Users/mcurrin/.local/virtualenvs/twitterverse-alt/lib/python3.7/site-packages/tweepy/parsers.py", line 48, in parse
        json = json_lib.loads(payload)
      File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/__init__.py", line 348, in loads
        return _default_decoder.decode(s)
      File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 337, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/usr/local/Cellar/python/3.7.6_1/Frameworks/Python.framework/Versions/3.7/lib/python3.7/json/decoder.py", line 353, in raw_decode
        obj, end = self.scan_once(s, idx)
    json.decoder.JSONDecodeError: Unterminated string starting at: line 1 column 913686 (char 913685)

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "app/utils/insert/search_and_store_tweets.py", line 343, in <module>
        main()
      File "app/utils/insert/search_and_store_tweets.py", line 339, in main
        run(args.pages, args.persist, args.campaign, args.query)
      File "app/utils/insert/search_and_store_tweets.py", line 239, in run
        utilityCampaignRec, customCampaignRec,
      File "/Users/mcurrin/repos/twitterverse-alt/app/lib/__init__.py", line 40, in timed
        result = func(*args, **kw)
      File "app/utils/insert/search_and_store_tweets.py", line 193, in searchStoreAndLabel
        profileRecs, tweetRecs = storeTweets(fetchedTweets, persist)
      File "app/utils/insert/search_and_store_tweets.py", line 105, in storeTweets
        for fetchedTweet in fetchedTweets:
      File "app/utils/insert/search_and_store_tweets.py", line 76, in search
        for page in pages:
      File "/Users/mcurrin/repos/twitterverse-alt/app/lib/twitter_api/search.py", line 156, in fetchTweetsPaging
        for i, page in enumerate(cursor):
      File "/Users/mcurrin/.local/virtualenvs/twitterverse-alt/lib/python3.7/site-packages/tweepy/cursor.py", line 47, in __next__
        return self.next()
      File "/Users/mcurrin/.local/virtualenvs/twitterverse-alt/lib/python3.7/site-packages/tweepy/cursor.py", line 115, in next
        model = ModelParser().parse(self.method(create=True), data)
      File "/Users/mcurrin/.local/virtualenvs/twitterverse-alt/lib/python3.7/site-packages/tweepy/parsers.py", line 91, in parse
        json = JSONParser.parse(self, method, payload)
      File "/Users/mcurrin/.local/virtualenvs/twitterverse-alt/lib/python3.7/site-packages/tweepy/parsers.py", line 50, in parse
        raise TweepError('Failed to parse JSON payload: %s' % e)
    tweepy.error.TweepError: Failed to parse JSON payload: Unterminated string starting at: line 1 column 913686 (char 913685)
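The traceback suggests fetchTweetsPaging is a generator consumed by `for page in pages` in search. In Python, once an exception propagates out of a generator, that generator is finished, so catching the error further up (in search or storeTweets) cannot resume paging. This sketch uses hypothetical `paging` and `consume` functions to show the effect.

```python
from typing import Iterator, List


def paging(pages: List[str]) -> Iterator[str]:
    """Hypothetical stand-in for a paging generator such as fetchTweetsPaging."""
    for page in pages:
        if page == "bad":
            raise ValueError("Failed to parse JSON payload")
        yield page


def consume(gen: Iterator[str]) -> List[str]:
    """Collect pages, swallowing the parse error at the caller's level."""
    seen = []
    while True:
        try:
            seen.append(next(gen))
        except StopIteration:
            return seen
        except ValueError:
            # Catching here does not help: the generator is now closed,
            # so the next call to next() raises StopIteration.
            continue


if __name__ == "__main__":
    print(consume(paging(["p1", "bad", "p3"])))
```

Page "p3" is never reached, which is why a retry has to wrap the `next()` call on the cursor inside fetchTweetsPaging rather than the outer loops.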