JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0

`KeyError: 'timeline'` crash on `twitter-profile` scraper #991

Open frankiec opened 1 year ago

frankiec commented 1 year ago

Describe the bug

When I run:

/bin/snscrape --jsonl --max-results 10 twitter-profile someone

it throws this error:

Traceback (most recent call last):
  File "/home/ubuntu/.local/bin/snscrape", line 8, in <module>
    sys.exit(main())
  File "/home/ubuntu/.local/lib/python3.10/site-packages/snscrape/_cli.py", line 323, in main
    for i, item in enumerate(scraper.get_items(), start = 1):
  File "/home/ubuntu/.local/lib/python3.10/site-packages/snscrape/modules/twitter.py", line 1830, in get_items
    instructions = obj['data']['user']['result']['timeline_v2']['timeline']['instructions']
KeyError: 'timeline'

How to reproduce

/bin/snscrape --jsonl --max-results 10 twitter-profile someone

Expected behaviour

It was working until Thu Jun 29 21:35:07 UTC 2023.

Screenshots and recordings

No response

Operating system

ubuntu 16

Python version: output of python3 --version

Python 3.10.6

snscrape version: output of snscrape --version

snscrape 0.6.2.20230321.dev32+gb76f485

Scraper

twitter-profile

How are you using snscrape?

CLI (snscrape ... as a command, e.g. in a terminal)

Backtrace

No response

Log output

No response

Dump of locals

No response

Additional context

No response

0bmay commented 1 year ago

I have this error too

  File "/home/.venv/lib/python3.10/site-packages/snscrape/modules/twitter.py", line 1902, in get_items
    instructions = obj['data']['user']['result']['timeline_v2']['timeline']['instructions']
KeyError: 'timeline'

version: v0.7.0.20230622

JustAnotherArchivist commented 1 year ago

I'm seeing the same thing on the web interface: the 'Replies' tab on profile pages is empty. This looks like a bug on Twitter's side.

0bmay commented 1 year ago

It looks like the UserTweets API call works and returns data, but UserTweetsAndReplies is broken on Twitter's end: the call returns 200, but there is no data in timeline_v2.
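
To illustrate the failure mode described above, here is a minimal sketch (not snscrape code) that walks the nested response and reports the first missing key instead of raising `KeyError`. The helper name `get_path` and both sample responses are hypothetical; the real API payloads contain many more fields.

```python
def get_path(obj, path):
    """Walk nested dicts along `path`.

    Returns (value, None) on success, or (None, missing_key) at the
    first key that is absent.
    """
    current = obj
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None, key
        current = current[key]
    return current, None


# Same path that get_items indexes in the traceback
INSTRUCTIONS_PATH = ['data', 'user', 'result', 'timeline_v2', 'timeline', 'instructions']

# Hypothetical response shapes: a 200 response with an empty timeline_v2,
# and a response that carries a timeline
broken = {'data': {'user': {'result': {'__typename': 'User', 'timeline_v2': {}}}}}
working = {'data': {'user': {'result': {'__typename': 'User',
                                        'timeline_v2': {'timeline': {'instructions': []}}}}}}

broken_value, broken_missing = get_path(broken, INSTRUCTIONS_PATH)
working_value, working_missing = get_path(working, INSTRUCTIONS_PATH)

print('broken response is missing key:', broken_missing)
print('working response instructions:', working_value)
```

A check like this only makes the error message clearer; it does not recover the missing replies.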

locfinessemonster commented 1 year ago

I am experiencing the timeline KeyError as well. @0bmay, would you mind sharing how to implement the UserTweets API call? I am having difficulty figuring that out.

0bmay commented 1 year ago

In TwitterProfileScraper I added a second method alongside get_items, called get_items2, and I use that to get the profile tweets. No replies, but something is better than nothing. Most of the code is the same as get_items; I just changed the features and variables and added the field_toggles that the calls use on the frontend of the site.

    def get_items2(self):
        if not self._isUserId:
            if self.entity is None:
                raise snscrape.base.ScraperException(f'Could not resolve username {self._user!r} to ID')
            userId = self.entity.id
        else:
            userId = self._user

        paginationVariables = {
            'userId': userId,
            'count': 100,
            'cursor': None,
            'includePromotedContent': True,
            'withQuickPromoteEligibilityTweetFields': True,
            'withVoice': True,
            'withV2Timeline': True,
        }
        variables = paginationVariables.copy()
        del variables['cursor']
        features = {
            'rweb_lists_timeline_redesign_enabled': False,
            'responsive_web_graphql_exclude_directive_enabled': True,
            'verified_phone_label_enabled': False,
            'creator_subscriptions_tweet_preview_api_enabled': False,
            'responsive_web_graphql_timeline_navigation_enabled': True,
            'responsive_web_graphql_skip_user_profile_image_extensions_enabled': False,
            'tweetypie_unmention_optimization_enabled': True,
            'responsive_web_edit_tweet_api_enabled': True,
            'graphql_is_translatable_rweb_tweet_is_translatable_enabled': True,
            'view_counts_everywhere_api_enabled': True,
            'longform_notetweets_consumption_enabled': True,
            'responsive_web_twitter_article_tweet_consumption_enabled': False,
            'tweet_awards_web_tipping_enabled': False,
            'freedom_of_speech_not_reach_fetch_enabled': True,
            'standardized_nudges_misinfo': True,
            'tweet_with_visibility_results_prefer_gql_limited_actions_policy_enabled': False,
            'longform_notetweets_rich_text_read_enabled': True,
            'longform_notetweets_inline_media_enabled': False,
            'responsive_web_enhance_cards_enabled': False
        }
        field_toggles = {'withArticleRichContentState': False}
        # Send fieldToggles on the first request as well as on paginated ones
        params = {'variables': variables, 'features': features, 'fieldToggles': field_toggles}
        paginationParams = {'variables': paginationVariables, 'features': features, 'fieldToggles': field_toggles}

        gotPinned = False
        previousPagesTweetIds = set()
        for obj in self._iter_api_data('https://twitter.com/i/api/graphql/sPOiMsDrOtmxC00E01DkTA/UserTweets', _TwitterAPIType.GRAPHQL, params, paginationParams, instructionsPath = ['data', 'user', 'result', 'timeline_v2', 'timeline', 'instructions']):
            if not obj['data'] or 'result' not in obj['data']['user']:
                raise snscrape.base.ScraperException('Empty response')
            if obj['data']['user']['result']['__typename'] == 'UserUnavailable':
                raise snscrape.base.EntityUnavailable('User unavailable')
            instructions = obj['data']['user']['result']['timeline_v2']['timeline']['instructions']
            if not gotPinned:
                for instruction in instructions:
                    if instruction['type'] == 'TimelinePinEntry':
                        gotPinned = True
                        tweetId = int(instruction['entry']['entryId'][6:]) if instruction['entry']['entryId'].startswith('tweet-') else None
                        yield self._graphql_timeline_tweet_item_result_to_tweet(instruction['entry']['content']['itemContent']['tweet_results']['result'], tweetId = tweetId, pinned = True)
            tweets = list(self._graphql_timeline_instructions_to_tweets(instructions, pinned = False))
            pageTweetIds = frozenset(tweet.id for tweet in tweets)
            if len(pageTweetIds) > 0 and pageTweetIds in previousPagesTweetIds:
                _logger.warning("Found duplicate page of tweets, stopping as assumed cycle found in Twitter's pagination")
                break
            previousPagesTweetIds.add(pageTweetIds)
            # Includes tweets by other users on conversations, don't return those
            for tweet in tweets:
                if getattr(getattr(tweet, 'user', None), 'id', userId) != userId:
                    continue
                yield tweet
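
The duplicate-page check in the loop above can be seen in isolation. This is a minimal standalone sketch, assuming pages are plain lists of tweet IDs; `iter_unique_pages` and the sample data are hypothetical and not part of snscrape.

```python
def iter_unique_pages(pages):
    """Yield pages until one repeats the exact ID set of an earlier page."""
    seen = set()
    for page in pages:
        # frozenset is hashable and ignores order, so a page that contains
        # the same IDs in a different order still counts as a repeat
        page_ids = frozenset(page)
        if page_ids and page_ids in seen:
            break  # assume the pagination has cycled
        seen.add(page_ids)
        yield page


# Hypothetical pagination that loops back to the first page
pages = [[3, 2, 1], [6, 5, 4], [1, 2, 3], [9, 8, 7]]
result = list(iter_unique_pages(pages))
print(result)
```

The third page repeats the first page's IDs, so iteration stops there and the fourth page is never reached. Empty pages are never treated as repeats, matching the `len(pageTweetIds) > 0` guard in the code above.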

Pratham-19 commented 1 year ago

Getting the same error.

locfinessemonster commented 1 year ago

@0bmay thank you for sharing, I really appreciate it.

jerrycool123 commented 1 year ago

Twitter has blocked every unregistered user from viewing tweets. Is this related?

0bmay commented 1 year ago

> Twitter has blocked every unregistered user from viewing tweets. Is this related?

Not related. The UserTweets API endpoint is still returning data; UserTweetsAndReplies is still broken.

zack-sims413 commented 1 year ago

I assume there is still an issue with the UserTweetsAndReplies API endpoint? I'm still getting `KeyError: 'timeline'`.

Akhorramrouz commented 1 year ago

@JustAnotherArchivist, is there any update on implementing the proposed solution, so that the error is fixed at least for the tweets?