JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0

Error? sntwitter.TwitterProfileScraper(name) #490

Closed JJery-web closed 2 years ago

JJery-web commented 2 years ago

Thank you again for this project. I need to crawl Twitter users' profiles, but I have run into a problem: calling `sntwitter.TwitterProfileScraper(name)` and iterating over its results raises an error.


import snscrape.modules.twitter as sntwitter

name = "carbonicum"
scraper = sntwitter.TwitterProfileScraper(name)
print(scraper.entity)
for tweet in scraper.get_items():
    print(tweet.user.description)


Error retrieving https://api.twitter.com/2/timeline/profile/1619986634.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&send_error_codes=true&simple_quoted_tweets=true&include_tweet_replies=true&userId=1619986634&count=100&ext=mediaStats%2ChighlightedLabel: blocked (429)
4 requests to https://api.twitter.com/2/timeline/profile/1619986634.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&send_error_codes=true&simple_quoted_tweets=true&include_tweet_replies=true&userId=1619986634&count=100&ext=mediaStats%2ChighlightedLabel failed, giving up.
Traceback (most recent call last):
  File "E:\finance Python\0611 social\0612 twitter\test.py", line 9, in <module>
    for tweet in scraper.get_items():
  File "D:\Anaconda3\lib\site-packages\snscrape\modules\twitter.py", line 813, in get_items
    for obj in self._iter_api_data(f'https://api.twitter.com/2/timeline/profile/{userId}.json', params, paginationParams):
  File "D:\Anaconda3\lib\site-packages\snscrape\modules\twitter.py", line 369, in _iter_api_data
    obj = self._get_api_data(endpoint, reqParams)
  File "D:\Anaconda3\lib\site-packages\snscrape\modules\twitter.py", line 339, in _get_api_data
    r = self._get(endpoint, params = params, headers = self._apiHeaders, responseOkCallback = self._check_api_response)
  File "D:\Anaconda3\lib\site-packages\snscrape\base.py", line 216, in _get
    return self._request('GET', *args, **kwargs)
  File "D:\Anaconda3\lib\site-packages\snscrape\base.py", line 212, in _request
    raise ScraperException(msg)
snscrape.base.ScraperException: 4 requests to https://api.twitter.com/2/timeline/profile/1619986634.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&send_error_codes=true&simple_quoted_tweets=true&include_tweet_replies=true&userId=1619986634&count=100&ext=mediaStats%2ChighlightedLabel failed, giving up.
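As a side note on the snippet above: the loop prints the same profile description once per tweet, so every extra page of tweets costs another request. A minimal sketch that stops after the first few items is shown here. The helper name `user_descriptions` and its `limit` parameter are made up for illustration and are not part of snscrape; the only things taken from the snippet are `get_items()` and `tweet.user.description`. This does not fix the 429 error itself, it only reduces how many requests a working scraper would make.

```python
from itertools import islice


def user_descriptions(scraper, limit=1):
    """Collect profile descriptions from at most `limit` scraped tweets.

    `scraper` is assumed to be any object with a get_items() generator
    whose items expose `.user.description`, as in the snippet above.
    """
    descriptions = []
    # islice stops pulling from the generator after `limit` items,
    # so no further pages are requested.
    for tweet in islice(scraper.get_items(), limit):
        descriptions.append(tweet.user.description)
    return descriptions


# Hypothetical usage with the scraper from the snippet above:
# import snscrape.modules.twitter as sntwitter
# print(user_descriptions(sntwitter.TwitterProfileScraper("carbonicum")))
```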

JJery-web commented 2 years ago

How should I solve this? Thank you.

JJery-web commented 2 years ago

Based on the error, it seems that the API URL is wrong.

TheTechRobo commented 2 years ago

Duplicate of #367. Try the latest developer version. Make sure you're on Python 3.8+ if you're not already; pip will automatically downgrade the package without telling you if you're on 3.7!
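To see whether the Python version or a silently downgraded package is the issue, something like the following can be run in the same environment. It is only a sketch: the helper name `environment_report` is made up for illustration, and it relies on `importlib.metadata` from the standard library, which itself only exists on Python 3.8+.

```python
import sys

MIN_PYTHON = (3, 8)  # minimum version named in the reply above


def environment_report(package="snscrape"):
    """Return (python_ok, installed_version_or_None) for this interpreter."""
    python_ok = sys.version_info >= MIN_PYTHON
    try:
        from importlib.metadata import PackageNotFoundError, version
    except ImportError:
        # Python older than 3.8 has no importlib.metadata in the stdlib.
        return python_ok, None
    try:
        installed = version(package)
    except PackageNotFoundError:
        installed = None
    return python_ok, installed


python_ok, installed = environment_report()
print("Python >= 3.8:", python_ok)
print("snscrape version:", installed or "not installed")
```

If the first line prints `False`, upgrading Python before reinstalling is probably the first thing to try.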

JJery-web commented 2 years ago

After changing to the developer version, I succeeded. Thank you very much for your reply. I have another question: some users' tweets cannot be crawled completely. For example, the following code prints 0, even though https://twitter.com/DrasticMeasure5 clearly has tweets. I don't know why they are not being retrieved.

import snscrape.modules.twitter
scraper = snscrape.modules.twitter.TwitterSearchScraper('(from:DrasticMeasure5) until:2015-8-13 since:2014-7-21')
print(sum(1 for tweet in scraper.get_items()))
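One thing that can be ruled out cheaply is the query formatting: the dates above are not zero-padded. The sketch below builds the same kind of query from `datetime.date` objects so the dates always come out as `YYYY-MM-DD`. The helper name `build_user_query` is made up for illustration, and it is an assumption, not something established here, that padding makes any difference to what Twitter's search returns.

```python
from datetime import date


def build_user_query(username, since, until):
    """Build a 'from:USER since:DATE until:DATE' search string.

    Dates are formatted with date.isoformat(), i.e. zero-padded YYYY-MM-DD.
    """
    if since >= until:
        raise ValueError("since must be earlier than until")
    return f"from:{username} since:{since.isoformat()} until:{until.isoformat()}"


query = build_user_query("DrasticMeasure5", date(2014, 7, 21), date(2015, 8, 13))
print(query)  # from:DrasticMeasure5 since:2014-07-21 until:2015-08-13
```

The resulting string can be passed to `TwitterSearchScraper` in place of the hand-written query.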

This looks similar to: https://github.com/JustAnotherArchivist/snscrape/issues/468

JustAnotherArchivist commented 2 years ago

Either that or #4. Nothing snscrape can do about it anyway.