JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0
4.47k stars, 708 forks

Likelihood of rate limits/IP ban from Twitter? #385

Open Meorge opened 2 years ago

Meorge commented 2 years ago

I'm currently trying to use snscrape to download Tweets from Twitter. According to my calculations, I should be getting around 2,200,000 Tweets in total by the time it finishes. I'm concerned about the possibility of getting IP banned from Twitter as a result of this. Is this something worth being concerned about, or should I not worry?

More generally:

This tool seems like a godsend, compared to the limits of the official Twitter API. Having a more solid understanding of the "safe zone" would make me feel more comfortable with using it. I know that maintainers can't guarantee anything about rate limits or IP bans, but if anyone has experience with where they begin to set in, knowing that would help a lot!

TheTechRobo commented 2 years ago

Well, JustAnotherArchivist recently added an amazing feature: snscrape now reuses guest tokens across sessions. This avoids the rate limiting that comes from burning through too many guest tokens.

Personally I have not been banned, and I've been downloading thousands of tweets recursively...

I recommend creating an "Ignore File" and writing the tweet data to disk every 5 tweets or so, so that you don't have to redo all your progress if/when you get banned (2.2 million tweets is a lot). I did something similar in my recursive tweet downloading script.
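A minimal sketch of that checkpointing idea, not the actual script mentioned above. It assumes each tweet is a dict with an "id" key; the function names, file layout, and the `batch_size` parameter are hypothetical and chosen only for illustration.

```python
import json
from pathlib import Path


def load_seen_ids(ignore_path):
    """Return the set of tweet IDs already recorded in the ignore file."""
    path = Path(ignore_path)
    if not path.exists():
        return set()
    return {line.strip() for line in path.read_text().splitlines() if line.strip()}


def save_tweets(tweets, out_path, ignore_path, batch_size=5):
    """Append unseen tweets to out_path, flushing both files every batch_size tweets.

    `tweets` is any iterable of dicts with an "id" key (an assumption of this sketch).
    Returns the number of newly written tweets.
    """
    seen = load_seen_ids(ignore_path)
    written = 0
    with open(out_path, "a", encoding="utf-8") as out, \
            open(ignore_path, "a", encoding="utf-8") as ignore:
        for tweet in tweets:
            tweet_id = str(tweet["id"])
            if tweet_id in seen:
                continue  # already saved in an earlier run
            out.write(json.dumps(tweet) + "\n")
            ignore.write(tweet_id + "\n")
            seen.add(tweet_id)
            written += 1
            if written % batch_size == 0:
                # Push buffered data to disk so an interruption loses little.
                out.flush()
                ignore.flush()
    return written
```

If the run is interrupted, calling `save_tweets` again with the same two files skips every ID already listed in the ignore file.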

JustAnotherArchivist commented 2 years ago

Based on my past experience, that should be fine. I've scraped many millions of tweets before in parallel without problems. I usually split such big runs up into monthly scrapes using a search query like keyword since:2022-01-01 until:2022-02-01 to fetch tweets from this January. Then I iterate over the months and finally check whether each monthly output file contains the expected results (e.g. whether the last result is close to midnight on the 1st).
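The monthly split described here could be generated along these lines. This is only a sketch: `monthly_queries` is a hypothetical helper, and it only builds the query strings; running each one through snscrape and checking the output files is left out.

```python
from datetime import date


def monthly_queries(keyword, start, end):
    """Yield (month_start, query) pairs covering start up to (not including) end.

    `start` and `end` are expected to fall on the 1st of a month.
    """
    current = start
    while current < end:
        # Compute the first day of the following month.
        if current.month == 12:
            next_month = date(current.year + 1, 1, 1)
        else:
            next_month = date(current.year, current.month + 1, 1)
        query = f"{keyword} since:{current.isoformat()} until:{next_month.isoformat()}"
        yield current, query
        current = next_month


for month_start, query in monthly_queries("keyword", date(2022, 1, 1), date(2022, 3, 1)):
    print(month_start, query)
```

Each query string can then be passed to a separate search scrape, with one output file per month so that incomplete months are easy to spot and rerun.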

See also #307

Meorge commented 2 years ago

Thank you for the info! Perhaps I'm in the minority on this, but it might be helpful to others to include this sort of anecdotal information somewhere, so that people have a better idea of how much they can expect to use snscrape before being in danger of getting banned or rate limited. (Apologies if it's already available somewhere and I just didn't see it!)

JustAnotherArchivist commented 2 years ago

It's mentioned in some issues but not prominently. Documentation is WIP, and I agree it may be worth including some vague notes about it there.

cosmicoptima commented 2 years ago

I have been rate-limited by IP with a cooldown of half an hour to an hour; AFAICT it is not possible to get banned.
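Given a cooldown of that length, one way to cope is to wait and retry instead of aborting. The following is a rough sketch under assumptions: `call_with_cooldown` and its parameters are hypothetical, `fetch` stands for whatever function performs one scrape, and the 1800-second default simply mirrors the half-hour figure above. It is not a documented limit.

```python
import time


def call_with_cooldown(fetch, max_attempts=3, cooldown_seconds=1800, sleep=time.sleep):
    """Call fetch(); if it raises, wait cooldown_seconds and try again.

    Gives up and re-raises after max_attempts failed attempts. The `sleep`
    parameter exists so the waiting can be replaced in tests.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts; let the caller see the error
            sleep(cooldown_seconds)
```

Catching every exception is deliberately crude; in a real script it would be better to catch only the error that signals rate limiting.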

TheTechRobo commented 2 years ago

I've never been rate-limited before, at least that I've noticed.

cosmicoptima commented 2 years ago

my memory is hazy but it took... maybe 10-100 concurrent threads

TheTechRobo commented 2 years ago

Ok then that explains it, lol. I only have a few at a time
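For anyone who wants to keep concurrency in the "few at a time" range on purpose, a small thread pool is one option. This is a sketch only: `run_limited` is a hypothetical helper, `scrape_one` stands for any function that handles a single query, and the default of 3 workers is an arbitrary choice, not a known safe threshold.

```python
from concurrent.futures import ThreadPoolExecutor


def run_limited(scrape_one, queries, max_workers=3):
    """Run scrape_one(query) for every query using at most max_workers threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input queries.
        return list(pool.map(scrape_one, queries))
```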

nandanVasistaBH29 commented 1 year ago

Error retrieving https://api.twitter.com/2/search/adaptive.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&include_ext_has_nft_avatar=1&include_ext_is_blue_verified=1&include_ext_verified_type=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_ext_limited_action_results=false&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_ext_collab_control=true&include_ext_views=true&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&include_ext_sensitive_media_warning=true&include_ext_trusted_friends_metadata=true&send_error_codes=true&simple_quoted_tweet=true&q=from%3A____lifestyle&tweet_search_mode=live&count=20&query_source=spelling_expansion_revert_click&pc=1&spelling_corrections=1&include_ext_edit_control=true&ext=mediaStats%2ChighlightedLabel%2ChasNftAvatar%2CvoiceInfo%2Cenrichments%2CsuperFollowMetadata%2CunmentionInfo%2CeditControl%2Ccollab_control%2Cvibe: blocked (403)

4 requests to https://api.twitter.com/2/search/adaptive.json?include_profile_interstitial_type=1&include_blocking=1&include_blocked_by=1&include_followed_by=1&include_want_retweets=1&include_mute_edge=1&include_can_dm=1&include_can_media_tag=1&include_ext_has_nft_avatar=1&include_ext_is_blue_verified=1&include_ext_verified_type=1&skip_status=1&cards_platform=Web-12&include_cards=1&include_ext_alt_text=true&include_ext_limited_action_results=false&include_quote_count=true&include_reply_count=1&tweet_mode=extended&include_ext_collab_control=true&include_ext_views=true&include_entities=true&include_user_entities=true&include_ext_media_color=true&include_ext_media_availability=true&include_ext_sensitive_media_warning=true&include_ext_trusted_friends_metadata=true&send_error_codes=true&simple_quoted_tweet=true&q=from%3A____lifestyle&tweet_search_mode=live&count=20&query_source=spelling_expansion_revert_click&pc=1&spelling_corrections=1&include_ext_edit_control=true&ext=mediaStats%2ChighlightedLabel%2ChasNftAvatar%2CvoiceInfo%2Cenrichments%2CsuperFollowMetadata%2CunmentionInfo%2CeditControl%2Ccollab_control%2Cvibe failed, giving up. Errors: blocked (403), blocked (403), blocked (403), blocked (403)

snscrape was working fine, but suddenly I am getting these errors. Can I know why this is happening and how to fix it?

Thanks

hyzhak commented 1 year ago

At a high level, we are working to prevent these accounts from 1) scraping people’s public Twitter data to build AI models and 2) manipulating people and conversation on the platform in various ways.

https://business.twitter.com/en/blog/update-on-twitters-limited-usage.html

So even if it wasn't a problem before, it could be a problem now.