JustAnotherArchivist / snscrape

A social networking service scraper in Python
GNU General Public License v3.0
4.31k stars 698 forks

Auth and rate limits with GraphQL API #984

Closed: zozoheir closed 1 year ago

zozoheir commented 1 year ago

First of all thanks a lot for the new updates, it's looking great!

I've been looking through the code but couldn't figure out how the library manages to search without user auth. My understanding is that the GraphQL API still requires user auth, and therefore making requests at a reasonable rate. With the new updates, I'd like to understand how API limits work now and what kind of access the library has for searching user tweets, etc. On twitter.com I can't search anything without being authenticated, so it must be using some kind of auth? Thanks in advance!

JustAnotherArchivist commented 1 year ago

No auth involved. snscrape uses the unofficial API as it's used by the website. It's possible to get banned, but you have to try quite hard to make that happen. Even a few parallel scrapes seem to be fine.
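To illustrate the "no auth" point, here is a minimal sketch of an unauthenticated search through snscrape's Python module. It assumes snscrape is installed and that the search endpoint still behaves as described in this thread. `first_items` and `search_tweets` are hypothetical helper names, not part of snscrape; `TwitterSearchScraper` and `get_items()` come from the library itself.

```python
from itertools import islice


def first_items(scraper, limit):
    """Return at most `limit` items from any object exposing get_items()."""
    return list(islice(scraper.get_items(), limit))


def search_tweets(query, limit=10):
    """Run a search without passing any credentials (hypothetical helper)."""
    # Imported lazily so first_items() stays usable without snscrape installed.
    import snscrape.modules.twitter as sntwitter

    # No username, password, or token is passed anywhere.
    return first_items(sntwitter.TwitterSearchScraper(query), limit)
```

Capping the number of items with `islice` keeps a single scrape small, which fits the advice above about not hammering the API.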

#385, #551, etc. are still accurate as far as I'm aware.

zozoheir commented 1 year ago

Thanks for your response! This library https://github.com/vladkens/twscrape actually uses snscrape for querying the Twitter GraphQL API, but it does require an account and has a whole module for managing multiple concurrent accounts. Do you have an explanation for why Nitter and snscrape don't require auth and are able to search, whereas Twitter requires auth to search on the web app, and maybe why twscrape also requires auth to use snscrape?

How come snscrape doesn't require any of that? I'm not complaining; quite the opposite, no auth sounds great. I'm just trying to understand the trick.

TheTechRobo commented 1 year ago

I don't think it uses snscrape; it just uses a compatible API so your JSONL parsers or snscrape-based scripts can still work: https://github.com/search?q=repo%3Avladkens%2Ftwscrape%20snscrape&type=code https://github.com/vladkens/twscrape/blob/815189e27726d1a117cd576f28a156293248580d/twscrape/models.py
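As a rough illustration of what a "JSONL parser" for this kind of output might look like, here is a minimal sketch using only the standard library. The field names (`id`, `rawContent`, `user.username`) are assumptions about the record layout, and `iter_jsonl` and `summarize` are hypothetical helpers, not part of snscrape or twscrape.

```python
import json


def iter_jsonl(lines):
    """Yield one decoded object per non-empty line of JSONL input."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        yield json.loads(line)


def summarize(record):
    """Pick a few fields from a tweet record (field names are assumed)."""
    user = record.get("user") or {}
    return {
        "id": record.get("id"),
        "username": user.get("username"),
        "text": record.get("rawContent"),
    }


if __name__ == "__main__":
    sample = ['{"id": 1, "rawContent": "hello", "user": {"username": "alice"}}']
    for rec in iter_jsonl(sample):
        print(summarize(rec))
```

A parser like this depends only on the shape of each JSON line, so it would keep working with any tool that emits the same fields.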

JustAnotherArchivist commented 1 year ago

One of the search endpoints on Twitter's unofficial API does not require auth. That's it. It wasn't me who found it, but it's not exactly hidden either if you know where to look.