Is Twitter's robots.txt respected?

JustAnotherArchivist / snscrape

A social networking service scraper in Python

GNU General Public License v3.0


Is Twitter's robots.txt respected? #182

Closed ChristianZX closed 3 years ago

ChristianZX commented 3 years ago

Great tool. But does it respect Twitter's robots.txt?

JustAnotherArchivist commented 3 years ago

snscrape is not a crawler. It is manually invoked and merely emulates you opening a specific search or profile page in a browser, scrolling to the bottom, and extracting the tweets. robots.txt does not apply.

ChristianZX commented 3 years ago

But isn't "opening a specific ... profile page in a browser, scrolling to the bottom, and extracting" pretty much the definition of web crawling? Is it possible to find out which paths snscrape is using?

Fusl commented 3 years ago

Since when is opening Twitter in a browser and using a mouse wheel considered crawling? Am I actually a robot who lied to all those CAPTCHAs? O_o

JustAnotherArchivist commented 3 years ago

Scraping is not (necessarily) crawling. robots.txt is for systems like search engines that recursively walk through and index a website; it tells them which parts of the site to avoid in such automated crawls. That's not something snscrape does.
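To illustrate what consulting robots.txt looks like for the kind of recursive crawler described above, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules, the `ExampleCrawler` user agent, the `example.com` URLs, and the helper names are all made up for illustration; this is not Twitter's actual robots.txt and not anything snscrape does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content for illustration only (NOT Twitter's real file).
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /search
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_links(parser: RobotFileParser, user_agent: str, links: list[str]) -> list[str]:
    """Filter discovered links the way a polite crawler would before following them."""
    return [url for url in links if parser.can_fetch(user_agent, url)]


parser = build_parser(SAMPLE_ROBOTS_TXT)

# Links a crawler might have discovered while walking the site (assumed URLs).
frontier = [
    "https://example.com/search?q=python",
    "https://example.com/some_user",
]

print(allowed_links(parser, "ExampleCrawler", frontier))
# → ['https://example.com/some_user']
```

The check sits inside the link-following loop of a crawler, which is the step a manually invoked, single-target scraper does not have.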