Alhajras / webscraper

Configurable search engine written in Python and Angular. It supports indexing as well.

Avoid getting blocked by a server #1

Open Alhajras opened 1 year ago

Alhajras commented 1 year ago

Different solutions to test:

Add a random pause between requests.

Make good use of sessions:

1) Keep the same session for a batch of requests (30 to 60).

2) Clear your cookies after 30 to 60 requests and change the user agent. This simple Python package can help: https://pypi.org/project/shadow-useragent/

3) If that still does not work, rotate your IP address over time (for instance every 30 to 60 requests) using a proxy provider, and at the same time rotate your user agent and clear your cookies.
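The steps above can be combined into one client. Below is a minimal sketch that uses only the Python standard library (`urllib.request` with an `http.cookiejar.CookieJar`) instead of Scrapy or `requests`: each "session" is an opener with its own cookie jar, user agent, and optional proxy, reused for a random batch of 30 to 60 requests, with a random pause before every request. The `RotatingSession` class, the user-agent strings, the 1 to 5 second pause range, and the proxy list are hypothetical; a fixed user-agent list stands in for shadow-useragent, whose API is not used here.

```python
import random
import time
import urllib.request
from http.cookiejar import CookieJar

# Hypothetical example values: replace with real, up-to-date user agents.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
    "Mozilla/5.0 (X11; Linux x86_64; rv:121.0) Gecko/20100101 Firefox/121.0",
]
# Hypothetical: proxy URLs from a proxy provider, e.g. "http://user:pass@host:port".
PROXIES = []


class RotatingSession:
    """Reuse one session (cookie jar + user agent + proxy) for a random
    batch of requests, then start a completely fresh one."""

    def __init__(self, user_agents, proxies=None, batch_range=(30, 60),
                 pause_range=(1.0, 5.0), rng=None, sleep=time.sleep):
        self.user_agents = user_agents
        self.proxies = proxies or []
        self.batch_range = batch_range
        self.pause_range = pause_range
        self.rng = rng or random.Random()
        self.sleep = sleep  # injectable so tests can skip real waiting
        self.sessions_started = 0
        self._new_session()

    def _new_session(self):
        # A new CookieJar means no cookies carry over from the old session.
        self.cookies = CookieJar()
        handlers = [urllib.request.HTTPCookieProcessor(self.cookies)]
        # Pick a proxy (if any) so the IP changes together with cookies and UA.
        self.proxy = self.rng.choice(self.proxies) if self.proxies else None
        if self.proxy:
            handlers.append(urllib.request.ProxyHandler(
                {"http": self.proxy, "https": self.proxy}))
        self.opener = urllib.request.build_opener(*handlers)
        self.user_agent = self.rng.choice(self.user_agents)
        self.opener.addheaders = [("User-Agent", self.user_agent)]
        # Number of requests this session will serve (inclusive range).
        self.remaining = self.rng.randint(*self.batch_range)
        self.sessions_started += 1

    def _before_request(self):
        if self.remaining == 0:
            self._new_session()
        self.remaining -= 1
        # Random pause between requests.
        self.sleep(self.rng.uniform(*self.pause_range))

    def get(self, url, timeout=10):
        self._before_request()
        with self.opener.open(url, timeout=timeout) as resp:
            return resp.read()
```

Usage, assuming a filled-in proxy list: `client = RotatingSession(USER_AGENTS, PROXIES)`, then `html = client.get("https://example.com/page")`. Building a new opener instead of clearing the jar in place keeps cookies, user agent, and proxy changing together, as step 3 suggests.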

Your traffic should now look random to most websites. If you run into further bot mitigation (such as reCAPTCHA) or specialized anti-scraping services, this can get trickier.
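To know when to escalate from one step to the next, the scraper has to notice that it is being blocked. A minimal heuristic sketch, assuming that blocking shows up either as an HTTP status such as 403, 429, or 503 or as CAPTCHA-style text in the page body; the `looks_blocked` function, the status set, and the marker strings are all hypothetical and should be tuned per site.

```python
# Hypothetical heuristics: status codes and page markers that often
# accompany rate limiting or a CAPTCHA challenge. Tune per target site.
BLOCK_STATUS_CODES = {403, 429, 503}
BLOCK_MARKERS = (b"captcha", b"access denied", b"unusual traffic")


def looks_blocked(status: int, body: bytes) -> bool:
    """Return True if a response looks like bot mitigation, not content."""
    if status in BLOCK_STATUS_CODES:
        return True
    # Case-insensitive search; b"captcha" also matches "reCAPTCHA".
    text = body.lower()
    return any(marker in text for marker in BLOCK_MARKERS)
```

With `urllib`, a 403 or 429 response raises `urllib.error.HTTPError`, so the caller would catch it and pass `err.code` and `err.read()` to this check.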

https://stackoverflow.com/questions/59408534/blocked-from-scraping-a-website-with-scrapy