NikolaiT / se-scraper

Javascript scraping module based on puppeteer for many different search engines...
https://scrapeulous.com/
Apache License 2.0
538 stars 123 forks

Safe per IP Google limits? #19

Open sf-steve opened 5 years ago

sf-steve commented 5 years ago

Hi all, does anyone have up-to-date data for how many searches you can perform per IP per time period before getting blocked?

We have a limited scraping need and spare server resources, so we figured this would be a good solution, but we would like to know how best to split the work.

I found old posts suggesting around 300 regular (not Google dorks) searches per 24 hours, but have no idea if this is still correct.

Any input greatly appreciated.
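For planning how to split the work, here is a rough arithmetic sketch. It assumes the unverified figure of about 300 searches per IP per 24 hours quoted above. The function names and the 2000-query workload are made up for illustration.

```javascript
// Average gap between queries if a daily budget is spread evenly over 24 hours.
function secondsBetweenQueries(queriesPerDay) {
  return (24 * 60 * 60) / queriesPerDay;
}

// Number of IPs needed to finish `totalQueries` within `days` days,
// given an assumed safe budget per IP per day.
function ipsNeeded(totalQueries, queriesPerIpPerDay, days = 1) {
  return Math.ceil(totalQueries / (queriesPerIpPerDay * days));
}

// 300/day is the old, unconfirmed figure from the question, not a measured limit.
console.log(secondsBetweenQueries(300)); // 288
// Hypothetical workload of 2000 queries to finish in one day.
console.log(ipsNeeded(2000, 300)); // 7
```

If the 300-per-day figure still holds, that works out to roughly one query every five minutes per IP.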

YvesBos commented 5 years ago

I have been able to scrape for 12+ hours on a single IP with a 70-200 second sleep between queries, without detection. When I reduced the sleep range to 60-110 seconds, it was detected after ~15 minutes. Patience is key :)
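A minimal sketch of that kind of randomized pause between queries, in plain Node.js. `runQuery` is a hypothetical callback standing in for whatever performs a single search, and it is an assumption here that the sleep is drawn uniformly from the range. The 70-200 second defaults come from the numbers above and are not a guaranteed safe limit.

```javascript
// Pick a random whole number of seconds from the inclusive range [minSec, maxSec].
function randomDelaySeconds(minSec, maxSec, rng = Math.random) {
  return minSec + Math.floor(rng() * (maxSec - minSec + 1));
}

// Promise-based sleep helper.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Run queries one at a time, pausing a random interval between them.
// `runQuery` is a hypothetical async function that performs one search.
// `sleepFn` is injectable so the loop can be exercised without real waiting.
async function scrapeSlowly(
  keywords,
  runQuery,
  { minSec = 70, maxSec = 200, sleepFn = sleep } = {}
) {
  const results = [];
  for (let i = 0; i < keywords.length; i++) {
    results.push(await runQuery(keywords[i]));
    // No need to wait after the final query.
    if (i < keywords.length - 1) {
      await sleepFn(randomDelaySeconds(minSec, maxSec) * 1000);
    }
  }
  return results;
}
```

Passing a different `minSec`/`maxSec` pair makes it easy to repeat the comparison between the two sleep ranges on separate IPs.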

sf-steve commented 5 years ago

Thanks @YvesBos, so that's in line with the older blog posts I found. I guess we can set up a couple of VPSes and run some tests. Have you tried any tests with continuous sessions (same userDataDir) vs fresh sessions?

YvesBos commented 5 years ago

Ah yes, I forgot to mention that. I only scrape for a couple of keywords and then relaunch the scraper with a different country in the config, thus creating a new session. I have no experience with using continuous sessions.
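The batching described here could be sketched roughly as follows. This is an illustration only: `launchSession` is a hypothetical function that would start the scraper with a fresh session for one batch, and the `country` and `keywords` field names are placeholders, since the thread does not show the actual config.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  const out = [];
  for (let i = 0; i < items.length; i += size) {
    out.push(items.slice(i, i + size));
  }
  return out;
}

// Pair each small keyword batch with a country, cycling through the list,
// so that every batch runs in its own freshly launched session.
function planSessions(keywords, countries, batchSize) {
  return chunk(keywords, batchSize).map((batch, i) => ({
    country: countries[i % countries.length],
    keywords: batch,
  }));
}

// Run the sessions one after another.
// `launchSession` is hypothetical: it should launch the scraper with the
// given settings, scrape the batch, and shut down again.
async function runPlan(plan, launchSession) {
  const results = [];
  for (const session of plan) {
    results.push(await launchSession(session));
  }
  return results;
}
```

For example, `planSessions(['a', 'b', 'c', 'd', 'e'], ['us', 'de'], 2)` yields three sessions: two keywords for `us`, two for `de`, and the last one for `us` again.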