DOWNLOAD_DELAY will not work if use proxy.

TeamHG-Memex / scrapy-rotating-proxies

use multiple proxies with Scrapy

MIT License

738 stars, 158 forks

DOWNLOAD_DELAY will not work if use proxy. #23

Open dodoflyy opened 5 years ago

dodoflyy commented 5 years ago

Hello, it seems the Scrapy spider's DOWNLOAD_DELAY does not take effect when I use this proxy middleware. In my script I set DOWNLOAD_DELAY=8 and enable RANDOMIZE_DOWNLOAD_DELAY.

custom_settings = {
    "RETRY_TIMES": 7,
    "DOWNLOAD_DELAY": 8,
    "RANDOMIZE_DOWNLOAD_DELAY": True,
    "ROBOTSTXT_OBEY": False,
}

But the spider still crawls much faster than an 8-second delay would allow:

INFO: Crawled 40 pages (at 40 pages/min), scraped 0 items (at 0 items/min)

kmike commented 5 years ago

This is by design - see https://github.com/TeamHG-Memex/scrapy-rotating-proxies#concurrency and https://github.com/TeamHG-Memex/scrapy-rotating-proxies/blob/dfece2d9514d6a24c134585414764ad3656d096c/rotating_proxies/middlewares.py#L148
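A rough illustration of what the linked README section describes, namely that delay and concurrency limits are tracked per proxy rather than per target site. This is only a sketch: `slot_key` and `max_pages_per_minute` are hypothetical helpers, the proxy addresses are made up, and it assumes the delay is the only thing limiting throughput.

```python
from urllib.parse import urlsplit


def slot_key(url, proxy=None):
    """Hypothetical helper: the key under which the download delay is tracked."""
    # With a proxy, key by the proxy hostname; otherwise by the target hostname.
    return urlsplit(proxy or url).hostname


def max_pages_per_minute(download_delay, slot_count):
    """Upper bound if each slot waits download_delay seconds between requests."""
    return 60.0 / download_delay * slot_count


urls = ["http://example.com/page/%d" % i for i in range(6)]
# Made-up proxy addresses, for illustration only.
proxies = ["http://proxy%d.example.net:8080" % i for i in range(6)]

no_proxy_slots = {slot_key(u) for u in urls}
proxy_slots = {slot_key(u, p) for u, p in zip(urls, proxies)}

# One site without proxies: a single slot shares the 8-second delay.
print(len(no_proxy_slots), max_pages_per_minute(8, len(no_proxy_slots)))
# Same site through six proxies: six slots, each with its own delay.
print(len(proxy_slots), max_pages_per_minute(8, len(proxy_slots)))
```

Under these assumptions, a handful of proxies is enough to reach a rate like the one in the log above, even though each individual proxy still waits about 8 seconds between requests. The number of proxies used in the original report is not known.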

So DOWNLOAD_DELAY works, but in a different way: for proxied requests it is applied per proxy rather than per website. I think an option to disable that would be a good feature to have; pull requests are welcome.
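Until such an option exists, one possible workaround is a small downloader middleware that runs after RotatingProxyMiddleware and points the download slot back at the target site. This is an untested sketch, not part of scrapy-rotating-proxies: the class name `PerSiteDelayMiddleware`, the module path `myproject.middlewares`, and the order value 630 are assumptions. It relies on Scrapy's `download_slot` request meta key.

```python
from urllib.parse import urlsplit


class PerSiteDelayMiddleware:
    """Hypothetical middleware: key the download slot by target site again."""

    def process_request(self, request, spider):
        # Only touch requests that already had a proxy assigned.
        if "proxy" in request.meta:
            # Use the target hostname as the slot, so DOWNLOAD_DELAY is
            # enforced per website instead of per proxy.
            request.meta["download_slot"] = urlsplit(request.url).hostname
        return None  # let Scrapy continue processing the request


# settings.py (630 is an assumption; it only needs to be higher than the
# value used for RotatingProxyMiddleware so that this runs after it):
#
# DOWNLOADER_MIDDLEWARES = {
#     "rotating_proxies.middlewares.RotatingProxyMiddleware": 610,
#     "rotating_proxies.middlewares.BanDetectionMiddleware": 620,
#     "myproject.middlewares.PerSiteDelayMiddleware": 630,
# }
```

Note that this would also make the other per-slot limits, such as CONCURRENT_REQUESTS_PER_DOMAIN, apply per website again, which may or may not be what you want when rotating proxies.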