TeamHG-Memex / scrapy-rotating-proxies

use multiple proxies with Scrapy
MIT License
738 stars, 158 forks

This middleware breaks default throttling #50

Status: Open. 3hhh opened this issue 4 years ago.

3hhh commented 4 years ago

You set a per-proxy `download_slot` in `request.meta` at [1].

In effect, throttling now works per proxy and no longer per destination host, which would be the default (cf. [2]; the same applies to `DOWNLOAD_DELAY` et al.).
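To make the slot mechanics concrete, here is a minimal, simplified model of how a downloader slot key is chosen. It is not Scrapy's actual code: it assumes that an explicit `meta["download_slot"]` wins and that the destination hostname is the fallback (Scrapy's default when `CONCURRENT_REQUESTS_PER_IP` is 0). The proxy URLs are hypothetical, and the per-proxy slot is assumed to be the proxy's hostname, which is what [1] appears to do.

```python
from urllib.parse import urlsplit


def slot_key(url: str, meta: dict) -> str:
    # Simplified model (assumption): an explicit download_slot in meta wins,
    # otherwise requests are grouped by destination hostname.
    if "download_slot" in meta:
        return meta["download_slot"]
    return urlsplit(url).hostname or ""


def group_by_slot(requests):
    # Map each slot key to the list of URLs that would share its throttling state.
    slots = {}
    for url, meta in requests:
        slots.setdefault(slot_key(url, meta), []).append(url)
    return slots


# Three requests to the same site, each routed through a different hypothetical proxy.
proxies = ["http://10.0.0.1:8000", "http://10.0.0.2:8000", "http://10.0.0.3:8000"]
urls = [f"https://example.com/page/{i}" for i in range(3)]

# Default behaviour: no download_slot, so all requests share the host's slot.
default = group_by_slot([(u, {}) for u in urls])

# Per-proxy behaviour (assumed): slot derived from the proxy hostname.
rotating = group_by_slot([
    (u, {"proxy": p, "download_slot": urlsplit(p).hostname})
    for u, p in zip(urls, proxies)
])

print(sorted(default))   # ['example.com']
print(sorted(rotating))  # ['10.0.0.1', '10.0.0.2', '10.0.0.3']
```

With the default key, all three requests share one slot, so one delay and one concurrency limit protect `example.com`. With per-proxy keys, each request lands in its own slot and is throttled independently.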

Since most users will have >> 100 proxies, you can end up hammering the target host with >> 100 concurrent requests, limited only by the global `CONCURRENT_REQUESTS` setting. So users can be nice to their proxy provider, but not to the destination host.
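A rough back-of-the-envelope sketch of why this matters. It assumes every slot enforces its own per-slot limit (`CONCURRENT_REQUESTS_PER_DOMAIN`) and its own `DOWNLOAD_DELAY`, with only `CONCURRENT_REQUESTS` shared across slots. It ignores `RANDOMIZE_DOWNLOAD_DELAY`, AutoThrottle adjustments and network latency, so the numbers are upper-bound approximations. The 200-proxy figure and the raised global limit of 256 are hypothetical.

```python
def max_concurrent_to_host(num_slots: int, per_slot: int, global_limit: int) -> int:
    # Each slot allows up to per_slot requests in flight; only the global limit
    # is shared, so the host sees at most the smaller of the two.
    return min(num_slots * per_slot, global_limit)


def max_rate_to_host(num_slots: int, download_delay: float) -> float:
    # Approximate requests/second if each slot waits download_delay seconds
    # between requests and every slot targets the same host.
    return num_slots / download_delay


# Default: one slot per destination host (per-slot 8, global 16).
print(max_concurrent_to_host(1, 8, 16))     # 8
# Hypothetical: 200 proxies, each its own slot, global limit raised to 256.
print(max_concurrent_to_host(200, 8, 256))  # 256

# DOWNLOAD_DELAY = 2 seconds: one slot vs. 200 per-proxy slots.
print(max_rate_to_host(1, 2.0))    # 0.5
print(max_rate_to_host(200, 2.0))  # 100.0
```

A delay configured to allow roughly one request every two seconds to the site thus becomes roughly 100 requests per second once it is applied per proxy instead of per host.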

[1] https://github.com/TeamHG-Memex/scrapy-rotating-proxies/blob/master/rotating_proxies/middlewares.py#L146
[2] https://github.com/scrapy/scrapy/blob/master/scrapy/extensions/throttle.py