TeamHG-Memex / scrapy-rotating-proxies

use multiple proxies with Scrapy
MIT License
733 stars, 156 forks

Banned responses get to the engine in the end #48

Open 3hhh opened 3 years ago

3hhh commented 3 years ago

If response_is_ban(self, request, response) identifies a response as a ban, that response still reaches the spider's parse() method after the middleware's final retry attempt: once more than ROTATING_PROXY_PAGE_RETRY_TIMES attempts have been banned, the middleware neither raises an exception nor otherwise drops the response. This is inconvenient, because the user has to call response_is_ban(self, request, response) again in their parse() implementation to filter out banned responses.
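Until the middleware handles this itself, the repeated check can at least be kept out of every callback body. Below is a minimal sketch of one way to do that on the spider side. It assumes the spider defines response_is_ban(self, request, response) as described above, that the response exposes its originating request as response.request, and that callbacks are generators or return None. The decorator name skip_banned and the FakeResponse and DemoSpider stand-ins are hypothetical and exist only so the snippet runs without Scrapy; none of them is part of scrapy-rotating-proxies.

```python
import functools


def skip_banned(parse_method):
    """Hypothetical helper: make a spider callback yield nothing for banned responses."""

    @functools.wraps(parse_method)
    def wrapper(self, response, *args, **kwargs):
        # Re-run the spider's own ban check before the callback body executes.
        if self.response_is_ban(response.request, response):
            return  # banned: the generator ends without yielding anything
        result = parse_method(self, response, *args, **kwargs)
        if result is not None:
            yield from result

    return wrapper


class FakeResponse:
    """Stand-in for a Scrapy response (assumption: only .status and .request are used)."""

    def __init__(self, status, request=None):
        self.status = status
        self.request = request


class DemoSpider:
    """Stand-in for a spider; a real one would subclass scrapy.Spider."""

    def response_is_ban(self, request, response):
        # Illustrative ban rule only: treat HTTP 403 as a ban.
        return response.status == 403

    @skip_banned
    def parse(self, response):
        yield {"status": response.status}


spider = DemoSpider()
ok_items = list(spider.parse(FakeResponse(200)))
banned_items = list(spider.parse(FakeResponse(403)))
```

With this in place, ok_items holds the single item produced by parse(), while banned_items is empty. This only hides the symptom in the spider; the banned response still travels through the engine, which is what this issue is about.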

Separately, I noticed that ROTATING_PROXY_PAGE_RETRY_TIMES = 1 generally results in 2 retries rather than 1; the number of retries is always one more than ROTATING_PROXY_PAGE_RETRY_TIMES.