I implemented a ban_policy that marks 302 redirects as a "ban".
But once a request reaches the maximum number of retries, it is let through and is therefore picked up by scrapy.downloadermiddlewares.redirect,
which in turn restarts a max_proxies_to_try cycle for the redirected request (a useless captcha page).
2020-10-02 05:31:07 [rotating_proxies.middlewares] DEBUG: Gave up retrying <GET http://www.url.com> (failed 6 times with different proxies)
2020-10-02 05:31:07 [scrapy.downloadermiddlewares.redirect] DEBUG: Redirecting (302) to <GET http://www.url.com/redirected/to/captacha> from <GET http://www.url.com>
2020-10-02 05:31:10 [rotating_proxies.middlewares] DEBUG: Gave up retrying <GET http://www.url.com/redirected/to/captacha> (failed 6 times with different proxies)
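For context, a ban policy that marks 302 as a ban might look roughly like the sketch below. This is an illustration under assumptions, not necessarily the exact policy in use here: the class name RedirectBanPolicy is hypothetical, and it assumes the middleware only needs an object exposing response_is_ban and exception_is_ban, which is how the scrapy-rotating-proxies README describes custom policies.

```python
class RedirectBanPolicy:
    """Hypothetical ban policy: treat 302 redirects as a ban."""

    # Status codes counted as a ban (assumption: only 302 matters here).
    BAN_STATUSES = {302}

    def response_is_ban(self, request, response):
        # On this site a 302 leads to a captcha page, so count it as a ban.
        return response.status in self.BAN_STATUSES

    def exception_is_ban(self, request, exception):
        # Simplification for this sketch: every download exception is a ban.
        return True
```

Such a class would be enabled through the ROTATING_PROXIES_BAN_POLICY setting, pointing at wherever the class lives (for example 'myproject.policy.RedirectBanPolicy', a made-up path).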
Shouldn't we add a raise IgnoreRequest() once the retries are exhausted, like so: