aivarsk / scrapy-proxies

Random proxy middleware for Scrapy
MIT License
1.65k stars 409 forks

Retry won't pick a new proxy. #15

Open · HGYD opened this issue 8 years ago

HGYD commented 8 years ago

Hi, I use a proxy list to run my spider. However, it fails to pick a new proxy when a connection failure happens.

2016-09-20 17:48:25 [scrapy] DEBUG: Using proxy http://xxx.160.162.95:8080, 3 proxies left
2016-09-20 17:48:27 [scrapy] INFO: Removing failed proxy http://xxx.160.162.95:8080, 2 proxies left
2016-09-20 17:48:27 [scrapy] DEBUG: Retrying <GET http://jsonip.com/> (failed 1 times): User timeout caused connection failure: Getting http://jsonip.com/ took longer than 2.0 seconds..
2016-09-20 17:48:29 [scrapy] INFO: Removing failed proxy http://xxx.160.162.95:8080, 2 proxies left
2016-09-20 17:48:29 [scrapy] DEBUG: Retrying <GET http://jsonip.com/> (failed 2 times): User timeout caused connection failure: Getting http://jsonip.com/ took longer than 2.0 seconds..
2016-09-20 17:48:31 [scrapy] INFO: Removing failed proxy http://xxx.160.162.95:8080, 2 proxies left
2016-09-20 17:48:31 [scrapy] DEBUG: Gave up retrying <GET http://jsonip.com/> (failed 3 times): User timeout caused connection failure: Getting http://jsonip.com/ took longer than 2.0 seconds..

Please help fix this problem. Thanks a lot.

HGYD commented 8 years ago

The problem may be caused by this code:

    if 'proxy' in request.meta: return

I deleted that code and it fixed the problem.

I think that when you retry, the request already has a proxy in request.meta, so the middleware just returns without assigning a new one.
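For context, here is a rough sketch of the relevant part of the middleware's process_request (abridged from randomproxy.py; the proxy-selection tail is only summarized in comments):

def process_request(self, request, spider):
    # Don't overwrite with a random one (server-side state for IP).
    # On a retry the request still carries the proxy from the failed attempt,
    # so this early return keeps the old proxy instead of picking a fresh one;
    # this is the code HGYD removed.
    if 'proxy' in request.meta:
        if request.meta["exception"] is False:
            return

    request.meta["exception"] = False
    # ... otherwise pick a random proxy from the remaining list, e.g.:
    # request.meta['proxy'] = random.choice(list(self.proxies.keys()))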

watermelonjuice commented 8 years ago

Same issue

astwyg commented 8 years ago

same +1

watermelonjuice commented 7 years ago

@aivarsk any chance you can update us on this?

IvanIrk commented 7 years ago

Same issue

IvanIrk commented 7 years ago

HGYD's solution works for me.

flash5 commented 7 years ago

I have a similar issue with selecting a new proxy, as mentioned above; with my code, once the retry attempts are exhausted, execution stops. Please suggest a solution.

Here is the backtrace:

2017-07-06 18:54:45 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6042
2017-07-06 18:54:45 [scrapy.core.engine] INFO: Spider opened
2017-07-06 18:54:45 [scrapy.proxies] DEBUG: Using proxy http://72.169.78.1:87, 200 proxies left
2017-07-06 18:54:59 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 1 times): 403 Forbidden
2017-07-06 18:55:09 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 2 times): 403 Forbidden
2017-07-06 18:55:17 [scrapy.proxies] INFO: Removing failed proxy http://72.169.78.1:87, 199 proxies left
2017-07-06 18:55:17 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 3 times): []
2017-07-06 18:55:24 [scrapy.proxies] INFO: Removing failed proxy http://72.169.78.1:87, 199 proxies left
2017-07-06 18:55:24 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 4 times): []
2017-07-06 18:55:31 [scrapy.proxies] INFO: Removing failed proxy http://72.169.78.1:87, 199 proxies left
2017-07-06 18:55:31 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 5 times): []
2017-07-06 18:55:45 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 6 times): 403 Forbidden
2017-07-06 18:55:53 [scrapy.proxies] INFO: Removing failed proxy http://72.169.78.1:87, 199 proxies left
2017-07-06 18:55:53 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 7 times): []
2017-07-06 18:56:01 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 8 times): 403 Forbidden
2017-07-06 18:56:19 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 9 times): 403 Forbidden
2017-07-06 18:56:33 [scrapy.proxies] INFO: Removing failed proxy http://72.169.78.1:87, 199 proxies left
2017-07-06 18:56:33 [scrapy.downloadermiddlewares.retry] DEBUG: Retrying <GET http://xyz.com> (failed 10 times): []
2017-07-06 18:56:41 [scrapy.proxies] INFO: Removing failed proxy http://72.169.78.1:87, 199 proxies left
2017-07-06 18:56:41 [scrapy.downloadermiddlewares.retry] DEBUG: Gave up retrying <GET http://xyz.com> (failed 11 times): []
Traceback (most recent call last):
  File "/usr/bin/scrapy", line 11, in <module>
    sys.exit(execute())
  File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 149, in execute
    _run_print_help(parser, _run_command, cmd, args, opts)
  File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 89, in _run_print_help
    func(*a, **kw)
  File "/usr/lib/python2.7/site-packages/scrapy/cmdline.py", line 156, in _run_command
    cmd.run(args, opts)
  File "/usr/lib/python2.7/site-packages/scrapy/commands/shell.py", line 73, in run
    shell.start(url=url, redirect=not opts.no_redirect)
  File "/usr/lib/python2.7/site-packages/scrapy/shell.py", line 48, in start
    self.fetch(url, spider, redirect=redirect)
  File "/usr/lib/python2.7/site-packages/scrapy/shell.py", line 115, in fetch
    reactor, self._schedule, request, spider)
  File "/usr/lib64/python2.7/site-packages/twisted/internet/threads.py", line 122, in blockingCallFromThread
    result.raiseException()
  File "", line 2, in raiseException
twisted.web._newclient.ResponseNeverReceived: []

shadow-ru commented 7 years ago

Use errbacks in Requests:

def start_requests(self):
    # ...
    yield scrapy.Request(url=url, callback=self.parse, errback=self.make_new_request)

def make_new_request(self, failure):
    return scrapy.Request(url=failure.request.url, callback=self.parse, errback=self.make_new_request, dont_filter=True)
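This works because the request built in the errback is a brand-new Request with an empty meta, so the proxy middleware sees no 'proxy' key and assigns a fresh proxy instead of hitting the early return. As written it can retry forever, though; a rough variant that caps the errback retries with a made-up proxy_retries meta key and MAX_PROXY_RETRIES constant (both hypothetical, not part of scrapy-proxies) could look like this:

MAX_PROXY_RETRIES = 5  # hypothetical cap, tune to taste

def make_new_request(self, failure):
    # Track our own retry count in meta so a permanently dead URL
    # does not loop through the errback forever.
    retries = failure.request.meta.get('proxy_retries', 0)
    if retries >= MAX_PROXY_RETRIES:
        self.logger.warning('Giving up on %s after %d proxy retries',
                            failure.request.url, retries)
        return
    return scrapy.Request(
        url=failure.request.url,
        callback=self.parse,
        errback=self.make_new_request,
        dont_filter=True,
        meta={'proxy_retries': retries + 1},
    )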
wvengen commented 4 years ago

What about setting a new proxy if a retry has happened? On line 81:

# Don't overwrite with a random one (server-side state for IP)
# But when randomizing every request, we do want to update the proxy on retry.
if not (self.mode == Mode.RANDOMIZE_PROXY_EVERY_REQUESTS and request.meta.get('retry_times', 0) > 0):
    if 'proxy' in request.meta:
        if request.meta["exception"] is False:
            return
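For completeness, this change only matters when the middleware runs in the randomize-every-request mode. A settings sketch that selects that mode, using the setting names from this project's README (PROXY_LIST, PROXY_MODE, the RandomProxy middleware entry) plus the usual Scrapy retry settings:

# settings.py (sketch; names follow the scrapy-proxies README)

# Retry many times since proxies often fail
RETRY_TIMES = 10
# Retry on most error codes since proxies fail for different reasons
RETRY_HTTP_CODES = [500, 503, 504, 400, 403, 404, 408]

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': 90,
    'scrapy_proxies.RandomProxy': 100,
    'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware': 110,
}

# Path to the proxy list, one proxy URL per line
PROXY_LIST = '/path/to/proxy/list.txt'

# 0 = every request gets a different proxy (Mode.RANDOMIZE_PROXY_EVERY_REQUESTS)
# 1 = take one proxy from the list and use it for all requests
# 2 = use a custom proxy defined in the settings
PROXY_MODE = 0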