fabienvauchelles / scrapoxy

Scrapoxy is a super proxy aggregator, allowing you to manage all proxies in one place 🎯, rather than spreading it across multiple scrapers πŸ•ΈοΈ. It also smartly handles traffic routing πŸ”€ to minimize bans and increase success rates πŸš€.
http://scrapoxy.io
MIT License
2.04k stars 235 forks source link

Error: request error from target (connect ETIMEDOUT) #36

Closed nvg58 closed 7 years ago

nvg58 commented 7 years ago

Hi @fabienvauchelles,

I got the error below when using Scrapoxy (sorry for censoring some infor)

4|scrapoxy | 2016-12-20T03:14:48.412Z - error: [Master] Error: request error from target (GET {{https_url}} on instance i-...@...}): { Error: connect ETIMEDOUT ...:3128
4|scrapoxy |     at Object.exports._errnoException (util.js:1022:11)
4|scrapoxy |     at exports._exceptionWithHostPort (util.js:1045:20)
4|scrapoxy |     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1087:14)
4|scrapoxy |   code: 'ETIMEDOUT',
4|scrapoxy |   errno: 'ETIMEDOUT',
4|scrapoxy |   syscall: 'connect',
4|scrapoxy |   address: '...',
4|scrapoxy |   port: 3128 }

The stats of scrapoxy at that point: screenshot 2016-12-20 10 17 41

Please advice. Thank you so much :)

nvg58 commented 7 years ago

I also got this error logs from scrapyd:

2016-12-20 03:55:51 [scrapy] ERROR: Error caught on signal handler: <bound method ?.spider_opened of <scrapoxy.downloadmiddlewares.scale.ScaleMiddleware object at 0x7f362cb9c790>>
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/twisted/internet/defer.py", line 150, in maybeDeferred
    result = f(*args, **kw)
  File "/usr/local/lib/python2.7/dist-packages/pydispatch/robustapply.py", line 55, in robustApply
    return receiver(*arguments, **named)
  File "/usr/local/lib/python2.7/dist-packages/scrapoxy/downloadmiddlewares/scale.py", line 45, in spider_opened
    min_sc, required_sc, max_sc = self._commander.get_scaling()
  File "/usr/local/lib/python2.7/dist-packages/scrapoxy/commander.py", line 73, in get_scaling
    r = requests.get(u'{0}/scaling'.format(self._api), headers=headers)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 70, in get
    return request('get', url, params=params, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/api.py", line 56, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 488, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/sessions.py", line 609, in send
    r = adapter.send(request, **kwargs)
  File "/usr/local/lib/python2.7/dist-packages/requests/adapters.py", line 487, in send
    raise ConnectionError(e, request=request)
ConnectionError: HTTPConnectionPool(host='xxx.xxx.xxx', port=8889): Max retries exceeded with url: /api/scaling (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f362c2b4e90>: Failed to establish a new connection: [Errno 111] Connection refused',))
fabienvauchelles commented 7 years ago

Hello @nvg58,

The Scrapoxy instance has lost the network connection. Are you running scrapyd and scrapoxy on the same instance ?

nvg58 commented 7 years ago

Yes, I do.

fabienvauchelles commented 7 years ago

Perhaps a part of Scrapoxy has crashed. Or the remote website closes the connection. Or the instances is not accessible.

I will close the issue. Don't hesitate to reopen it if you add more infos.

nvg58 commented 7 years ago
6|scrapoxy | 2016-12-24T03:36:29.237Z - error: [Master] Error: request error from target (GET ... on instance ...): { Error: connect ETIMEDOUT ...
6|scrapoxy |     at Object.exports._errnoException (util.js:1022:11)
6|scrapoxy |     at exports._exceptionWithHostPort (util.js:1045:20)
6|scrapoxy |     at TCPConnectWrap.afterConnect [as oncomplete] (net.js:1087:14)
6|scrapoxy |   code: 'ETIMEDOUT',
6|scrapoxy |   errno: 'ETIMEDOUT',
6|scrapoxy |   syscall: 'connect',
6|scrapoxy |   address: '...',
6|scrapoxy |   port: 3128 }

When I view the logs file of scrapoxy from pm2. At first, it works well. Then after awhile, many requests got errors as above.

Please advice.

cp2587 commented 7 years ago

I have the same problem. Instance seems to work fine at first but after a while and for no reason i get

Here are some relevant logs:

2016-12-29T04:00:16.811Z - debug: [Pinger] ping: hostname=213.32.76.128 / port=3128
[...]
2016-12-29T04:00:55.743Z - debug: [Instance/076408fd-7ab8-459d-b7f3-deb883912bc2] changeAlive: true => false
2016-12-29T04:00:55.744Z - debug: [Instance/076408fd-7ab8-459d-b7f3-deb883912bc2] Instance is down. Crash timer starts
2016-12-29T04:00:55.744Z - debug: [ProviderOVHCloud] removeInstance: model= 076408fd-7ab8-459d-b7f3-deb883912bc2@213.32.76.128:3128
2016-12-29T04:00:55.831Z - debug: [Manager] adjustInstances: required:10 / actual:10
2016-12-29T04:00:55.851Z - debug: [Manager] checkInstances
[...]
2016-12-29T04:41:35.359Z - error: [Master] Error: request error from target (GET http://xxxx on instance 076408fd-7ab8-459d-b7f3-deb883912bc2@213.32.76.128:3128): { [Error: connect ETIMEDOUT 213.32.76.128:3128]
  code: 'ETIMEDOUT',
  errno: 'ETIMEDOUT',
  syscall: 'connect',
  address: '213.32.76.128',
  port: 3128 }

Hope it helps...

fabienvauchelles commented 7 years ago

Could be this problem:

=> of course, connection doesn't work.

Check the aliveInstance

And could be in the HTTP Agent pool (do we reuse a good instance) ?

etiennecrb commented 7 years ago

We have this kind of errors occuring after asking the commander to remove an instance. I think I have identified the issue thanks to our logs. In fact, the instance is removed but just before that, it's pinged and after the removal from Manager._managedInstances the ping is resolved and it's alive so instance is added in Manager._aliveInstancesMap which is used by Manager.getNextRunningInstance(). Let's see what happens:

First, the API is used to remove a banned instance.

POST /api/stop

It triggers:

09:10:12.605 [Manager] checkInstances
09:10:13.303 [Instance/4e1ac982-294a-451a-8ea6-d660cddcbf04] checkAlive: false / -
09:10:13.323 [Manager] checkInstances: remove: 4e1ac982-294a-451a-8ea6-d660cddcbf04@137.74.24.24:3128

The removal triggers:

fabienvauchelles commented 7 years ago

Hello @etiennecrb,

Your analysis is very smart πŸ’― and I think you have found the problem 🍾 🍾 🍾 .

In this case, Scrapoxy process does:

  1. Ask to remove the instance
  2. Change the alive status to false and remove the instance from the alive pool
  3. Start to remove the instance from the provider
  4. Make a valid ping to the instance, change the alive status to true and re-add the instance to the alive pool
  5. Finish to remove the instance

Instance is not in managedInstances but is in aliveInstances

I'll answer to the PR directly.

Fabien.

PS: You rock. πŸ’ͺ πŸ’ͺ πŸ’ͺ

fabienvauchelles commented 7 years ago

Close with Scrapoxy 2.3.10