binux / pyspider

A Powerful Spider(Web Crawler) System in Python.
http://docs.pyspider.org/
Apache License 2.0

Fixing "Exception: HTTP 599: Failed to connect to localhost port 25555: Connection refused" #943

Closed: aleksas closed this issue 4 years ago

aleksas commented 4 years ago

The problem: the fetcher, started with

   pyspider -c config.json --phantomjs-proxy="localhost:25555" fetcher 2> fetcher.2.log > fetcher.log &

was failing after a day or two of crawling with the following exception:

[E 200224 09:12:30 base_handler:203] HTTP 599: Failed to connect to localhost port 25555: Connection refused
   Traceback (most recent call last):
     File "/usr/local/lib/python2.7/dist-packages/pyspider/libs/base_handler.py", line 196, in run_task
       result = self._run_task(task, response)
     File "/usr/local/lib/python2.7/dist-packages/pyspider/libs/base_handler.py", line 175, in _run_task
       response.raise_for_status()
     File "/usr/local/lib/python2.7/dist-packages/pyspider/libs/response.py", line 172, in raise_for_status
       six.reraise(Exception, Exception(self.error), Traceback.from_string(self.traceback).as_traceback())
     File "/usr/local/lib/python2.7/dist-packages/pyspider/fetcher/tornado_fetcher.py", line 499, in phantomjs_fetch
       response = yield gen.maybe_future(self.http_client.fetch(request))
     File "/usr/local/lib/python2.7/dist-packages/tornado/httpclient.py", line 102, in fetch
       self._async_client.fetch, request, **kwargs))
     File "/usr/local/lib/python2.7/dist-packages/tornado/ioloop.py", line 458, in run_sync
       return future_cell[0].result()
     File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 238, in result
       raise_exc_info(self._exc_info)
     File "<string>", line 3, in raise_exc_info
   Exception: HTTP 599: Failed to connect to localhost port 25555: Connection refused
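"Connection refused" on localhost:25555 means nothing was listening on the PhantomJS proxy port when the fetcher tried to reach it. A quick way to confirm this while the error is happening is to probe the port directly. The following is a minimal Python 3 sketch; `port_accepts_connections` is a hypothetical helper, not part of pyspider.

```python
import socket


def port_accepts_connections(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds.

    Hypothetical diagnostic helper (not a pyspider API). A False result for
    the PhantomJS port matches the "Connection refused" in the traceback:
    no process is listening there.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # ConnectionRefusedError, socket timeouts and name-resolution errors
        # are all subclasses of OSError in Python 3.
        return False


if __name__ == "__main__":
    # 25555 is the phantomjs-proxy port used in this issue's config.json.
    print(port_accepts_connections("localhost", 25555))
```

If this prints False while the fetcher is logging HTTP 599 errors, the PhantomJS process has gone away rather than the fetcher itself.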

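The solution was to override the default PhantomJS parameters by adding a "phantomjs" section to config.json with "auto-restart" enabled. Apparently the PhantomJS process behind localhost:25555 was not being restarted after it died, so the fetcher's connections to it were refused.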

{
  "taskdb": "mysql+taskdb://pyspider:pyspider@localhost:3306/taskdb",
  "resultdb": "mysql+resultdb://pyspider:pyspider@localhost:3306/resultdb",
  "message_queue": "amqp://pyspider:pyspider@localhost:5672/%2F",
  "phantomjs-proxy": "localhost:25555",

  "phantomjs": {
    "phantomjs-path": "phantomjs",
    "port":25555,
    "auto-restart": true
  },
  "webui": {
    "cdn": "//cdnjs.cloudflare.com/ajax/libs/",
    "port":80,
    "username": "******",
    "password": "******",
    "need-auth": true
  }
}
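The "auto-restart" flag boils down to supervising a child process: wait for it to exit, then start it again. The sketch below illustrates only that general idea in Python 3; it is not pyspider's implementation, and `supervise`, `max_restarts` and `backoff` are hypothetical names chosen for this example.

```python
import subprocess
import sys
import time


def supervise(cmd, max_restarts=None, backoff=1.0):
    """Run cmd and start it again every time it exits.

    Illustrative sketch only (not pyspider's code). With max_restarts=None it
    loops forever; otherwise it returns the number of restarts performed once
    that limit is reached.
    """
    restarts = 0
    while True:
        proc = subprocess.Popen(cmd)
        returncode = proc.wait()  # block until the child exits (e.g. crashes)
        if max_restarts is not None and restarts >= max_restarts:
            return restarts
        restarts += 1
        print(f"process exited with code {returncode}; "
              f"restart #{restarts} in {backoff}s", file=sys.stderr)
        time.sleep(backoff)  # avoid a tight crash/restart loop
```

For example, `supervise([sys.executable, "-c", "import sys; sys.exit(3)"], max_restarts=2, backoff=0.0)` runs the failing command three times and returns 2. Without such a loop, a single crash leaves port 25555 closed until someone restarts the process by hand, which matches the "Connection refused" errors above.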
