The problem was that the fetcher, started with
pyspider -c config.json --phantomjs-proxy="localhost:25555" fetcher 2> fetcher.2.log > fetcher.log &
was crashing after a day or two of crawling with the following exception:
[E 200224 09:12:30 base_handler:203] HTTP 599: Failed to connect to localhost port 25555: Connection refused
    Traceback (most recent call last):
      File "/usr/local/lib/python2.7/dist-packages/pyspider/libs/base_handler.py", line 196, in run_task
        result = self._run_task(task, response)
      File "/usr/local/lib/python2.7/dist-packages/pyspider/libs/base_handler.py", line 175, in _run_task
        response.raise_for_status()
      File "/usr/local/lib/python2.7/dist-packages/pyspider/libs/response.py", line 172, in raise_for_status
        six.reraise(Exception, Exception(self.error), Traceback.from_string(self.traceback).as_traceback())
      File "/usr/local/lib/python2.7/dist-packages/pyspider/fetcher/tornado_fetcher.py", line 499, in phantomjs_fetch
        response = yield gen.maybe_future(self.http_client.fetch(request))
      File "/usr/local/lib/python2.7/dist-packages/tornado/httpclient.py", line 102, in fetch
        self._async_client.fetch, request, **kwargs))
      File "/usr/local/lib/python2.7/dist-packages/tornado/ioloop.py", line 458, in run_sync
        return future_cell[0].result()
      File "/usr/local/lib/python2.7/dist-packages/tornado/concurrent.py", line 238, in result
        raise_exc_info(self._exc_info)
      File "<string>", line 3, in raise_exc_info
    Exception: HTTP 599: Failed to connect to localhost port 25555: Connection refused
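The "Connection refused" in the log means nothing was accepting connections on localhost:25555 at that moment. A minimal sketch for checking this by hand, assuming Python 3 is available (the traceback above comes from a Python 2.7 install, where sockets cannot be used in a with statement); the helper name is_port_open is made up for illustration and is not part of pyspider:

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection raises OSError (e.g. ConnectionRefusedError)
        # when nothing is listening or the attempt times out.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Port 25555 is the one passed via --phantomjs-proxy in the command above.
print(is_port_open("localhost", 25555))
```

If this prints False while the fetcher is still running, the process behind the --phantomjs-proxy address has gone away and every fetch routed through it will fail with HTTP 599.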
Apparently the fetcher was not restarting automatically.
The solution was to override the default parameters by adding a "phantomjs" section to config.json.
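The post does not say which parameters were overridden, so the following is only a sketch of what adding a "phantomjs" section could look like. The keys "port" and "auto-restart" are assumptions: they mirror the --port and --auto-restart options of the pyspider phantomjs command, and the sketch assumes config.json sections supply defaults for the subcommand of the same name. The helper with_phantomjs_section is hypothetical.

```python
import json


def with_phantomjs_section(config, params):
    """Return a copy of config with params merged into its "phantomjs" section."""
    merged = dict(config)
    section = dict(merged.get("phantomjs", {}))
    section.update(params)
    merged["phantomjs"] = section
    return merged


# Assumed keys, not taken from the original post: they mirror the
# --port and --auto-restart options of `pyspider phantomjs`.
phantomjs_params = {"port": 25555, "auto-restart": True}

# Start from an empty config here; in practice, load the existing
# config.json with json.load() and write the result back with json.dump().
config = with_phantomjs_section({}, phantomjs_params)
print(json.dumps(config, indent=2))
```

Merging into a copy leaves any other sections of an existing config.json untouched, so only the "phantomjs" defaults change.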
Environment
OS
pyspider