binux / pyspider

A Powerful Spider(Web Crawler) System in Python.
http://docs.pyspider.org/
Apache License 2.0

[RESOLVED] webui: connect to scheduler rpc error: error(111, 'Connection refused') #771

Open aleksas opened 6 years ago

aleksas commented 6 years ago

See the solution in the comment below.

Expected behavior

Webui connects to scheduler

Actual behavior

I updated a virtual machine running pyspider on Debian Jessie and got into trouble. The webui doesn't connect to the scheduler and logs this error:

[W 180305 19:33:32 index:108] connect to scheduler rpc error: error(111, 'Connection refused')

The following error is also shown when pyspider -c config.json scheduler is executed explicitly:

[I 180306 09:24:41 scheduler:647] scheduler starting...
Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/usr/lib/python2.7/threading.py", line 763, in run
    self.__target(*self.__args, **self.__kwargs)
  File "/usr/local/lib/python2.7/dist-packages/pyspider/scheduler/scheduler.py", line 780, in xmlrpc_run
    self.xmlrpc_server = tornado.httpserver.HTTPServer(container, io_loop=self.xmlrpc_ioloop)
  File "/usr/local/lib/python2.7/dist-packages/tornado/util.py", line 312, in __new__
    instance.initialize(*args, **init_kwargs)
TypeError: initialize() got an unexpected keyword argument 'io_loop'

... normal output follows, listing projects and stats
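
The webui warning above just means the webui's XML-RPC client could not open a TCP connection to the scheduler, which in this case never started its RPC server because of the TypeError in the traceback. A quick way to check whether the scheduler RPC endpoint is listening at all, assuming the default address http://localhost:23333/ and that size is one of the methods the scheduler registers (both are assumptions about this particular setup):

import socket

try:
    import xmlrpclib                       # Python 2, as on this machine
except ImportError:
    import xmlrpc.client as xmlrpclib      # Python 3

rpc = xmlrpclib.ServerProxy('http://localhost:23333/')
try:
    print('scheduler reachable, queue size: %r' % rpc.size())
except socket.error as e:
    # errno 111 here is the same "Connection refused" the webui reports.
    print('scheduler rpc unreachable: %r' % e)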

How to reproduce


/opt/.pyspider/config.json

{
  "taskdb": "mysql+taskdb://someusername:somepassword@localhost:3306/taskdb",
  "resultdb": "mysql+resultdb://someusername:somepassword@localhost:3306/resultdb",
  "message_queue": "amqp://someusername:somepassword@localhost:5672/%2F",

  "webui": {
    "cdn": "//cdnjs.cloudflare.com/ajax/libs/",
    "port":80,
    "username": "someusername",
    "password": "somepassword",
    "need-auth": false
  }
}
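
The scheduler and the other components need the MySQL databases and the RabbitMQ broker from this file to come up, so it can help to sanity-check those services independently of pyspider. A rough connectivity probe, assuming pymysql and pika are installed and reusing the placeholder credentials from the config above:

import pika
import pymysql

# MySQL behind taskdb/resultdb
pymysql.connect(host='localhost', port=3306, user='someusername',
                password='somepassword', database='taskdb').close()

# RabbitMQ behind message_queue
pika.BlockingConnection(pika.URLParameters(
    'amqp://someusername:somepassword@localhost:5672/%2F')).close()

print('mysql and rabbitmq are reachable')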

/etc/init.d/pyspider

### BEGIN INIT INFO
# Provides:          pyspider
# Required-Start:    $remote_fs $network $syslog rabbitmq-server mysql
# Required-Stop:     $remote_fs $network $syslog rabbitmq-server mysql
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Start pyspider daemon at boot time
# Description:       Enable pyspider service provided by daemon.
### END INIT INFO

# Some things that run always
touch /var/lock/pyspider

# Carry out specific functions when asked to by the system
case "$1" in
  start)
    echo "Starting PySpider"
    cd /opt/.pyspider

    # start **only one** scheduler instance
    pyspider -c config.json scheduler 2> scheduler.2.log > scheduler.log &

    # phantomjs
    pyspider -c config.json phantomjs 2> phantomjs.2.log >  phantomjs.log &

    # start as many fetcher / processor / result_worker instances as you need
    pyspider -c config.json --phantomjs-proxy="localhost:25555" fetcher 2> fetcher.2.log > fetcher.log &
    pyspider -c config.json processor 2> processor.2.log > processor.log &
    pyspider -c config.json result_worker 2> result_worker.2.log > result_worker.log &

    # start webui, set `--scheduler-rpc` if scheduler is not running on the same host as webui
    pyspider -c config.json webui 2> webui.2.log > webui.log &

    ;;
  stop)
    echo "Stopping PySpider"
    (pgrep pyspider | xargs kill; pkill -f phantomjs; pgrep phantomjs | xargs kill; service rabbitmq-server stop)
    ;;
  *)
    echo "Usage: /etc/init.d/pyspider {start|stop}"
    exit 1
    ;;
esac

exit 0

rabbitmqctl status

Status of node rabbit@pyspider ...
[{pid,418},
{running_applications,
[{rabbit,"RabbitMQ","3.7.3"},
{rabbit_common,
"Modules shared by rabbitmq-server and rabbitmq-erlang-client",
"3.7.3"},
{xmerl,"XML parser","1.3.16"},
{ranch_proxy_protocol,"Ranch Proxy Protocol Transport","1.4.4"},
{ranch,"Socket acceptor pool for TCP protocols.","1.4.0"},
{ssl,"Erlang/OTP SSL application","8.2.3"},
{public_key,"Public key infrastructure","1.5.2"},
{crypto,"CRYPTO","4.2"},
{asn1,"The Erlang ASN1 compiler version 5.0.4","5.0.4"},
{mnesia,"MNESIA  CXC 138 12","4.15.3"},
{jsx,"a streaming, evented json parsing toolkit","2.8.2"},
{os_mon,"CPO  CXC 138 46","2.4.4"},
{inets,"INETS  CXC 138 49","6.4.5"},
{recon,"Diagnostic tools for production use","2.3.2"},
{lager,"Erlang logging framework","3.5.1"},
{goldrush,"Erlang event stream processor","0.1.9"},
{compiler,"ERTS  CXC 138 10","7.1.4"},
{syntax_tools,"Syntax tools","2.1.4"},
{sasl,"SASL  CXC 138 11","3.1.1"},
{stdlib,"ERTS  CXC 138 10","3.4.3"},
{kernel,"ERTS  CXC 138 10","5.4.1"}]},
{os,{unix,linux}},
{erlang_version,
"Erlang/OTP 20 [erts-9.2] [source] [64-bit] [smp:1:1] [ds:1:1:10] [async-threads:64] [hipe] [kernel-poll:true]\n"},
{memory,
[{connection_readers,270192},
{connection_writers,63056},
{connection_channels,152112},
{connection_other,490152},
{queue_procs,282688},
{queue_slave_procs,0},
{plugins,5864},
{other_proc,24010152},
{metrics,217792},
{mgmt_db,0},
{mnesia,83368},
{other_ets,1873712},
{binary,34221544},
{msg_index,28912},
{code,24915284},
{atom,1041593},
{other_system,9198923},
{allocated_unused,12606160},
{reserved_unallocated,0},
{strategy,rss},
{total,[{erlang,96855344},{rss,71524352},{allocated,109461504}]}]},
{alarms,[]},
{listeners,[{clustering,25672,"::"},{amqp,5672,"::"}]},
{vm_memory_calculation_strategy,rss},
{vm_memory_high_watermark,0.4},
{vm_memory_limit,630929817},
{disk_free_limit,50000000},
{disk_free,25049255936},
{file_descriptors,
[{total_limit,924},
{total_used,15},
{sockets_limit,829},
{sockets_used,13}]},
{processes,[{limit,1048576},{used,327}]},
{run_queue,0},
{uptime,83},
{kernel,{net_ticktime,60}}]
aleksas commented 6 years ago

Resolved by downgrading Tornado from the latest version (5.0) to 4.5.

Apparently the Tornado 5.0 API is inconsistent with previous versions.
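
Concretely, pyspider at this commit still uses the pre-5.0 keyword arguments, so the installed Tornado has to be a 4.x release. A minimal guard that makes the mismatch visible before starting the scheduler, assuming nothing beyond the tornado package itself:

import tornado

major = int(tornado.version.split('.')[0])
if major >= 5:
    raise SystemExit('Tornado %s found; install a 4.x release (e.g. 4.5.x) '
                     'until pyspider supports the Tornado 5 API' % tornado.version)
print('Tornado %s should work with this pyspider version' % tornado.version)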

Faulty code (in Tornado 5.0 case): https://github.com/binux/pyspider/blob/edd46135621dd333283850c2bfdb24444c3eec06/pyspider/scheduler/scheduler.py#L780
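
That line passes io_loop=self.xmlrpc_ioloop to tornado.httpserver.HTTPServer, and Tornado 5.0 removed the io_loop keyword from its constructors, which is exactly the TypeError in the traceback. A minimal illustration of the API change (not pyspider's eventual fix, just the difference; 23333 is used here only as an example port):

import tornado.httpserver
import tornado.ioloop
import tornado.web

app = tornado.web.Application([])

# Tornado 4.x: an explicit IOLoop could be handed to the server.
# server = tornado.httpserver.HTTPServer(app, io_loop=tornado.ioloop.IOLoop.current())

# Tornado 5.x: the keyword is gone; the server binds to the current IOLoop implicitly.
server = tornado.httpserver.HTTPServer(app)
server.listen(23333)
tornado.ioloop.IOLoop.current().start()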

Related to Issue. Related to pull.

Ronales commented 6 years ago

I got the same problem.