XiaoMi / minos

Minos is beyond a hadoop deployment system.
Apache License 2.0

Setting up a ZooKeeper cluster: bootstrapping the cluster fails #23

Open xiaoguanyu opened 10 years ago

xiaoguanyu commented 10 years ago

Your password is: 123456, you should store this in a safe place, because this is the verification code used to do cleanup
Bootstrapping task 0 of zookeeper on 192.38.11.59(0)
Bootstrap task 0 of zookeeper on 192.38.11.59(0) fail: No package found on package server of zookeeper
Bootstrap task 0 of zookeeper on 192.38.11.59(0) fail: 2
Starting task 0 of zookeeper on 192.38.11.59(0)
Start task 0 of zookeeper on 192.38.11.59(0) fail: You should bootstrap the job first

wuzesheng commented 10 years ago

It seems that you haven't installed the zookeeper package on the tank server.

xiaoguanyu commented 10 years ago

I installed the zookeeper package on the tank server, but now I get a new error:

Start task 0 of zookeeper on 10.38.11.59(0) fail: <Fault 60: 'ALREADY_STARTED: zookeeper--dptst--zookeeper'>
......
  File "/usr/local/lib/python2.7/socket.py", line 571, in create_connection
    raise err
socket.error: [Errno 111] Connection refused

wuzesheng commented 10 years ago

Can you post the detailed stack trace?

xiaoguanyu commented 10 years ago

2014-05-14 13:26:49 You should set a bootstrap password, it will be requried when you do cleanup
Set a password manually? (y/n) y
Please input your password: 
2014-05-14 13:26:52 Your password is: 123456, you should store this in a safe place, because this is the verification code used to do cleanup
2014-05-14 13:26:52 Bootstrapping task 0 of zookeeper on 10.38.11.59(0)
2014-05-14 13:26:53 Bootstrap task 0 of zookeeper on 10.38.11.59(0) success
2014-05-14 13:26:53 Starting task 0 of zookeeper on 10.38.11.59(0)
2014-05-14 13:26:53 Start task 0 of zookeeper on 10.38.11.59(0) fail: <Fault 60: 'ALREADY_STARTED: zookeeper--dptst--zookeeper'>
Traceback (most recent call last):
  File "/usr/local/test/minos/client/deploy.py", line 284, in <module>
    main()
  File "/usr/local/test/minos/client/deploy.py", line 281, in main
    return args.handler(args)
  File "/usr/local/test/minos/client/deploy.py", line 229, in process_command_bootstrap
    return deploy_tool.bootstrap(args)
  File "/usr/local/test/minos/client/deploy_zookeeper.py", line 154, in bootstrap
    bootstrap_job(args, hosts[host_id].ip, "zookeeper", host_id, instance_id, cleanup_token)
  File "/usr/local/test/minos/client/deploy_zookeeper.py", line 136, in bootstrap_job
    args.zookeeper_config.parse_generated_config_files(args, job_name, host_id, instance_id)
  File "/usr/local/test/minos/client/service_config.py", line 665, in parse_generated_config_files
    args, self.cluster, self.jobs, current_job, host_id, instance_id))
  File "/usr/local/test/minos/client/service_config.py", line 652, in parse_generated_files
    file_dict[key] = ServiceConfig.parse_item(args, cluster, jobs, current_job, host_id, instance_id, value)
  File "/usr/local/test/minos/client/service_config.py", line 596, in parse_item
    new_item.append(callback(args, cluster, jobs, current_job, host_id, instance_id, reg_expr[iter]))
  File "/usr/local/test/minos/client/service_config.py", line 255, in get_section_attribute
    return get_specific_dir(host.ip, args.service, cluster.name, section, section_instance_id, attribute)
  File "/usr/local/test/minos/client/service_config.py", line 183, in get_specific_dir
    return supervisor_client.get_available_data_dirs()[0]
  File "/usr/local/test/minos/client/supervisor_client.py", line 26, in get_available_data_dirs
    self.cluster, self.job)
  File "/usr/local/lib/python2.7/xmlrpclib.py", line 1224, in __call__
    return self.__send(self.__name, args)
  File "/usr/local/lib/python2.7/xmlrpclib.py", line 1578, in __request
    verbose=self.__verbose
  File "/usr/local/lib/python2.7/xmlrpclib.py", line 1264, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/local/lib/python2.7/xmlrpclib.py", line 1292, in single_request
    self.send_content(h, request_body)
  File "/usr/local/lib/python2.7/xmlrpclib.py", line 1439, in send_content
    connection.endheaders(request_body)
  File "/usr/local/lib/python2.7/httplib.py", line 969, in endheaders
    self._send_output(message_body)
  File "/usr/local/lib/python2.7/httplib.py", line 829, in _send_output
    self.send(msg)
  File "/usr/local/lib/python2.7/httplib.py", line 791, in send
    self.connect()
  File "/usr/local/lib/python2.7/httplib.py", line 772, in connect
    self.timeout, self.source_address)
  File "/usr/local/lib/python2.7/socket.py", line 571, in create_connection
    raise err
socket.error: [Errno 111] Connection refused

wuzesheng commented 10 years ago

It seems that the minos client can't connect to your supervisord. Can you check whether your supervisord started normally?
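
Errno 111 means the TCP connection was actively refused: the host answered, but nothing was listening on the address and port the client dialed. A minimal sketch of a probe that tells "refused" apart from other failures, assuming Python 3 and that supervisord listens on port 9001 as mentioned elsewhere in this thread. `probe` is a hypothetical helper for illustration, not part of minos:

```python
import errno
import socket


def probe(host, port, timeout=3.0):
    """Classify a TCP connection attempt to host:port.

    Hypothetical helper for illustration; not part of minos.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # Errno 111 on Linux: host reachable, but no listener on this port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout (e.g. filtered or wrong IP).
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. name resolution or routing errors.
        return "error: " + errno.errorcode.get(exc.errno, str(exc))
```

Running `probe("10.38.11.59", 9001)` from the machine that runs the minos client (IP taken from the log above) would show whether that exact address and port accept connections. If it reports "refused" while the supervisord page loads in a browser, the client is probably dialing a different IP or port than the one supervisord listens on.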

xiaoguanyu commented 10 years ago

My supervisord started normally, and I can view the working status of the components at http://192.169.11.59:9001.

yehangjun commented 10 years ago

From your error message, the client is trying to connect to a different IP. Can you check that one?

Bootstrapping task 0 of zookeeper on 192.38.11.59(0)
Bootstrap task 0 of zookeeper on 192.38.11.59(0) fail: No package found on package server of zookeeper
Bootstrap task 0 of zookeeper on 192.38.11.59(0) fail: 2
Starting task 0 of zookeeper on 192.38.11.59(0)

xiaoguanyu commented 10 years ago

Thank you, yehangjun. The problem is solved; the cause was that I hadn't installed the zookeeper package on the tank server.

cd minos/client
./deploy install zookeeper dptst

wuzesheng commented 10 years ago

@xiaoguanyu What is the root cause of the 'connection refused' error?

lvzhaoxing commented 10 years ago

I ran into a similar situation:

[root@master client]# ./deploy bootstrap zookeeper dptst
2014-10-15 17:29:57 You should set a bootstrap password, it will be requried when you do cleanup
Set a password manually? (y/n) y
Please input your password: 
2014-10-15 17:30:03 Your password is: ir2014, you should store this in a safe place, because this is the verification code used to do cleanup
2014-10-15 17:30:03 Bootstrapping task 0 of zookeeper on 10.161.156.199(0)
2014-10-15 17:30:07 Bootstrap task 0 of zookeeper on 10.161.156.199(0) success
2014-10-15 17:30:07 Starting task 0 of zookeeper on 10.161.156.199(0)
2014-10-15 17:30:07 Start task 0 of zookeeper on 10.161.156.199(0) success
Traceback (most recent call last):
  File "/root/minos/client/deploy.py", line 288, in <module>
    main()
  File "/root/minos/client/deploy.py", line 285, in main
    return args.handler(args)
  File "/root/minos/client/deploy.py", line 233, in process_command_bootstrap
    return deploy_tool.bootstrap(args)
  File "/root/minos/client/deploy_zookeeper.py", line 154, in bootstrap
    bootstrap_job(args, hosts[host_id].ip, "zookeeper", host_id, instance_id, cleanup_token)
  File "/root/minos/client/deploy_zookeeper.py", line 136, in bootstrap_job
    args.zookeeper_config.parse_generated_config_files(args, job_name, host_id, instance_id)
  File "/root/minos/client/service_config.py", line 693, in parse_generated_config_files
    args, self.service, self.cluster, self.jobs, current_job, host_id, instance_id))
  File "/root/minos/client/service_config.py", line 681, in parse_generated_files
    parsing_service, current_job, host_id, instance_id, value)
  File "/root/minos/client/service_config.py", line 622, in parse_item
    current_job, host_id, instance_id, reg_expr[iter]))
  File "/root/minos/client/service_config.py", line 274, in get_section_attribute
    section, section_instance_id, attribute)
  File "/root/minos/client/service_config.py", line 195, in get_specific_dir
    return supervisor_client.get_available_data_dirs()[0]
  File "/root/minos/client/supervisor_client.py", line 26, in get_available_data_dirs
    self.cluster, self.job)
  File "/usr/local/python2.7/lib/python2.7/xmlrpclib.py", line 1224, in __call__
    return self.__send(self.__name, args)
  File "/usr/local/python2.7/lib/python2.7/xmlrpclib.py", line 1578, in __request
    verbose=self.__verbose
  File "/usr/local/python2.7/lib/python2.7/xmlrpclib.py", line 1264, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/local/python2.7/lib/python2.7/xmlrpclib.py", line 1292, in single_request
    self.send_content(h, request_body)
  File "/usr/local/python2.7/lib/python2.7/xmlrpclib.py", line 1439, in send_content
    connection.endheaders(request_body)
  File "/usr/local/python2.7/lib/python2.7/httplib.py", line 991, in endheaders
    self._send_output(message_body)
  File "/usr/local/python2.7/lib/python2.7/httplib.py", line 844, in _send_output
    self.send(msg)
  File "/usr/local/python2.7/lib/python2.7/httplib.py", line 806, in send
    self.connect()
  File "/usr/local/python2.7/lib/python2.7/httplib.py", line 787, in connect
    self.timeout, self.source_address)
  File "/usr/local/python2.7/lib/python2.7/socket.py", line 571, in create_connection
    raise err
socket.error: [Errno 111] Connection refused

lvzhaoxing commented 10 years ago

However, the zookeeper package has already been uploaded to my tank server.

ID  Package Name            Revision No.  Timestamp        Checksum                                  Download
1   zookeeper-3.4.6.tar.gz  r12345        20141015-172923  2a9e53f5990dfe0965834a525fbcad226bf93474  Download

wuzesheng commented 10 years ago

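It looks like the first machine was deployed successfully, but the client could not connect to supervisord on the second one (connection refused). supervisord is probably not running on that machine; can you check?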

lvzhaoxing commented 10 years ago

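I deployed to 3 machines. Port 9001 is reachable on all three, but the zookeeper process failed to start on all three. The supervisor page shows the same content on each: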

State    Description                                        Name                         Action
running  pid 21263, uptime 0:06:29                          crashmailbatch-monitor       Restart Stop Clear Log Tail -f
running  pid 21262, uptime 0:06:29                          processexit-monitor          Restart Stop Clear Log Tail -f
fatal    Exited too quickly (process log may have details)  zookeeper--dptst--zookeeper  Start Clear Log Tail -f

PS: If the page on port 9001 loads, is it still possible that supervisord hasn't started?

wuzesheng commented 10 years ago

If the 9001 page loads, supervisor should have started successfully. What error does the failed process report? Is it also connection refused?

lvzhaoxing commented 10 years ago

Yes, the ./deploy bootstrap zookeeper dptst command also ends with connection refused.

wuzesheng commented 10 years ago

On the machine where you run the client, try wget http://$host:9001 and see whether the page can be fetched normally.
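
The same check can be run against every host in one go. A minimal sketch, assuming Python 3, port 9001, and that the cluster hosts are reached directly rather than through an HTTP proxy. `supervisor_page_reachable` and `unreachable_hosts` are hypothetical names for illustration, not minos functions:

```python
import urllib.error
import urllib.request

# Assumption: cluster hosts are reached directly, so proxy settings from the
# environment are ignored.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def supervisor_page_reachable(host, port=9001, timeout=5.0):
    """Return True if an HTTP server answers on http://host:port/."""
    url = "http://%s:%d/" % (host, port)
    try:
        with _OPENER.open(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, even if with an error status such as 401.
        return True
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, unknown host, and similar failures.
        return False


def unreachable_hosts(hosts, port=9001):
    """Return the hosts whose page could not be fetched."""
    return [h for h in hosts if not supervisor_page_reachable(h, port)]
```

Calling `unreachable_hosts([...])` with the IPs listed in the cluster configuration, from the client machine, would return the hosts where supervisord still needs to be deployed or started before running bootstrap.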

lvzhaoxing commented 10 years ago

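Right, my mistake: supervisor has to be deployed on all machines first, and only then should ./deploy bootstrap zookeeper dptst be run. That works now. But a new problem has come up: ./deploy show zookeeper dptst reports errors.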

2014-10-16 09:44:01 Showing task 0 of zookeeper on 10.161.156.199(0)
2014-10-16 09:44:01 Task 0 of zookeeper on 10.161.156.199(0) is FATAL
2014-10-16 09:44:01 Showing task 1 of zookeeper on 10.162.20.204(0)
2014-10-16 09:44:01 Task 1 of zookeeper on 10.162.20.204(0) is FATAL
2014-10-16 09:44:01 Showing task 2 of zookeeper on 10.161.131.193(0)
2014-10-16 09:44:01 Task 2 of zookeeper on 10.161.131.193(0) is FATAL

wuzesheng commented 10 years ago

This means zookeeper did not start properly. Check the zookeeper log to see why it failed to start.
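
With more than a few tasks, it can help to extract the failing ones from the show output before looking at logs host by host. A minimal sketch, assuming Python 3 and that status lines have exactly the shape seen in the output above ("Task N of SERVICE on IP(INSTANCE) is STATE"). `tasks_in_state` is a hypothetical helper, not part of minos:

```python
import re

# Assumption: the line format is inferred from the output quoted in this
# thread, not from the minos source.
TASK_STATE = re.compile(
    r"Task (?P<task>\d+) of (?P<service>\S+) on "
    r"(?P<host>[\d.]+)\((?P<instance>\d+)\) is (?P<state>\w+)"
)


def tasks_in_state(output, state="FATAL"):
    """Return (task_id, host) pairs whose reported state equals `state`."""
    found = []
    for line in output.splitlines():
        match = TASK_STATE.search(line)
        if match and match.group("state") == state:
            found.append((int(match.group("task")), match.group("host")))
    return found
```

Feeding it the captured output of ./deploy show zookeeper dptst gives the list of hosts whose stdout files are worth inspecting first.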

lvzhaoxing commented 10 years ago

Is that the /home/root/log/zookeeper/dptst/zookeeper directory? It is empty on every machine.

[root@slave1 ~]# cd  /home/root/log/zookeeper/dptst/zookeeper
[root@slave1 zookeeper]# ll
total 0

wuzesheng commented 10 years ago

Go to /home/root/app/zookeeper/dptst/zookeeper; the stdout/ directory there holds the files that standard output is redirected to. Are there any error messages in them?

lvzhaoxing commented 10 years ago

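Found the problem; it appears to be a Java issue. I installed Java manually under /usr/local/jdk1.7.0_67/, set the environment variables in /etc/profile, and running java -version directly works fine. But according to the records in stdout, zookeeper still looks for /usr/bin/java. Where should the Java home for minos be configured?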

lvzhaoxing commented 10 years ago

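One more small question: after reading the PDF and the wiki, my understanding is that installing with minos does not require passwordless SSH login and relies entirely on supervisor, right?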

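wuzesheng commented 10 years ago

  1. minos reads the current user's JAVA_HOME environment variable; there is no special configuration.
  2. Yes, your understanding is correct; it does not depend on ssh.

lvzhaoxing commented 10 years ago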

I really couldn't figure it out, so I just symlinked it: ln -s $JAVA_HOME/bin/java /usr/bin/java
That more or less takes care of it. zookeeper is working now.

wuzesheng commented 10 years ago

Great, glad it is up and running. In minos, the logic that obtains java_home is in start.sh; you can take a look at the source.
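
One likely explanation for the behaviour above: /etc/profile is read by login shells, while a process launched by supervisord inherits supervisord's own environment, so a JAVA_HOME exported only in /etc/profile may not be visible to the job. The sketch below illustrates a common lookup pattern (prefer $JAVA_HOME/bin/java, otherwise search PATH), assuming Python 3. It is not a transcription of minos's start.sh, and `resolve_java` is a hypothetical name:

```python
import os
import shutil


def resolve_java(env):
    """Pick a java binary: $JAVA_HOME/bin/java if usable, else search PATH.

    Illustrates a common lookup pattern only; this is NOT a transcription
    of minos's start.sh.
    """
    java_home = env.get("JAVA_HOME")
    if java_home:
        candidate = os.path.join(java_home, "bin", "java")
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    # Fall back to the PATH of the given environment, not the caller's shell.
    return shutil.which("java", path=env.get("PATH", ""))
```

Calling `resolve_java(dict(os.environ))` from a process started the same way as the job would show which binary that environment resolves to; a result of None, or a path under /usr/bin, would be consistent with JAVA_HOME not being set for that process.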