Tencent / phxsql

A high availability MySQL cluster that guarantees data consistency between a master and slaves.

Another error is returned when running ./phxbinlogsvr_tools_phxrpc -f InitBinlogSvrMaster -h"ip1,ip2,ip3" -p 17000 #9

Closed theseusyang closed 8 years ago

theseusyang commented 8 years ago

Following the README, after running install.py I entered this command:

./phxbinlogsvr_tools_phxrpc -f InitBinlogSvrMaster -h 192.168.56.1,192.168.56.2,192.168.56.3 -p 17000

and got another error:

get master fail ret -1
connect machine 192.168.56.1 fail
init svr fail, ret -1

The master node is 192.168.56.1. Did the master node fail to come up after installation, or is it some other problem?

hzlpy commented 8 years ago

Did you run sub-step ii of step 1 of "Initialize PhxSQL", or did that step fail when you ran it? It failed for me, and I got the same message as you.

mariohuang commented 8 years ago

It seems phxbinlogsvr is not listening on 192.168.56.1:17000. You can check IP and Port in phxbinlogsvr.conf or just telnet to check if it is available.
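The telnet check suggested above can also be scripted. The following is only a minimal sketch: the helper name `is_port_open` and the 3-second timeout are assumptions made for illustration and are not part of PhxSQL. It only tests whether a TCP connection can be opened, which is roughly what a successful telnet would show.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper for illustration; it says nothing about whether
    the process behind the port is healthy, only that something accepts
    connections there.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

For example, `is_port_open("192.168.56.1", 17000)` returning False would be consistent with phxbinlogsvr not listening on that address, or with a firewall in between.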

theseusyang commented 8 years ago

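I have run into too many small problems and need to use PhxSQL urgently. Could the team write a more detailed, step-by-step installation and deployment document?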

mariohuang commented 8 years ago

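Hello, could you tell us about your runtime environment (including the gcc version, libc version, Linux kernel version, and so on; the more detail the better)? We will do our best to set up a similar environment for testing :)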

theseusyang commented 8 years ago

OS: Ubuntu 5.4.0-6ubuntu1~16.04.1 (official desktop version), downloaded directly from the official Ubuntu website

GCC: gcc version 5.4.0 20160609

Kernel:4.4.0-31-generic

This is a clean install of the official Ubuntu release.

hzlpy commented 8 years ago

I wrote up a document myself; we can discuss it together 0.0.

theseusyang commented 8 years ago

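1. Log in directly as root, download the 0.8 release package, put it under /root/, and extract it to /root/phxsql.
2. Copy the etc_template folder from the tools directory to /root/phxsql and rename it to etc.
3. Edit the 3 configuration files in the etc_template folder.

3.1 Change the file paths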

basedir         = /home/root/phxsql/percona.src
datadir         = /home/root/data/phxsql/percona.workspace/data

change to

basedir         = /root/phxsql/percona.src
datadir         = /root/data/phxsql/percona.workspace/data

3.2 Change the mysql user from mysql to root: set user=root in my.cnf and user=root in binary_installer.py

lynncui00 commented 8 years ago

@hzlpy Hello, thank you very much for your document. We have added a link to it to the wiki to help more developers. Thanks.

cjcchen commented 8 years ago

Thanks for the document. It is very detailed.

2016-09-01 19:15 GMT+08:00 ZhangLe notifications@github.com:

I wrote up a document myself, http://chuzhiyan.com/2016/09/01/%E5%AE%89%E8%A3%85-PhxSQL-%E7%AC%94%E8%AE%B0/, and we can discuss it together 0.0.

theseusyang commented 8 years ago

@mariohuang How is it going? Are there any test results?

mariohuang commented 8 years ago

@theseusyang Where is it getting stuck at runtime now, and is there any error message?

theseusyang commented 8 years ago

After running install.py, the log output reaches "install all success...". After running ./phxbinlogsvr_tools_phxrpc -f InitBinlogSvrMaster -h"xxx.xxx.xxx.xxx" -p 17000, the following error appears:

get master fail ret -1
connect machine 192.168.56.1 fail
init svr fail, ret -1

mariohuang commented 8 years ago

Can you connect with telnet 192.168.56.1 17000?

hzlpy commented 8 years ago

@theseusyang You can try running this command a few more times.

root@test-db:~/phxsql/sbin# ./phxbinlogsvr_tools_phxrpc -f InitBinlogSvrMaster -h"192.168.xxx.xxx" -p 17000
get master fail ret -1003
connect machine 192.168.xxx.xxx fail
init svr fail, ret 912

Usage: ./phxbinlogsvr_tools_phxrpc [-c <config>] [-f <func>] [-v]
    -f PHXEcho -s <string>
    -f GetMasterInfoFromGlobal -h <host> -p <port>
    -f GetLastSendGtidFromGlobal -h <host> -p <port> -u<gtid>
    -f SetExportIP -h <host> -p <port>
    -f AddMember -h <host> -p <port> -m <member ip>
    -f RemoveMember -h <host> -p <port> -m <member ip>
    -f SetMySqlAdminInfo -h <host> -p <port> -u <admin username> -d <admin pwd> -U <new admin username> -D <new admin pwd>
    -f SetMySqlReplicaInfo -h <host> -p <port> -u <admin username> -d <admin pwd> -U <new replica username> -D <new replica pwd>
    -f GetMemberList -h <host> -p <port>
    -f InitBinlogSvrMaster -h <ip1,ip2,ip3(ip1 is master, others are slaves)> -p <port>

root@test-db:~/phxsql/sbin# ./phxbinlogsvr_tools_phxrpc -f InitBinlogSvrMaster -h"192.168.xxx.xxx" -p 17000
get master 192.168.xxx.xxx expire time 1472816175
machine 192.168.xxx.xxx has been set master(192.168.xxx.xxx)

theseusyang commented 8 years ago

Exciting! So this command has now succeeded?

cjcchen commented 8 years ago

If "connect fail" is shown while initializing PhxSQL, the machine (IP/port) cannot be reached. Please check whether the binary has been started, and if not, use the tools to bring it up. Thanks for trying it out.

xrootman commented 8 years ago

Hello.
Environment: CentOS 6.6
Description: all components compiled normally and initialization completed.
Error:
./phxbinlogsvr_tools_phxrpc -f InitBinlogSvrMaster -h"192.168.9.131,192.168.9.132,192.168.9.133" -p 17000
get master expire time 0
get master expire time 0
get master expire time 0
(after a long wait)
init svr fail, ret -1
I get this same message in several test environments. Please advise.

xrootman commented 8 years ago

Additional error log information:

Log file: phxbinlogsvr.mysql.log.ERROR.20160908-103404.6728
E0908 10:37:05.674700 6914 phx_glog.cpp:82] ^[[45;37m Showy(0): PN8phxpaxos7CleanerE::run sleep a while, max deleted instanceid 0 checkpoint instanceid (no checkpoint) now instanceid 0 ^[[0m
E0908 10:37:06.000730 6918 phx_glog.cpp:82] ^[[46;34m Process master is not inited, waiting init, doing nothing ^[[0m
E0908 10:37:06.675972 6914 phx_glog.cpp:82] ^[[45;37m Showy(0): PN8phxpaxos7CleanerE::run sleep a while, max deleted instanceid 0 checkpoint instanceid (no checkpoint) now instanceid 0 ^[[0m
E0908 10:37:07.000255 6918 phx_glog.cpp:82] ^[[46;34m Process master is not inited, waiting init, doing nothing ^[[0m
E0908 10:37:07.677012 6914 phx_glog.cpp:82] ^[[45;37m Showy(0): PN8phxpaxos7CleanerE::run sleep a while, max deleted instanceid 0 checkpoint instanceid (no checkpoint) now instanceid 0 ^[[0m
E0908 10:37:08.000396 6918 phx_glog.cpp:82] ^[[46;34m Process master is not inited, waiting init, doing nothing ^[[0m
E0908 10:37:08.678822 6914 phx_glog.cpp:82] ^[[45;37m Showy(0): PN8phxpaxos7CleanerE::run sleep a while, max deleted instanceid 0 checkpoint instanceid (no checkpoint) now instanceid 0 ^[[0m
E0908 10:37:09.000048 6918 phx_glog.cpp:82] ^[[46;34m Process master is not inited, waiting init, doing nothing ^[[0m
E0908 10:37:09.679617 6914 phx_glog.cpp:82] ^[[45;37m Showy(0): PN8phxpaxos7CleanerE::run sleep a while, max deleted instanceid 0 checkpoint instanceid (no checkpoint) now instanceid 0 ^[[0m
E0908 10:37:09.738741 6825 phx_glog.cpp:82] ^[[46;34m DoQuery mysql_query show global variables like 'gtid_executed'; done 0, ^[[0m
E0908 10:37:09.738905 6825 phx_glog.cpp:82] ^[[46;34m MakeCheckPoint now time 1473302229 last check time 1473302184, interval time 216000 ^[[0m
E0908 10:37:09.738935 6825 phx_glog.cpp:82] ^[[46;34m MakeCheckPoint check point check done ^[[0m
E0908 10:37:09.740269 6825 phx_glog.cpp:82] ^[[46;34m CheckRunningStatus current mysql instanceid 0, binlog svr instanceid 0 ^[[0m
E0908 10:37:10.000268 6918 phx_glog.cpp:82] ^[[46;34m Process master is not inited, waiting init, doing nothing ^[[0m
E0908 10:37:10.680131 6914 phx_glog.cpp:82] ^[[45;37m Showy(0): PN8phxpaxos7CleanerE::run sleep a while, max deleted instanceid 0 checkpoint instanceid (no checkpoint) now instanceid 0 ^[[0m
E0908 10:37:11.000282 6918 phx_glog.cpp:82] ^[[46;34m Process master is not inited, waiting init, doing nothing ^[[0m
E0908 10:37:11.680572 6914 phx_glog.cpp:82] ^[[45;37m Showy(0): PN8phxpaxos7CleanerE::run sleep a while, max deleted instanceid 0 checkpoint instanceid (no checkpoint) now instanceid 0 ^[[0m
E0908 10:37:12.000707 6918 phx_glog.cpp:82] ^[[46;34m Process master is not inited, waiting init, doing nothing ^[[0m
E0908 10:37:12.682564 6914 phx_glog.cpp:82] ^[[45;37m Showy(0): PN8phxpaxos7CleanerE::run sleep a while, max deleted instanceid 0 checkpoint instanceid (no checkpoint) now instanceid 0 ^[[0m
E0908 10:37:13.001195 6918 phx_glog.cpp:82] ^[[46;34m Process master is not inited, waiting init, doing nothing ^[[0m
E0908 10:37:13.684191 6914 phx_glog.cpp:82] ^[[45;37m Showy(0): PN8phxpaxos7CleanerE::run sleep a while, max deleted instanceid 0 checkpoint instanceid (no checkpoint) now instanceid 0 ^[[0m

Log file: phxsqlproxy.mysql.log.ERROR.20160908-103403.6735
Log line format: [IWEF]mmdd hh:mm:ss.uuuuuu threadid file:line] msg
E0908 10:34:03.725553 6769 phx_glog.cpp:82] ^[[46;34m ReadAgentConfig read agent config ^[[0m
E0908 10:34:03.726379 6769 phx_glog.cpp:82] ^[[46;34m ReadAgentConfig load db path /apps/data/phxsql/phxbinlogsvr/event_data engine ip 192.168.9.131 port 6000 package name phxbinlogsvr ^[[0m

Trying to run it on another machine:
[mysql@oldboy66-23 sbin]$ ./phxbinlogsvr_tools_phxrpc -f InitBinlogSvrMaster -h"192.168.9.131,192.168.9.132,192.168.9.133" -p 17000
get master fail ret -1
connect machine 192.168.9.131 fail
init svr fail, ret -1

CarbonW commented 8 years ago

The first run of ./phxbinlogsvr_tools_phxrpc -f InitBinlogSvrMaster -h"10.27.171.7,10.27.171.8,10.27.171.13" -p 17000 waits for a long time:

[mysql@devsys02prddb10 sbin]$ ./phxbinlogsvr_tools_phxrpc -f InitBinlogSvrMaster -h"10.27.171.7,10.27.171.8,10.27.171.13" -p 17000
get master expire time 0
get master expire time 0
get master expire time 0
init master 10.27.171.13 done, start to add member
get master expire time 0
waiting master to be started
get master expire time 0
waiting master to be started
get master expire time 0

Running ./phxbinlogsvr_tools_phxrpc -f InitBinlogSvrMaster -h"10.27.171.7,10.27.171.8,10.27.171.13" -p 17000 again fails immediately:

get master expire time 0
get master expire time 0
get master expire time 0
init svr fail, ret 912

Please advise.

hzlpy commented 8 years ago

Does a retry give the same error? If a retry reports that the master was initialized successfully, you can still add the slaves with AddMember.

xrootman commented 8 years ago

It still does not succeed.

cjcchen commented 8 years ago

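The installation procedure is as follows. On every machine, run: python2.7 install.py -i"your_inner_ip" -p 54321 -g 6000 -y 11111 -P 17000 -a 8001 -f/tmp/data/

After a successful installation, make sure that phxsqlproxy, mysql, and phxbinlogsvr are all running; you can run ps -ef to check the binaries.

Once every machine is installed and the binaries are confirmed to be running normally, go to the phxsql/sbin directory on one of the machines and run the following command to initialize the cluster: ./phxbinlogsvr_tools_phxrpc -f InitBinlogSvrMaster -h"ip1,ip2,ip3" -p 17000

This initialization performs two steps on the cluster:

  1. Initialize the cluster and set the master
  2. Add the machines other than the master to the cluster

During execution:

  1. If "connect machine xxx.xxx.xxx.xx fail" appears, check whether phxsql is running normally on the machine with that IP
  2. If "init master 10.27.171.13 done, start to add member" or "init svr fail, ret 912" appears, the cluster has already completed the first step.
  3. If "add ip xxx to master done" appears, the second step has completed.

If the second step did not finish normally and exited abnormally, you can run phxbinlogsvr_tools_phxrpc -f getmemberlist -h ip -p 17000 to list the machines currently in the cluster. If some machines are not in the cluster, you can add them with phxbinlogsvr_tools_phxrpc -f addmember -h ip -p 17000 -m <machine ip>.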
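The three output cases described above can be turned into a small checker. This is only a sketch under stated assumptions: the function name `diagnose` and the returned strings are made up for illustration, and the matched messages are copied from this thread rather than from the tool's source, so other versions may print different text.

```python
import re
from typing import List

# Hypothetical labels; the wording is ours, not the tool's.
STEP1_DONE = "step 1 done: master initialized, members may still need to be added"
STEP2_DONE = "step 2 done: member added to the master"


def diagnose(output: str) -> List[str]:
    """Return findings for the captured output of InitBinlogSvrMaster.

    Assumes the message formats quoted in this thread.
    """
    findings = []
    # Case 1: a machine could not be reached; report each IP once.
    for ip in dict.fromkeys(re.findall(r"connect machine (\S+) fail", output)):
        findings.append(f"cannot connect to {ip}: check that phxsql is running there")
    # Case 2: either message means the first step has already completed.
    if "start to add member" in output or "init svr fail, ret 912" in output:
        findings.append(STEP1_DONE)
    # Case 3: at least one member was added.
    if re.search(r"add ip \S+ to master done", output):
        findings.append(STEP2_DONE)
    return findings
```

An empty list means none of the three messages was seen, in which case the logs and the process list are the next place to look. Several findings can be returned at once, since one run may print both a connect failure and ret 912.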

CarbonW commented 8 years ago

@cjcchen After "init master 10.27.171.7 done, start to add member" appears, it loops forever:

get master expire time 0
waiting master to be started

Running phxbinlogsvr_tools_phxrpc -f getmemberlist -h 10.27.171.7 -p 17000 gives:

[mysql@devsys02prddb10 sbin]$ ./phxbinlogsvr_tools_phxrpc -f getmemberlist -h 10.27.171.7 -p 17000
get master expire time 0
GetMemberList fail ret -1

so the current machines cannot be listed. Running phxbinlogsvr_tools_phxrpc -f addmember -h 10.27.171.7 -p 17000 -m 10.27.171.8 gives:

[mysql@devsys02prddb10 sbin]$ ./phxbinlogsvr_tools_phxrpc -f addmember -h 10.27.171.7 -p 17000 -m 10.27.171.8
get master expire time 0
AddMember fail ret -1

What should I do next?

CarbonW commented 8 years ago

PS: I checked and found that the phxsqlproxy process is not running. How do I restart phxsqlproxy?

cjcchen commented 8 years ago

To restart phxsqlproxy: go to the phxsql/tools directory and run python restart.py -pphxsqlproxy

wedonot commented 8 years ago

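After running ./phxbinlogsvr_tools_phxrpc -f InitBinlogSvrMaster -h"ip1,ip2,ip3" -p 17000, it shows:

get master expire time 0
get master expire time 0

and after a very long time it shows:

init svr fail, ret -202

Checking the processes, I found that the phxbinlogsvr_phxrpc process had been killed by the time the command finished. How can this be solved?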

CarbonW commented 8 years ago

After running python restart.py -pphxsqlproxy, the phxsqlproxy process is still not running, and recompiling makes no difference. The strace output is as follows:

13346 mmap(NULL, 10489856, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS|MAP_STACK, -1, 0 <unfinished ...>
13349 <... munmap resumed> )            = 0
13347 +++ killed by SIGILL (core dumped) +++
13352 write(1, "init pid 13352 env 0x7f146c0008c"..., 34 <unfinished ...>
13350 +++ killed by SIGILL (core dumped) +++
13346 <... mmap resumed> )              = 0x7f1479099000
13352 <... write resumed> )             = 34
13346 mprotect(0x7f1479099000, 4096, PROT_NONE <unfinished ...>
13352 mmap(NULL, 143360, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0 <unfinished ...>
13346 <... mprotect resumed> )          = 0
13352 <... mmap resumed> )              = 0x7f1479076000
13349 --- SIGILL (Illegal instruction) @ 0 (0) ---
13346 clone( <unfinished ...>
13345 +++ killed by SIGILL (core dumped) +++
13352 +++ killed by SIGILL (core dumped) +++
13346 +++ killed by SIGILL (core dumped) +++

Please advise on how to solve this.

mariohuang commented 8 years ago

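@CarbonW We have found and fixed this problem. To apply the fix:

  1. Clone a fresh copy of the libco code, overwrite libco under third_party with it, and run make
  2. Go to the phxsqlproxy directory and run make clean; make phxsqlproxy_phxrpc
  3. Check whether the modification time of phxsqlproxy_phxrpc in the phxsqlproxy directory shows that it was just generated; if so, copy it over the existing sbin/phxsqlproxy_phxrpc in the original directory
  4. Run python restart.py -pphxsqlproxy again and see whether the problem is fixed

Thanks for your feedback, and please keep trying it out :)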
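The freshness check on the rebuilt binary can be automated before copying it into sbin. The sketch below is an assumption-laden illustration: the helper name `built_recently` and the 10-minute window are made up here, and the check only looks at the file's modification time, not at its contents.

```python
import os
import time


def built_recently(path: str, max_age_seconds: float = 600.0) -> bool:
    """Return True if path exists and was modified within max_age_seconds.

    Hypothetical helper; the 600-second default is an arbitrary choice.
    """
    try:
        mtime = os.path.getmtime(path)
    except OSError:
        # A missing or unreadable file means the build did not produce it.
        return False
    return (time.time() - mtime) <= max_age_seconds
```

For example, `built_recently("phxsqlproxy_phxrpc")`, run from the phxsqlproxy directory right after make, returning False would suggest the build did not actually regenerate the binary.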

CarbonW commented 8 years ago

@mariohuang Step 2 fails with an error:

[mysql@srdsdevapp65 phxsqlproxy]$ make phxsqlproxy_phxrpc
g++ -std=c++11 -I/opt/phxsql -I/opt/phxsql/third_party/protobuf/include  -I/opt/phxsql/third_party/leveldb/include -I/opt/phxsql/third_party/glog/include  -Wall -g -fPIC -m64  -I/opt/phxsql/include -I/opt/phxsql/phxbinlogsvr/config -I/opt/phxsql/phxcomm/configparser -I/opt/phxsql/phxcomm/configparser/inih-master -I/opt/phxsql/phxcomm/configparser/inih-master/cpp -I/opt/phxsql/phxcomm/log/phxlog -I/opt/phxsql/phxcomm/utils -I/opt/phxsql/phxsqlproxy -fpermissive -Werror -shared -Wall -pipe -fPIC -Wno-deprecated -D__STDC_FORMAT_MACROS -ldl -O2 -I/opt/phxsql/include -I/opt/phxsql/percona/include -I/opt/phxsql/phxbinlogsvr/config -I/opt/phxsql/phxcomm/configparser -I/opt/phxsql/phxcomm/configparser/inih-master -I/opt/phxsql/phxcomm/configparser/inih-master/cpp -I/opt/phxsql/phxcomm/log/phxlog -I/opt/phxsql/phxcomm/utils -I/opt/phxsql/phxsqlproxy -I/opt/phxsql/phxsqlproxy/plugin/monitor -I/opt/phxsql/phxsqlproxy/plugin/requestfilter -I/opt/phxsql/third_party/colib -Wno-deprecated -Wno-deprecated -fpermissive -Werror -shared -Wall -pipe -fPIC -Wno-deprecated -D__STDC_FORMAT_MACROS -ldl -O2 -I/opt/phxsql/include -I/opt/phxsql/percona/include -I/opt/phxsql/phxbinlogsvr/config -I/opt/phxsql/phxcomm/configparser -I/opt/phxsql/phxcomm/configparser/inih-master -I/opt/phxsql/phxcomm/configparser/inih-master/cpp -I/opt/phxsql/phxcomm/log/phxlog -I/opt/phxsql/phxcomm/utils -I/opt/phxsql/phxsqlproxy -I/opt/phxsql/phxsqlproxy/plugin/monitor -I/opt/phxsql/phxsqlproxy/plugin/requestfilter -I/opt/phxsql/third_party/colib -Wno-deprecated -Wno-deprecated -fpermissive -Werror -shared -Wall -pipe -fPIC -Wno-deprecated -D__STDC_FORMAT_MACROS -ldl -O2 -I/opt/phxsql/include -I/opt/phxsql/percona/include -I/opt/phxsql/phxbinlogsvr/config -I/opt/phxsql/phxbinlogsvr/core/mysql -I/opt/phxsql/phxbinlogsvr/framework/phxrpc/client -I/opt/phxsql/phxbinlogsvr/framework/proto -I/opt/phxsql/phxbinlogsvr/framework/rpccomm -I/opt/phxsql/phxcomm/configparser -I/opt/phxsql/phxcomm/configparser/inih-master 
-I/opt/phxsql/phxcomm/configparser/inih-master/cpp -I/opt/phxsql/phxcomm/log/phxglog -I/opt/phxsql/phxcomm/log/phxlog -I/opt/phxsql/phxcomm/utils -I/opt/phxsql/phxsqlproxy -I/opt/phxsql/phxsqlproxy/plugin/monitor -I/opt/phxsql/phxsqlproxy/plugin/requestfilter -I/opt/phxsql/third_party/colib -I/opt/phxsql/third_party/phxrpc -I/opt/phxsql/third_party/protobuf/include -fpermissive -ldl -fpermissive -shared -Werror -Wall -pipe -fPIC -Wno-deprecated -D__STDC_FORMAT_MACROS -O2  -c -o phxsqlproxymain_phxrpc.o phxsqlproxymain_phxrpc.cpp
g++ phxsqlproxymain_phxrpc.o -o phxsqlproxy_phxrpc -L/opt/phxsql/third_party/phxpaxos/lib -L/opt/phxsql/.lib -L/opt/phxsql/third_party/protobuf/lib -L/opt/phxsql/third_party/leveldb/lib/  -L/opt/phxsql/third_party/glog/lib -L -L -L/opt/phxsql/percona/libmysql -L/opt/phxsql/third_party/colib/lib  -static-libgcc -static-libstdc++ -Wl,--no-as-needed -lphxsqlproxylib_phxrpc -lmonitor_plugin -lfreqfilter_plugin -lrequestfilter_plugin -lphxsqlproxyconfig -lphxutils -lphxglog -lphxbinlogsvrclient_phxrpc -lphxbinlogsvrclient_base -lphxbinlogconfig -lphxconfig -lconfigparser -lphxutils -lclientproto -lgtidhandler -lphxlog /opt/phxsql/third_party/colib/lib/libcolib.a /opt/phxsql/percona/libmysql/libperconaserverclient.a -ldl -lrt -lz /opt/phxsql/third_party/glog/lib/libglog.a -lpthread /opt/phxsql/third_party/protobuf/lib/libprotobuf.a /opt/phxsql/third_party/phxrpc/lib/libphxrpc.a
/usr/bin/ld: cannot find -lphxsqlproxylib_phxrpc
collect2: error: ld returned 1 exit status
make: *** [phxsqlproxy_phxrpc] Error 1

I also cloned a fresh, complete copy of the code and recompiled it; the result is the same as before. The strace output is in the attachment.

output.txt

mariohuang commented 8 years ago

@CarbonW What is the result of building phxsqlproxylib_phxrpc in the phxsqlproxy directory?

CarbonW commented 8 years ago

@mariohuang It passed. The libco I had cloned earlier was from 2 years ago. thx~ :)