baidu / braft

An industrial-grade C++ implementation of RAFT consensus algorithm based on brpc, widely used inside Baidu to build highly-available distributed systems.
Apache License 2.0

How to run the counter server on two hosts #10

Closed ghost closed 6 years ago

ghost commented 6 years ago

Running the counter server on two machines produces errors. Error output from the first VM (IP 192.168.109.129):

node Counter:127.0.0.1:8100:0 term 1 start pre_vote
W0321 15:14:56.005779 13732 /home/zhang/github/braft/src/braft/node.cpp:1305] node Counter:127.0.0.1:8100:0 can't do pre_vote as it is not in 192.168.109.128:8100:0

Error output from the second VM (IP 192.168.109.128):

node Counter:127.0.0.1:8100:0 term 1 start pre_vote
W0321 15:15:02.566098 12401 /home/zhang/github/braft/src/braft/node.cpp:1305] node Counter:127.0.0.1:8100:0 can't do pre_vote as it is not in 192.168.109.129:8100:0

run_server.sh:

    DEFINE_string crash_on_fatal 'true' 'Crash on fatal log'
    DEFINE_integer bthread_concurrency '18' 'Number of worker pthreads'
    DEFINE_string sync 'true' 'fsync each time'
    DEFINE_string valgrind 'false' 'Run in valgrind'
    DEFINE_integer max_segment_size '8388608' 'Max segment size'
    DEFINE_integer server_num '1' 'Number of servers'
    DEFINE_boolean clean 1 'Remove old "runtime" dir before running'
    DEFINE_integer port 8100 "Port of the first server"

    # parse the command-line
    FLAGS "$@" || exit 1
    eval set -- "${FLAGS_ARGV}"

    # The alias for printing to stderr
    alias error=">&2 echo counter: "

    # hostname prefers ipv6
    IP=`hostname -i | awk '{print $NF}'`

    IP=127.0.0.1

    if [ "$FLAGS_valgrind" == "true" ] && [ $(which valgrind) ] ; then
        VALGRIND="valgrind --tool=memcheck --leak-check=full"
    fi

    IP2=192.168.109.128
    raft_peers=""
    for ((i=0; i<$FLAGS_server_num; ++i)); do
        raft_peers="${raft_peers}${IP2}:$((${FLAGS_port}+i)):0,"
    done

chenzhangyi commented 6 years ago

Check why butil::my_ip() returns localhost in your environment. See whether it is a problem with /etc/resolv.conf.
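One way to investigate is to look at what the machine's own hostname resolves to. The sketch below assumes that butil::my_ip() derives its result from hostname resolution (an assumption, not confirmed in this thread) and that `getent` is available, as on most glibc-based Linux systems. The helper `is_loopback` is a hypothetical name introduced only for this illustration.

```shell
# is_loopback ADDR: succeed if ADDR looks like an IPv4/IPv6 loopback address.
# (Hypothetical helper for this illustration.)
is_loopback() {
    case "$1" in
        127.*|::1) return 0 ;;
        *) return 1 ;;
    esac
}

# Look up this machine's hostname through the system resolver.
# getent consults /etc/hosts and DNS in the order given by nsswitch.conf.
host_name=$(hostname 2>/dev/null || uname -n)
resolved=$(getent hosts "$host_name" 2>/dev/null | awk '{print $1; exit}')

if [ -z "$resolved" ]; then
    echo "hostname '$host_name' does not resolve"
elif is_loopback "$resolved"; then
    echo "hostname '$host_name' resolves to loopback ($resolved); check /etc/hosts"
else
    echo "hostname '$host_name' resolves to $resolved"
fi
```

If the hostname maps to a loopback address, peers on other machines cannot reach the node under that identity, which would match the 127.0.0.1 node IDs in the logs above.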

ghost commented 6 years ago

I have modified hosts, and butil::my_ip now returns the correct address, but the following problem remains. Could you please take a look:

/server.cpp:391] Counter service is running on 0.0.0.0:8100
I0321 16:40:57.419647 3044 /home/zhang/github/braft/src/braft/node.cpp:1294] node Counter:192.168.109.128:8100:0 term 1 start pre_vote
W0321 16:40:57.419697 3044 /home/zhang/github/braft/src/braft/node.cpp:1305] node Counter:192.168.109.128:8100:0 can't do pre_vote as it is not in 192.168.109.129:8100:0

chenzhangyi commented 6 years ago

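The second log line is expected: it means this node has not yet been added to the replication group. You need to use braft_cli to add the second node.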
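The warning can be read as a membership check: a node whose own ID is missing from the configuration it was started with does not campaign. The following is a simplified model of that check, not braft's actual code. The function name `in_conf` is hypothetical, and the addresses are the ones from the log above.

```shell
# in_conf PEER CONF: succeed if PEER appears in the comma-separated CONF.
# (Hypothetical helper; a simplified model of the membership test.)
in_conf() {
    case ",$2," in
        *",$1,"*) return 0 ;;
        *) return 1 ;;
    esac
}

self=192.168.109.128:8100:0
conf=192.168.109.129:8100:0

if in_conf "$self" "$conf"; then
    echo "$self may start pre_vote"
else
    echo "$self can't do pre_vote as it is not in $conf"
fi
```

With the values shown, the check fails because 192.168.109.128:8100:0 is not listed in a configuration that contains only 192.168.109.129:8100:0.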

ghost commented 6 years ago

When the two counter servers in this replication group start up, neither can win an election, so there is no leader, and adding a peer with braft_cli fails:

./braft_cli add_peer --group=Counter --peer=192.168.109.128:8100 --new_peers=192.168.109.129:8100 --conf=192.168.109.128:8100:0
E0321 17:07:21.438734 3804 /home/zhang/github/braft/tools/braft_cli.cpp:59] Fail to add_peer : Fail to get leader of group Counter, [192.168.109.128:8100] [E11][192.168.109.128:8100][E11]Unknown leader

ghost commented 6 years ago

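Or is the group membership supposed to be set up in server.cpp before startup? Is there an example I could refer to? There is very little documentation on braft at the moment.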

chenzhangyi commented 6 years ago

Judging from the log, the node you should be operating on is 192.168.109.129:8100:0. Try passing that as --conf.

ghost commented 6 years ago

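I have two machines. If I specify only one, each machine becomes an independent leader on its own. If --conf lists both machines (192.168.109.128:8100:0,192.168.109.129:8100:0), leader election works normally. However, I noticed that when I kill the leader, the other machine cannot elect a new leader and the cluster stops working. Does the cluster need at least 2 machines available in order to work?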
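For reference, a configuration string in the ip:port:index form used above can be assembled from a list of hosts. This is a minimal sketch. The function name `build_conf` is hypothetical, and it assumes every server listens on the same port with index 0, as in this thread.

```shell
# build_conf PORT HOST...: print "HOST:PORT:0,HOST:PORT:0,..." for the given hosts.
# Runs in a subshell so its variables do not leak into the caller.
build_conf() (
    port=$1
    shift
    conf=""
    for host in "$@"; do
        conf="${conf}${host}:${port}:0,"
    done
    # Drop the trailing comma before printing.
    printf '%s\n' "${conf%,}"
)

build_conf 8100 192.168.109.128 192.168.109.129
# prints 192.168.109.128:8100:0,192.168.109.129:8100:0
```

Both machines need to be started with the same list so that each node finds itself and its peer in the configuration.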

chenzhangyi commented 6 years ago

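You may want to read https://github.com/brpc/braft/blob/master/docs/cn/server.md#构造braftnode first to see how the startup parameters are set. With two nodes the quorum is 2, so losing either node makes the group unavailable. For a quorum-based system, the usual minimum number of replicas is 3.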
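The arithmetic behind this can be checked directly. The sketch below assumes the usual majority rule, quorum = floor(N / 2) + 1. The function names `quorum` and `tolerated_failures` are made up for the illustration.

```shell
# quorum N: smallest majority of N voting replicas (integer division).
quorum() {
    echo $(( $1 / 2 + 1 ))
}

# tolerated_failures N: replicas that may fail while a majority still survives.
tolerated_failures() {
    echo $(( $1 - ($1 / 2 + 1) ))
}

for n in 1 2 3 5; do
    echo "replicas=$n quorum=$(quorum "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

With 2 replicas the quorum is 2 and no failure is tolerated. With 3 replicas the quorum is still 2, so one replica can fail and the group can still elect a leader.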

chenzhangyi commented 6 years ago

Reopen this if there're further issues.