Closed: kishorenc closed this issue 4 years ago
I noticed this when running braft as a single-node cluster on my local machine: the on_leader_start() event fires sometimes, but not always, when running as a single node.
Is this expected?
Here are some of the logs from around the time the braft service became ready but no leader event fired, in case they are useful:
I0401 17:10:19 12803 /braft/src/braft/snapshot_executor.cpp:254] node default_group:172.21.252.40:8107:8108 snapshot_load_done, last_included_index: 3 last_included_term: 2 peers: "172.21.252.40:8107:8108"
I0401 17:10:19 6915 /braft/src/braft/node.cpp:571] node default_group:172.21.252.40:8107:8108 init, term: 5 last_log_id: (index=10,term=5) conf: 172.21.252.40:8107:8108 old_conf:
I0401 17:10:19 6915 /braft/src/braft/node.cpp:1543] node default_group:172.21.252.40:8107:8108 term 5 start vote and grant vote self
I0401 17:10:19 6915 /braft/src/braft/raft_meta.cpp:498] Saved single stable meta, path /tmp/data/state/meta term 6 votedfor 172.21.252.40:8107:8108 time: 958
I0401 17:10:19 6915 /braft/src/braft/node.cpp:1776] node default_group:172.21.252.40:8107:8108 term 6 become leader of group 172.21.252.40:8107:8108
W0401 17:10:19 6915 /braft/src/braft/node.cpp:878] node default_group:172.21.252.40:8107:8108 reset_peer need wait current conf change
No, it's not expected. on_leader_start() should be called once a new leader is elected and it has replayed all previous logs successfully. In your case I'm not sure whether it is called or not, because:
1. The logs you provided are too short to draw a conclusion.
2. Do you print some message in your fsm's on_leader_start()? If you do and still don't see that message long after the Node is restarted, you may need to check whether the Node is stuck applying some log (check the value of `state_machine` on the web page http://{ip}:{port}/raft_stat).
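For illustration, here is a minimal sketch of the kind of instrumentation suggested in point 2, assuming a braft::StateMachine subclass; the class name and log text are placeholders, not code from this issue. If the message never shows up, the `state_machine` value on the /raft_stat page mentioned above shows whether the node is still busy applying logs.

```cpp
#include <braft/raft.h>      // braft::StateMachine, braft::Iterator
#include <butil/logging.h>   // LOG()

// Hypothetical fsm that only exists to make leadership events visible in logs.
class MyStateMachine : public braft::StateMachine {
public:
    void on_apply(braft::Iterator& iter) override {
        for (; iter.valid(); iter.next()) {
            // Apply iter.data() to the real state machine here.
        }
    }

    // Called after this node becomes leader (and, as noted above, after it has
    // replayed previous logs); if this line never appears in the log after
    // startup, the callback was never reached.
    void on_leader_start(int64_t term) override {
        LOG(INFO) << "on_leader_start fired, term=" << term;
    }

    void on_leader_stop(const butil::Status& status) override {
        LOG(WARNING) << "on_leader_stop: " << status.error_cstr();
    }
};
```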
By the way, do you call reset_peer for some test purpose? It will fail when the Node is changing configuration (including the process of becoming leader).
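To make that failure mode concrete, here is a hedged sketch of what such a call might look like; braft's Node exposes reset_peers() (the exact name and signature may vary by version), and the node pointer and peer string below are placeholders, not values from this issue.

```cpp
#include <braft/raft.h>      // braft::Node, braft::Configuration
#include <butil/logging.h>

// Illustrative only: 'node' is an already-initialized braft::Node and the
// peer string stands in for this node's own ip:port:index.
void force_single_node_membership(braft::Node* node) {
    braft::Configuration conf;
    conf.parse_from("172.21.252.40:8107:8108");  // placeholder peer

    butil::Status st = node->reset_peers(conf);
    if (!st.ok()) {
        // This is the case warned about above: the call is rejected while the
        // node is still in a configuration change (e.g. just becoming leader),
        // which matches the "reset_peer need wait current conf change" log line.
        LOG(WARNING) << "reset_peers failed: " << st.error_cstr();
    }
}
```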
you may need to check whether the Node is stuck applying some log (check the value of `state_machine` on the web page http://{ip}:{port}/raft_stat)
Thank you for that hint -- that was it! I had an unapplied log.
do you call reset_peer for some test purpose?
Yes, I'm doing that when the server is started in single-node mode, mostly as a convenience for when the local IP changes due to DHCP allocation and braft is then unable to start because of the stale address.