Tencent / phxpaxos

The Paxos library implemented in C++ that has been used in the WeChat production environment.

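While receiving a new state machine checkpoint, after paxos_log is cleared and its index rebuilt, but before the checkpoint file transfer finishes and the process restarts, a propose timeout timer fires and writes an instance to paxos_log #196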

Open huangyu6572 opened 2 years ago

huangyu6572 commented 2 years ago

Flow when the checkpoint receiver accepts a new state machine:

// The paxos_log is cleared
I0630 09:04:03.638077 5242 logger_google.cpp:111]  Showy(0): PN8phxpaxos8InstanceE::OnReceiveCheckpointMsg Now.InstanceID 737 MsgType 1 Msg.from_nodeid 72058139498785641 My.nodeid 72058139498785639 flag 1 uuid 72058141452437945 sequence 0 checksum 0 offset 0 buffsize 0 filepath
W0630 09:04:03.638098 5242 logger_google.cpp:103]  Imp(0): PN8phxpaxos7LearnerE::OnSendCheckpoint START uuid 72058141452437945 flag 1 sequence 0 cpi 759 checksum 0 smid 0 offset 0 buffsize 0 filepath
W0630 09:04:03.638497 5242 logger_google.cpp:103]  Imp(0): PN8phxpaxos18CheckpointReceiverE::ClearCheckpointTmp rm dir ./storage/paxoslog_0/g0/cp_tmp_1 done!
I0630 09:04:03.638594 5242 logger_google.cpp:111]  DEBUG(0): PN8phxpaxos8DatabaseE::GetFromLevelDB LevelDB.Get not found, instanceid 18446744073709551614
W0630 09:04:03.644414 5242 logger_google.cpp:103]  Imp(0): PN8phxpaxos8LogStoreE::RebuildIndex START fileid 0 offset 0 checksum 0
I0630 09:04:03.644502 5242 logger_google.cpp:111]  DEBUG(0): PN8phxpaxos8LogStoreE::RebuildIndexForOneFile file not exist, filepath ./storage/paxoslog_0/g0/vfile/0.f
I0630 09:04:03.644512 5242 logger_google.cpp:111]  Showy(0): PN8phxpaxos8LogStoreE::RebuildIndex END rebuild ok, nowfileid 0
I0630 09:04:03.644532 5242 logger_google.cpp:111]  Showy(0): PN8phxpaxos8LogStoreE::OpenFile ok, path ./storage/paxoslog_0/g0/vfile/0.f

// The timeout timer of a much earlier proposal fires, and a new maximum instance gets written again!
W0630 09:04:05.859681 5242 logger_google.cpp:103]  Imp(0): PN8phxpaxos8AcceptorE::OnPrepare START Msg.InstanceID 737 Msg.from_nodeid 72058139498785639 Msg.ProposalID 215
I0630 09:04:05.859699 5242 logger_google.cpp:111]  DEBUG(0): PN8phxpaxos8AcceptorE::OnPrepare [Promise] State.PromiseID 215 State.PromiseNodeID 72058139498785639 State.PreAcceptedID 214 State.PreAcceptedNodeID 72058139498785641

// This triggers a write to paxos_log, which may cause the llCPInstanceID <= llLogMaxInstanceID check to fail on the next startup (in this run: checkpoint instance 759, log instance 737)
I0630 09:04:05.864318 5242 logger_google.cpp:111]  Showy(0): PN8phxpaxos8LogStoreE::Append ok, offset 0 fileid 0 checksum 3803452448 instanceid 737 buffer size 67 usetime 4ms sync 1
I0630 09:04:05.864400 5242 logger_google.cpp:111]  Showy(0): PN8phxpaxos13AcceptorStateE::Persist GroupIdx 0 InstanceID 737 PromiseID 215 PromiseNodeID 72058139498785639 AccectpedID 214 AcceptedNodeID 72058139498785641 ValueLen 30 Checksum 2622589103
W0630 09:04:05.864413 5242 logger_google.cpp:103]  Imp(0): PN8phxpaxos8AcceptorE::OnPrepare END Now.InstanceID 737 ReplyNodeID 72058139498785639

// The data transfer completes and the process kills itself
W0630 09:04:13.641677 5242 logger_google.cpp:103]  Imp(0): PN8phxpaxos7LearnerE::OnSendCheckpoint START uuid 72058141452437945 flag 3 sequence 9 cpi 759 checksum 0 smid 0 offset 0 buffsize 0 filepath
I0630 09:04:13.642592 5242 logger_google.cpp:111]  Showy(0): PN8phxpaxos7LearnerE::OnSendCheckpoint_End All sm load state ok, start to exit process

// Triggered on the next restart, causing run paxos to fail
PLGErr("checkpoint instanceid %lu larger than log max instanceid %lu. "
       "Please ensure that your checkpoint data is correct. "
       "If you ensure that, just delete all paxos log data and restart.",
       llCPInstanceID, llLogMaxInstanceID);
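To make the sequence above easier to follow, here is a minimal toy model of the race. It is a sketch under stated assumptions and not phxpaxos code: `ToyPaxosLog`, `CheckCheckpointAgainstLog` and `ReplayIssueTimeline` are hypothetical names, and the rule "an empty log accepts any checkpoint, a non-empty log must reach the checkpoint instance" is an assumption inferred from the error message. The values 737 and 759 come from the log lines above; 700 is an arbitrary illustrative entry.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for the on-disk paxos_log. It only remembers whether
// anything was appended and the largest instance ID seen.
struct ToyPaxosLog {
    bool has_entries = false;
    uint64_t max_instance_id = 0;

    // Models the "clear paxos_log and rebuild index" step.
    void Clear() {
        has_entries = false;
        max_instance_id = 0;
    }

    // Models LogStore::Append followed by AcceptorState::Persist.
    void Append(uint64_t instance_id) {
        if (!has_entries || instance_id > max_instance_id) {
            max_instance_id = instance_id;
        }
        has_entries = true;
    }
};

// Assumed shape of the startup check (not the real implementation):
// returns 0 if startup may proceed, -2 if the checkpoint is ahead of a
// non-empty log.
inline int CheckCheckpointAgainstLog(uint64_t cp_instance_id,
                                     const ToyPaxosLog& log) {
    if (!log.has_entries) {
        return 0;  // a freshly pulled checkpoint with an empty log is accepted
    }
    if (cp_instance_id <= log.max_instance_id) {
        return 0;  // the log already covers the checkpoint
    }
    return -2;     // corresponds to the PLGErr shown above
}

// Replays the timeline from the issue. The flag controls whether the stale
// Prepare for instance 737 is persisted after the log was cleared.
inline int ReplayIssueTimeline(bool stale_prepare_is_persisted) {
    ToyPaxosLog log;
    log.Append(700);  // illustrative pre-existing entry (arbitrary value)
    log.Clear();      // 09:04:03: checkpoint receiver clears paxos_log
    if (stale_prepare_is_persisted) {
        log.Append(737);  // 09:04:05: OnPrepare persists instance 737
    }
    const uint64_t cp_instance_id = 759;  // "cpi 759" in OnSendCheckpoint
    return CheckCheckpointAgainstLog(cp_instance_id, log);  // next restart
}
```

In this model, `ReplayIssueTimeline(true)` is refused at restart while `ReplayIssueTimeline(false)` starts normally, which matches the failure described in this issue.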

huangyu6572 commented 2 years ago

https://github.com/Tencent/phxpaxos/issues/158

huangyu6572 commented 2 years ago

instance.cpp +504

    if (oPaxosMsg.msgtype() == MsgType_PaxosPrepare
        || oPaxosMsg.msgtype() == MsgType_PaxosAccept)
    {
        //add
        if (m_oCheckpointMgr.InAskforcheckpointMode())
        {
            PLGImp("in ask for checkpoint mode, ignored paxosmsg");
            return 0;
        }
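The effect of this guard can be illustrated with a small self-contained model. This is only a sketch: `ToyInstance`, `ToyMsgType` and `OnPaxosMsg` are hypothetical names, not phxpaxos types, and the `persisted` vector merely stands in for writes to paxos_log.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical message kinds; only Prepare and Accept cause acceptor writes.
enum class ToyMsgType { Prepare, Accept, Other };

// Hypothetical stand-in for the instance that dispatches Paxos messages.
struct ToyInstance {
    bool in_ask_for_checkpoint_mode = false;
    std::vector<uint64_t> persisted;  // instance IDs written to the toy log

    // Returns true if the message was handled, false if it was dropped.
    bool OnPaxosMsg(ToyMsgType type, uint64_t instance_id) {
        if (type == ToyMsgType::Prepare || type == ToyMsgType::Accept) {
            // The guard: while a checkpoint is being pulled, drop the
            // message before the acceptor can persist anything.
            if (in_ask_for_checkpoint_mode) {
                return false;
            }
            persisted.push_back(instance_id);  // models the acceptor write
            return true;
        }
        return true;  // other message types are not modeled here
    }
};
```

With `in_ask_for_checkpoint_mode` set, a Prepare for instance 737 leaves `persisted` empty, so the cleared log stays empty until the process restarts. Once the flag is cleared, Prepare and Accept messages are persisted as before.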