efficient / epaxos

http://efficient.github.io/epaxos/

Error in the replica message processing in mencius.go #21

Open PasinduTennage opened 3 years ago

PasinduTennage commented 3 years ago

There is a design error in the Mencius message-processing logic.

```go
select {
    case propose := <-r.ProposeChan:
        //got a Propose from a client
        dlog.Printf("Proposal with id %d\n", propose.CommandId)
        r.handlePropose(propose)
        break

    case skipS := <-r.skipChan:
        skip := skipS.(*menciusproto.Skip)
        //got a Skip from another replica
        dlog.Printf("Skip for instances %d-%d\n", skip.StartInstance, skip.EndInstance)
        r.handleSkip(skip)

    case prepareS := <-r.prepareChan:
        prepare := prepareS.(*menciusproto.Prepare)
        //got a Prepare message
        dlog.Printf("Received Prepare from replica %d, for instance %d\n", prepare.LeaderId, prepare.Instance)
        r.handlePrepare(prepare)
        break

    case acceptS := <-r.acceptChan:
        accept := acceptS.(*menciusproto.Accept)
        //got an Accept message
        dlog.Printf("Received Accept from replica %d, for instance %d\n", accept.LeaderId, accept.Instance)
        r.handleAccept(accept)
        break

    case commitS := <-r.commitChan:
        commit := commitS.(*menciusproto.Commit)
        //got a Commit message
        dlog.Printf("Received Commit from replica %d, for instance %d\n", commit.LeaderId, commit.Instance)
        r.handleCommit(commit)
        break

    case prepareReplyS := <-r.prepareReplyChan:
        prepareReply := prepareReplyS.(*menciusproto.PrepareReply)
        //got a Prepare reply
        dlog.Printf("Received PrepareReply for instance %d\n", prepareReply.Instance)
        r.handlePrepareReply(prepareReply)
        break

    case acceptReplyS := <-r.acceptReplyChan:
        acceptReply := acceptReplyS.(*menciusproto.AcceptReply)
        //got an Accept reply
        dlog.Printf("Received AcceptReply for instance %d\n", acceptReply.Instance)
        r.handleAcceptReply(acceptReply)
        break
    // excerpt ends here
}
```

Mencius requires FIFO channels between nodes, and this implementation provides them correctly. However, when a replica receives a message, it pushes that message onto a channel specific to the message type, and the receiver then processes messages from these channels in non-FIFO order. The following example shows how this design breaks safety.

Assume there are three nodes: A, B, and C. Node A sends an Accept message and later sends a Propose message, and B receives both in the order A sent them. On receipt, however, B pushes the two messages onto two separate channels. A separate goroutine then reads from these channels with a `select` statement, and when more than one case is ready, Go's `select` picks one pseudo-randomly.

The protocol is violated if B processes the Propose message first, which this design allows. This is a problem in Mencius because each node derives piggybacked information from the messages it receives, so messages must be processed in exactly the order the sender sent them.
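To make the reordering concrete, here is a minimal standalone sketch (not code from the repository). The `msg` type and the functions `processOnce` and `countReordered` are hypothetical names made up for this illustration. It enqueues an Accept and then a Propose on two separate buffered channels and counts how often a `select` loop dequeues the Propose first.

```go
package main

import "fmt"

// msg is a hypothetical stand-in for a replica message.
// seq records the order in which the sender emitted it.
type msg struct {
	kind string
	seq  int
}

// processOnce enqueues an Accept and then a Propose on separate buffered
// channels (mirroring the per-type channels) and returns the message kinds
// in the order a select loop dequeues them.
func processOnce() []string {
	acceptChan := make(chan msg, 1)
	proposeChan := make(chan msg, 1)

	acceptChan <- msg{kind: "Accept", seq: 0}   // sent first
	proposeChan <- msg{kind: "Propose", seq: 1} // sent second

	order := make([]string, 0, 2)
	for len(order) < 2 {
		// Both cases are ready on the first iteration, so select
		// chooses between them pseudo-randomly.
		select {
		case m := <-acceptChan:
			order = append(order, m.kind)
		case m := <-proposeChan:
			order = append(order, m.kind)
		}
	}
	return order
}

// countReordered runs the experiment `trials` times and counts the runs
// in which Propose was processed before Accept.
func countReordered(trials int) int {
	n := 0
	for i := 0; i < trials; i++ {
		if processOnce()[0] == "Propose" {
			n++
		}
	}
	return n
}

func main() {
	fmt.Printf("Propose processed before Accept in %d of 1000 runs\n", countReordered(1000))
}
```

The count varies from run to run, but it should be well above zero, which is the point: send order is lost once messages are split across channels.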

A fix would be to use a single channel for all types of replica messages, rather than one channel per message type.
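A rough sketch of what that could look like, assuming one shared channel and a type switch for dispatch. The `Accept` and `Skip` structs here are simplified hypothetical stand-ins for the `menciusproto` types, and `drainInOrder` and `sendAndDrain` are names invented for this example, not functions in the repository.

```go
package main

import "fmt"

// Hypothetical, simplified stand-ins for menciusproto.Accept and
// menciusproto.Skip.
type Accept struct{ Instance int32 }
type Skip struct{ StartInstance, EndInstance int32 }

// drainInOrder reads every message from one shared channel and dispatches
// on its dynamic type. It returns one label per message, in the order the
// messages were processed.
func drainInOrder(replicaChan <-chan interface{}) []string {
	var processed []string
	for m := range replicaChan {
		switch v := m.(type) {
		case *Accept:
			processed = append(processed, fmt.Sprintf("Accept(%d)", v.Instance))
		case *Skip:
			processed = append(processed, fmt.Sprintf("Skip(%d-%d)", v.StartInstance, v.EndInstance))
		default:
			processed = append(processed, "unknown")
		}
	}
	return processed
}

// sendAndDrain pushes three messages of mixed types onto a single channel
// and returns the order in which they come back out.
func sendAndDrain() []string {
	replicaChan := make(chan interface{}, 4)
	replicaChan <- &Accept{Instance: 7}
	replicaChan <- &Skip{StartInstance: 8, EndInstance: 10}
	replicaChan <- &Accept{Instance: 11}
	close(replicaChan)
	return drainInOrder(replicaChan)
}

func main() {
	fmt.Println(sendAndDrain())
}
```

Because a Go channel delivers values in the order they were sent, a single sender's messages stay in order regardless of type. With several peers feeding one channel, each peer's own order should still be preserved as long as a single reader goroutine per connection does the enqueueing.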

Thanks