Open yonggewang opened 3 years ago
Hi, is this the reason why you think FFG has a liveness problem? I have to say that what you are writing here is incorrect. It is true that c and d themselves can't be directly finalized as a target, but one of their descendants can, and thus they can be finalized as an ancestor. This does not disprove Casper FFG liveness.
thanks for the message. do you have a proof that one of the descendant of c and d will be finalized (or at least some kind of example illustration how that would be achieved)? what happens if all the validators voted for c will vote for descendant of c and all the validators who voted for d will vote for descendant of d? As I mentioned, practically, this may not be an issue. But theoretically, this could be happened. For example, if the network is partitioned.
Well, no consensus protocol can achieve liveness when there is a network partition with <2/3 honest nodes on each side, so we can leave that case aside.
I do not have a proof for liveness, but I believe this paper has a proof for probabilistic liveness: https://arxiv.org/pdf/2003.03052.pdf
thanks for the reference. I am aware of this reference (I saw it after I finished my study on Casper FFG). I think perhaps we have the same opinion on these things but interpreted them differently. That is why I said the model in the original Casper FFG is not formally defined. So different people could have different interpretation about networks and about required property.
For example, when we talk about network partition, yes, you are right. When the network is partitioned, no one can achieve consensus. But perhaps we may need the following property (whether the model address this kind of property or not, it depends):
** when the network is partitioned, no BFT protocol can reach consensus (liveness is not guaranteed during network partition but no safety violation during this period). But we hope that the network partition is not permanent. That is, after some time, the network will join again. After the network resumes to work, the BFT should continue to enjoy liveness (without deadlock)
This is actually the Type II partial synchronous network model that I mentioned in my paper. What I want to point out in the paper is that: in the case of network partition, the Casper FFG may reach some kind of deadlock. After the network resumes to be synchronous, the deadlock may not be removed. A preferred BFT model should not have this kind issue.
Regarding the probabilistic liveness proof, I did not check the details of the paper.. but it does not contradict with my argument there. Since I mentioned that the example I am trying to point out is the worst case (with small probability). thanks.
in the case of network partition, the Casper FFG may reach some kind of deadlock. So I do not think it is true that Casper FFG can reach a deadlock, in fact you can easily prove that it is possible to finalize a new state from any given state.
This of course does not prove liveness of the practical protocol, but I don't think there is a deadlock. That's why I quoted your passage, the deadlock you have found seems to be a misunderstanding of Casper FFG. As an example:
a
|
|
|
b
/ \
/ \
/ \
c d
| |
| |
| |
e f
where a
is finalized and b
is justified, and c
and d
each have 50% of the votes. This is not a deadlock, because even the validators who voted for d
can still vote for e
in the next round: the two attestations b -> d
and b->e
are not slashable, they are not a "surround" attestation; for a surround condition, one attestation has to stricly include the span of the other attestation.
So from this situation is recovery by 2/3+1 voting b->e
and then justifying a direct descendant of e
, which would finalize e
thanks for the example. "This is not a deadlock, because even the validators who voted for d can still vote for e in the next round". Agreed. the two slashing rules in https://arxiv.org/pdf/1710.09437.pdf means that "the validators who voted for d can still vote for e in the next round". Similarly, assuming e and f have different heights, "the validators who voted for c can still vote for f in the next round". Furthermore, I assume that for those who voted for b->c can still vote for b->e. In the end, we obtain super-majority votes for both b->e and b->f . This process may continue forever.
So the "deadlock" is a little mis-leading indeed and I think it is better not to call it a "deadlock". On the other hand, it may be more suitable to call it a probabilistic "liveness" as other BFT protocol calls it for asynchronous networks [perhaps this is what the other paper you mentioned is talking about... but I did not get time to check it yet, so not sure about that]
In the end, we obtain super-majority votes for both b->e and b->f . This process may continue forever.
The fork choice rule will ensure that honest validators will always follow the one that has more votes. So even if e
only has one more vote than f
at a given point, all the honest validators (even the ones previously voting for d
) would now switch to e
and thus create a supermajority.
yes, agreed. Practically, I think Casper FFG should just work fine there. I am just thinking about the worst "theoretical" case. Theoretically, if the network is not fully synchronized (some delays), even if e has more vote than f, an honest node may not see this "fact". On the other hand, since there is no penalty for voting on both sides, and if there is potential benefit to vote on both side without penalty (even if he sees that e has more vote already), an honest node may just vote on both sides (I know that this is a violation of the Byzantine bound t, but practically an honest node may have motivation to do that)...
It cannot vote for both sides at the same height, that would be a violation (b->e
and b->f
would be a slashable attestation).
"It cannot vote for both sides at the same height, that would be a violation (b->e and b->f would be a slashable attestation)." As I mentioned in the previous message "assuming e and f have different heights "
A further notice: in general academic model for the adversary, the message communication is controlled completely by the adversary. Which message to deliver first and when to deliver the message is determined by the adversary. So from an academic model (I am not assuming whether that model is good or bad for practice to model an adversary), the liveness cannot be proved probabilistically indeed. IN order to prove the probabilistically liveness in an "academically accepted model", the randomness must be independent of the adversary.. here it seems that the randomness is dependent on the message delivery order. Thus it is controlled by the adversary.
more details on these discussion could be found in the recent publication: "Yongge Wang. The adversary capabilities in practical byzantine fault tolerance. In Proc. 17th International Workshop on Security and Trust Management, STM 2021, LNCS 13075, pages 1–20, 2021." An unofficial version is available at https://eprint.iacr.org/2021/1248.pdf
Issue
Proposed Implementation
In this paper https://github.com/yonggewang/BFT-Protocol-XP , we discuss several revisions of "CBC Casper the Friendly Ghost". We hope to hear community comments on this and will be happy get more discussion. thanks!