Closed kostyantyn closed 5 years ago
~Partial NACK, explanation follows.~ (Changed after discussing the changes with @kostyantyn and after removing 4.)
@kostyantyn unfortunately points 3 and 4 are introducing serious issues, the current behavior is expected, there is no bug there. A validator must be part of 2 consecutive dynasties before he can vote (and the DepositExists
also follows this). This is called `stitching mechanism' and is part of the casper design on purpose.
Quoting from the casper's paper:
Without the validator set stitching mechanism, it’s possible for two conflicting checkpoints c and c' to both be finalized without any validator getting slashed.
I moreover think that we diverged at some point from the casper's design and without realizing started fixing the previous correct implementation with one that adapts with the diverged. The main point of difference here is that in our implementation an epoch
is justified iff its checkpoint
receives 2/3 of the votes; wether in the original casper's implementation an epoch
is justified iff the checkpoint
of the previous epoch
receives enough votes. This means that in case of optimal finalization when the current epoch contains enough votes to the previous checkpoint it gets justified immediately.
As a result of this at least half of our implementation is off-by-one. Even if we fix everything (like this PR tries to) we'll end up having finalization always one epoch late, and I don't think is a good idea to wait 50 block more than needed.
I think we should re-evaluate the current implementation, assessing the impact on other features (fork-choice, snapshot, etc...), revert part of it (first epoch=1 can remain for example) and align back to the original intent.
Reviewed the paper with @Gnappuraz and we both agreed that we need to check 2/3 of votes on the current and previous dynasty. In the follow-up PR, I'll write the test which shows the possible attack scenario and how the current 2/3 rule fixes it.
The issue was that finalizer didn't vote in its last dynasty_logout_delay
dynasty and caused the stuck of the system. Updated the 4th bug and the description.
About the DepositExists
change.
It doesn't affect this stitching mechanism
and makes the constant interval between when the deposit was created and when the finalizer can vote. With the change, the interval is always 3 dynasties. Without this change, it makes 4 dynasties when we go from 0 deposits to some and 3 dynasties when already had deposits.
Also, if we don't change this function, we initially have two concurrent flows, instant justification + voting justification. The proposed change fixes this.
We use an instant justification approach to mimic finalizers and make the finalizationstate consistent regardless of which system we run now. Casper uses instant finalization which causes an issue once we leave instant finalization as the finalizationstate before is not consistent with one that follows.
Since it's not affecting mainet in any way as we will either keep finalizers forever or possibly go from finalizers to instant justification but not another way around as in regtest, from instant justification to finalizers, I'd recomment to keep this change for better understanding the flow.
utACK 27a345c
had to rebase with master and force-push as we had conflicts with master.
Thanks to @Gnappuraz for noticing irregularities in logout behavior and providing an initial test that reproduced the problem.
This PR makes several fixes for Esperanza protocol, more particularly for logout and withdraw logic as it currently can lead to stuck of finalization.
rpc_finalization.py
test was improved to catch it.GetDynastyDelta
. This function returns the delta that is added tom_cur_dyn_deposits
during dynasty increment. If finalizer logged out,GetDynastyDelta
returns-deposit
value which is added tom_cur_dyn_deposits
to decrease it but since the returned type ofGetDynastyDelta
wasuint64_t
, we never decreased total deposits and it means that we couldn't reach 2/3 to justify the epoch.DepositExists
. It returnedtrue
only if we had deposits in the current dynasty and in the previous one. The issue was that once we increment the dynasty and register deposits, finalizers can start voting and at the same time we still haveInstaJustify
so we keep two systems in parallel. The reason we didn't spot this problem in functional tests is that we thought finalization happened due to votes but actually, it happened due to InstaJustify. NowDepositExists
returnstrue
only if current dynasty has deposits.rpc_finalization.py
is adjusted to spot this problem too.current_votes >= prev_dynasty_threahold
. The fix is to check if we reached 2/3 of votes for the current dynasty only.~ The bug that finalizer doesn't vote at its lastdynasty_logout_delay
dynasty. This bug causes an issue that finalizers can't pass 2/3 of votes on the previous dynasty threshold and it prevents of justifying epochs and eventually the finalization gets stuck.All the above bugs were found by updated
esperanza_withdraw.py
test. This test andrpc_finalization.py
can be a good starting point to understanding changes.Signed-off-by: Kostiantyn Stepaniuk kostia@thirdhash.com