dtr-org / unit-e

A digital currency for a new era of decentralized trust
https://unit-e.io
MIT License
45 stars 15 forks source link

Fixes in esperanza protocol for logout and withdraw txs #871

Closed kostyantyn closed 5 years ago

kostyantyn commented 5 years ago

Thanks to @Gnappuraz for noticing irregularities in logout behavior and providing an initial test that reproduced the problem.

This PR makes several fixes for Esperanza protocol, more particularly for logout and withdraw logic as it currently can lead to stuck of finalization.

  1. Retrospective bug of epoch=1 PR https://github.com/dtr-org/unit-e/pull/809. We shouldn't increment the dynasty for finalized hardcoded epoch=0 as it's already considered incremented from -1. In the ideal scenario, when we always reach finalization, dynasty number catches up to finalization but not another way around. Otherwise, it will lead to stuck of finalization. rpc_finalization.py test was improved to catch it.
  2. int64_t-> uint64_t bug in GetDynastyDelta. This function returns the delta that is added to m_cur_dyn_deposits during dynasty increment. If finalizer logged out, GetDynastyDelta returns -deposit value which is added to m_cur_dyn_deposits to decrease it but since the returned type of GetDynastyDelta was uint64_t, we never decreased total deposits and it means that we couldn't reach 2/3 to justify the epoch.
  3. Bug in DepositExists. It returned true only if we had deposits in the current dynasty and in the previous one. The issue was that once we increment the dynasty and register deposits, finalizers can start voting and at the same time we still have InstaJustify so we keep two systems in parallel. The reason we didn't spot this problem in functional tests is that we thought finalization happened due to votes but actually, it happened due to InstaJustify. Now DepositExists returns true only if current dynasty has deposits. rpc_finalization.py is adjusted to spot this problem too.
  4. ~Bug in how 2/3 of votes are calculated. We checked if we reached 2/3 by comparing current votes with the threshold in the current dynasty and in the previous one. The issue is if someone logged out, you'll have a point where threshold in dynasty X is smaller then in dynasty X-1 and it means that if 1/3 of deposits are withdrawn, you can't pass the check current_votes >= prev_dynasty_threahold. The fix is to check if we reached 2/3 of votes for the current dynasty only.~ The bug that finalizer doesn't vote at its last dynasty_logout_delay dynasty. This bug causes an issue that finalizers can't pass 2/3 of votes on the previous dynasty threshold and it prevents of justifying epochs and eventually the finalization gets stuck.

All the above bugs were found by updated esperanza_withdraw.py test. This test and rpc_finalization.py can be a good starting point to understanding changes.

Signed-off-by: Kostiantyn Stepaniuk kostia@thirdhash.com

Gnappuraz commented 5 years ago

~Partial NACK, explanation follows.~ (Changed after discussing the changes with @kostyantyn and after removing 4.)

@kostyantyn unfortunately points 3 and 4 are introducing serious issues, the current behavior is expected, there is no bug there. A validator must be part of 2 consecutive dynasties before he can vote (and the DepositExists also follows this). This is called `stitching mechanism' and is part of the casper design on purpose. Quoting from the casper's paper:

Without the validator set stitching mechanism, it’s possible for two conflicting checkpoints c and c' to both be finalized without any validator getting slashed.

I moreover think that we diverged at some point from the casper's design and without realizing started fixing the previous correct implementation with one that adapts with the diverged. The main point of difference here is that in our implementation an epoch is justified iff its checkpoint receives 2/3 of the votes; wether in the original casper's implementation an epoch is justified iff the checkpoint of the previous epoch receives enough votes. This means that in case of optimal finalization when the current epoch contains enough votes to the previous checkpoint it gets justified immediately.

As a result of this at least half of our implementation is off-by-one. Even if we fix everything (like this PR tries to) we'll end up having finalization always one epoch late, and I don't think is a good idea to wait 50 block more than needed.

I think we should re-evaluate the current implementation, assessing the impact on other features (fork-choice, snapshot, etc...), revert part of it (first epoch=1 can remain for example) and align back to the original intent.

kostyantyn commented 5 years ago

Reviewed the paper with @Gnappuraz and we both agreed that we need to check 2/3 of votes on the current and previous dynasty. In the follow-up PR, I'll write the test which shows the possible attack scenario and how the current 2/3 rule fixes it.

The issue was that finalizer didn't vote in its last dynasty_logout_delay dynasty and caused the stuck of the system. Updated the 4th bug and the description.

About the DepositExists change. It doesn't affect this stitching mechanism and makes the constant interval between when the deposit was created and when the finalizer can vote. With the change, the interval is always 3 dynasties. Without this change, it makes 4 dynasties when we go from 0 deposits to some and 3 dynasties when already had deposits. Also, if we don't change this function, we initially have two concurrent flows, instant justification + voting justification. The proposed change fixes this. We use an instant justification approach to mimic finalizers and make the finalizationstate consistent regardless of which system we run now. Casper uses instant finalization which causes an issue once we leave instant finalization as the finalizationstate before is not consistent with one that follows. Since it's not affecting mainet in any way as we will either keep finalizers forever or possibly go from finalizers to instant justification but not another way around as in regtest, from instant justification to finalizers, I'd recomment to keep this change for better understanding the flow.

Gnappuraz commented 5 years ago

utACK 27a345c

kostyantyn commented 5 years ago

had to rebase with master and force-push as we had conflicts with master.