dtr-org / unit-e

A digital currency for a new era of decentralized trust
https://unit-e.io
MIT License
45 stars 15 forks source link

Fix unsynchronized ValidatorState with Validator #937

Closed kostyantyn closed 5 years ago

kostyantyn commented 5 years ago

recap of entities FinalizationState keeps track of all Validators we have in the system and is responsible for validating their FinalizerCommits. ValidatorState keeps the state of the finalizer which is associated with the wallet (if we have any). It is responsible for tracking its votes not to create double/surrounded votes.

bug 1 Currently on the master, ValidatorState has the same fields as Validator:

  1. m_last_esperanza_tx
  2. m_end_dynasty
  3. m_start_dynasty

The issue of duplicating data is to ensure that they are synced which is hard to achieve as once logic changed in one scope, can be missed in another one. And this is precisely what happened, after re-orgs ValidatorState was not synced with Validator. To solve it, this commit removes duplicate data from ValidatorState and retrieves them from Validator once it's needed. We can see that such refactoring didn't break existing functionality as there are no changes in unit/functional tests. Only feature_vote_reorg.py was extended to reproduce the bug.

bug 2 This was discovered while working on fixing bug 1 as re-index was kept failing. The issue was that we update ValidatorState.m_phase in BlockConnected which is invoked on a thread. It means that the state can be a bit "outdated" at the point of quering it (but eventually will catch up). However, when we do re-index, we invoke WalletExtension::AddToWalletIfInvolvingMe before we call BlockConnected and it means that when we expect that validator to be in the right state, it might not be yet there. It's specifically critical when we check IS_VALIDATING. To solve it, we rely on IsFinalizerVoting (introduced in this commit) function everywhere. We already used that when we processed blocks or finalizer commits but in the wallet, we went with the phase check. Eventually, we should remove or reduce these phases and rely on functions of FinalizationState otherwise we introduce another unsynchronization.

explanation of renamings/moving blocks of code in PR Because we now look up to the Validator object often for the missing data, we can't name validator local variable which means ValidatorState as often both objects co-exist and this is the reason validator was renamed to validator_state in many places.

We have the following order of locks: cs_main->cs_wallet->RepositoryImpl.m_cs. As often we need to retrieve Validator, we need to lock RepositoryImpl.m_cs. To preserve the same lock ordering, in some places cs_main->cs_wallet was moved higher in the hierarchy. Mostly affected WalletExtension::SendLogout, WalletExtension::SendWithdraw and their RPC commands.

WalletExtension::BlockConnected used to set IS_VALIDATING state when the deposit was finalized. This is not correct as finalizer still must not vote until current_dynasty==start_dynasty. Fortunately, we had this check later on. But since m_deposit_epoch was removed too (which was used to set IS_VALIDATING), WalletExtension::BlockConnected was re-written to rely on start_dynasty instead and it caused some changes.

Next steps We need to review all ValidatorState.phases https://github.com/dtr-org/unit-e/issues/972

Signed-off-by: Kostiantyn Stepaniuk kostia@thirdhash.com

kostyantyn commented 5 years ago

I merged as it's already running on testnet (via custom build by @Ruteri) and we would like to have it on master. This PR fixes the voting issue (the most frequent tx from finalizer commits) but we still have potential issues with others. Here is the issue https://github.com/dtr-org/unit-e/issues/972 and this is what I'll be working next.