dtr-org / unit-e

A digital currency for a new era of decentralized trust
https://unit-e.io
MIT License
45 stars 15 forks source link

Workaround for the "future vote" problem. #841

Closed dsaveliev closed 5 years ago

dsaveliev commented 5 years ago

Fixes #801

Problem description: Sometimes the node, which needs one block up to the new epoch, has a data race with validator node. The node propagates newly created block (e.g. via cmpctblock) and receives vote transaction from the validator node faster, than it updates internal state (m_current_epoch) - that leads to rejection of the valid vote transaction.

It turns out that this problem already known #652 and waiting for the complete solution. As workaround, I added wait_for_vote_and_disconnect to the failing tests.

Signed-off-by: Dmitry Saveliev dima@thirdhash.com

dsaveliev commented 5 years ago

@kostyantyn @Gnappuraz Correct me if I wrong, but after the fix, suggested in #652 we will be able to get rid of "_and_disconnect" part in the "wait_for_vote_and_disconnect", isn't it?

kostyantyn commented 5 years ago

@dsaveliev after https://github.com/dtr-org/unit-e/issues/652 is resolved we could stop disconnecting but we shouldn't because then votes can be included at any block height and it will make test unpredictable. In case of failure, we won't be able to reproduce exactly the same scenario (or it will be tough), and we should aim to have deterministic tests.

Until we come up with a better approach than disconnecting finalizer from other nodes, I'd suggest keeping as it is.

dsaveliev commented 5 years ago

@kostyantyn yes, I've caught an idea, but #652 is intended to help us to handle such a situation when the vote message comes at any block height, right? I mean - keeping finalizer and node disconnected helps us to keep the tests predictable, but this is a kinda artificial condition.

kostyantyn commented 5 years ago

when we start working on https://github.com/dtr-org/unit-e/issues/652 we will write tests to cover:

  1. node at height X, receive a vote which targets X-1
  2. node at height X, receive a vote which targets X
  3. node at height X, receive a vote which targets X+1
  4. node at height X, receive a vote which targets X+2
  5. node at height X, receive a vote which targets X+N

Then we will simulate all possible scenarios and in case the test fails, we will know exactly which X+N we broke. If we have one generic test that tests different X, X+1, X+2 targets depends on the timing, it will make this test harder to support and investigate.

dsaveliev commented 5 years ago

Yes, I agree with that. All I want to say, from my point of view, this helper function can be simplified once we fix #652. That is the reason why I called current implementation a "workaround".