LiskArchive / lisk-sdk

🔩 Lisk software development kit
https://lisk.com
Apache License 2.0

Unrecoverable fork - cause 3 - Lisk 0.2 testnet #82

Closed ghost closed 7 years ago

ghost commented 8 years ago
log 2016-05-02 17:25:00 GET /api/dapps/categories from 192.168.0.14 
log 2016-05-02 17:25:00 GET /api/transactions/unconfirmed?address=2661596165277957855L&senderPublicKey=5d4443db3b737f34f9ccff5b0a13e1970685fd59c0318a08b9004361c33d7002 from 192.168.0.14 
log 2016-05-02 17:25:00 GET /api/multisignatures/pending?publicKey=5d4443db3b737f34f9ccff5b0a13e1970685fd59c0318a08b9004361c33d7002 from 192.168.0.14 
log 2016-05-02 17:25:03 GET /api/blocks/?limit=25&offset=0&orderBy=height:desc from 192.168.0.14 
log 2016-05-02 17:25:03 GET /api/blocks/getHeight from 192.168.0.14 
log 2016-05-02 17:25:05 GET /api/blocks/?limit=25&offset=0&orderBy=height:desc from 192.168.0.14 
log 2016-05-02 17:25:05 GET /api/blocks/getHeight from 192.168.0.14 
info 2016-05-02 17:25:07 Checking blockchain on 104.251.218.246:7000 
info 2016-05-02 17:25:07 Looking for common block with 104.251.218.246:7000 
info 2016-05-02 17:25:08 Found common block 9265064167066369150 (at 9294) with peer 104.251.218.246:7000 
log 2016-05-02 17:25:11 GET /api/loader/status/sync from 192.168.0.14 
info 2016-05-02 17:25:11 Loading 1439 blocks from 104.251.218.246:7000
info 2016-05-02 17:25:11 Fork { delegate: '2fa63cb23a2e5b8a893dd724db42fd8d6346b9c4198d653c777b030aa1691483',
  block: 
   { id: '5670733510738223641',
     timestamp: 33513520,
     height: 9295,
     previousBlock: '9265064167066369150' },
  cause: 3 }
warn 2016-05-02 17:25:11 Block 5670733510738223641 is not valid, ban 60 min 104.251.218.246:7000
warn 2016-05-02 17:25:11 Can't verify slot: 5670733510738223641 
warn 2016-05-02 17:25:11 Failed to load blocks, ban 60 min 104.251.218.246:7000
log 2016-05-02 17:25:16 GET /api/loader/status/sync from 92.93.65.166 
info 2016-05-02 17:25:20 Checking blockchain on 164.132.201.52:7000 
info 2016-05-02 17:25:20 Looking for common block with 164.132.201.52:7000 
info 2016-05-02 17:25:20 Found common block 9265064167066369150 (at 9294) with peer 164.132.201.52:7000 
info 2016-05-02 17:25:22 Loading 1439 blocks from 164.132.201.52:7000
info 2016-05-02 17:25:22 Fork { delegate: '2fa63cb23a2e5b8a893dd724db42fd8d6346b9c4198d653c777b030aa1691483',
  block: 
   { id: '5670733510738223641',
     timestamp: 33513520,
     height: 9295,
     previousBlock: '9265064167066369150' },
  cause: 3 }
warn 2016-05-02 17:25:22 Block 5670733510738223641 is not valid, ban 60 min 164.132.201.52:7000
warn 2016-05-02 17:25:22 Can't verify slot: 5670733510738223641 
warn 2016-05-02 17:25:22 Failed to load blocks, ban 60 min 164.132.201.52:7000
log 2016-05-02 17:25:26 GET /api/loader/status/sync from 192.168.0.14 
log 2016-05-02 17:25:29 GET /api/loader/status/sync from 94.254.243.151 
log 2016-05-02 17:25:30 POST /peer/blocks from 151.80.24.44 
info 2016-05-02 17:25:31 Checking blockchain on 45.32.157.218:7000 
info 2016-05-02 17:25:31 Looking for common block with 45.32.157.218

It has been fixed with bash lisk.sh rebuild, but Lisk couldn't automatically fix the database. I guess the issue happened during connectivity problems; I have reported a similar issue with the previous database engine.

karmacoma commented 8 years ago

@karek314: I will give my explanation on why this fork occurs.

This type of fork is caused by a mismatch between the expected public key and the received block's generator public key when processing a block.

The following gives an example of the error message which will be recorded when a node enters this type of fork:

Note: In 0.2.1, we improved the logging in this case to give more detailed information, and within this explanation I have added comments to the code.

Error message

{
  "level":"error",
  "message":"Expected generator: 7ac9d4b708fb19eaa200eb883be56601ddceed96290a3a033114750b7fda9d0b Received generator: fa7bfd3a2dc0ca55b700247aae4694709d6cdfa34c6bfb0237e032d7aae404f0",
  "timestamp":"2016-05-07 03:40:44"
}

Fork log

{
  "level":"info",
  "message": "Fork",
  "timestamp":"2016-05-07 03:40:44",
  "data": {
    "delegate":"fa7bfd3a2dc0ca55b700247aae4694709d6cdfa34c6bfb0237e032d7aae404f0",
    "block":{
      "id":"13609465205005996164",
      "timestamp":34054830,
      "height":61118,
      "previousBlock":
      "1280968842307106697"
    },
    "cause":3
}

In this example the node has received the following public key: fa7bfd3a2dc0ca55b700247aae4694709d6cdfa34c6bfb0237e032d7aae404f0

The expected public key is determined as follows:

1. Lisk gets a list of all 101 delegates sorted by vote

private.getKeysSortByVote = function (cb) {
  modules.accounts.getAccounts({
    isDelegate: 1,
    sort: { "vote": -1, "publicKey": 1 },
    limit: slots.delegates
  }, ["publicKey"], function (err, rows) {
    if (err) {
      return cb(err);
    }

    cb(null, rows.map(function (el) {
      return el.publicKey
    }))
  });
}

2. Lisk determines an order of delegates based on an initial seed

Delegates.prototype.generateDelegateList = function (height, cb) {
  private.getKeysSortByVote(function (err, truncDelegateList) {
    if (err) {
      return cb(err);
    }

    var seedSource = modules.round.calc(height).toString(); // Establish current round as the seed source
    var currentSeed = crypto.createHash('sha256').update(seedSource, 'utf8').digest(); // Create first sha256 hash seed from current round

    // Iterate over the 101 delegates list sorted by vote
    for (var i = 0, delCount = truncDelegateList.length; i < delCount; i++) {
      // Iterate over each byte in current hash (up to 4th byte)
      for (var x = 0; x < 4 && i < delCount; i++, x++) {
        var newIndex = currentSeed[x] % delCount; // Determine new index
        var b = truncDelegateList[newIndex]; // Get current delegate at new index

        // Swap delegate stack positions
        truncDelegateList[newIndex] = truncDelegateList[i];
        truncDelegateList[i] = b;
      }

      // Generate a new hash seed based on current seed
      currentSeed = crypto.createHash('sha256').update(currentSeed).digest();
    }

    cb(null, truncDelegateList);
  });
}

3. Lisk determines a slot number according to the block timestamp, and selects the expected public key from the previously generated delegates list

Delegates.prototype.validateBlockSlot = function (block, cb) {
  // Generate delegate list based on current block height (round)
  self.generateDelegateList(block.height, function (err, activeDelegates) {
    if (err) {
      return cb(err);
    }

    // Get slot number according to given block timestamp
    var currentSlot = slots.getSlotNumber(block.timestamp); // block.timestamp / 10 block time = 3405483
    var delegate_id = activeDelegates[currentSlot % slots.delegates]; // delegate_id = 3405483 % 101 delegates = 66

    // Expected: 7ac9d4b708fb19eaa200eb883be56601ddceed96290a3a033114750b7fda9d0b (66)
    // Received: fa7bfd3a2dc0ca55b700247aae4694709d6cdfa34c6bfb0237e032d7aae404f0

    // Compare public keys
    if (delegate_id && block.generatorPublicKey == delegate_id) {
      return cb(); // Keys match
    } else {
      // Keys do not match
      library.logger.error("Expected generator: " + delegate_id + " Received generator: " + block.generatorPublicKey);
      return cb("Failed to verify slot: " + currentSlot);
    }
  });
}
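
For reference, slots.getSlotNumber is not shown above. Based purely on the inline comments (block.timestamp / 10-second block time, modulo 101 delegates), a minimal sketch of the slot and delegate-index derivation could look like the following; this is an illustration, not the actual Lisk implementation:

// Minimal sketch of the slot derivation, assuming the 10-second block time
// and 101 active delegates used in the comments above.
var BLOCK_TIME = 10;        // seconds per slot (assumption from the comments)
var ACTIVE_DELEGATES = 101; // slots.delegates

function getSlotNumber(timestamp) {
  // e.g. 34054830 / 10 = 3405483
  return Math.floor(timestamp / BLOCK_TIME);
}

function getDelegateIndex(timestamp) {
  // e.g. 3405483 % 101 = 66 -> index into the shuffled delegate list
  return getSlotNumber(timestamp) % ACTIVE_DELEGATES;
}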

If the received and expected public keys do not match, the node enters a fork, designated cause 3.

Possible Causes

  1. The block signer's system clock was adrift, producing an invalid block timestamp, so the delegate slot number used when the block was generated differs from the one later determined when validating the delegate slot (see the sketch after this list).
  2. The initial delegates list used to validate a delegate slot is sorted by vote. If the votes are recorded differently on an individual node (possibly as a result of missed blocks), this will affect the order of the delegate list used to select the delegate for the given slot number.
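
As a worked illustration of cause 1, with hypothetical numbers and reusing the getSlotNumber sketch above: a drift of only a few seconds across a slot boundary changes the delegate index that validators derive from the block timestamp.

// Hypothetical numbers only, reusing the sketch above (10s slots, 101 delegates).
// A clock adrift by ~10 seconds shifts the stamped timestamp across a slot boundary:
var intendedSlot = getSlotNumber(34054830);  // 3405483 -> index 3405483 % 101 = 66
var stampedSlot  = getSlotNumber(34054840);  // 3405484 -> index 3405484 % 101 = 67
// The generator signed for one slot, but validators derive the other from the
// block timestamp, expect a different delegate's public key, and record fork cause 3.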

I hope this is a full enough explanation on why this fork is occurring. I am currently working on a solution, but posting this explanation to open the discussion and improve understanding.

karmacoma commented 8 years ago

Upon checking the database snapshot which everyone is seeding their nodes from, I've found a problem:

For example, as of writing, the current round is:

lisk_test=# select max(round) from mem_round;
 max 
-----
 694
(1 row)

Rounds are normally applied and "flushed" when they end. However, this is the current state of the db:

lisk_test=# select round, count(*) as actions from mem_round group by round having round < 694;
-[ RECORD 1 ]
round   | 137
actions | 192
-[ RECORD 2 ]
round   | 4
actions | 1136
-[ RECORD 3 ]
round   | 679
actions | 14499

Therefore, that's a total of 3 rounds which have not been applied properly. As a consequence this affects delegate votes, which in turn affects the delegate list sorted by vote mentioned in this issue, causing the number of forks we've been experiencing.

Taking one block which occurred in round 4 as an example: in block https://explorer.lisk.io/block/15972574090333109954 at height 376, there are 25 transactions, all of which are still present in mem_round as 373 cascading actions. The block was generated and signed correctly, but its actions are not consistently applied to all nodes, especially any which used the snapshot to rebuild from.

Conclusions:

  1. The snapshot being used is corrupt, and merely taking a snapshot is not safe, due to the way nodes are "quick loaded" from it without verifying the whole chain from the beginning.
  2. Centralised snapshots can be bad for network security, especially in our case where there are only 101 delegates.
  3. We need to add the required checks on startup to look for these "unapplied" rounds, then throw an error and roll back to a safe point to build from (a rough sketch follows after this list).
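
A rough sketch of what such a startup check might look like, using the same query as above; the db.query interface and function name here are hypothetical and only illustrate the idea, not the eventual implementation:

// Rough sketch only: on startup, look for rounds left in mem_round that are
// older than the current round; if any exist, refuse to start from this state.
function checkUnappliedRounds(db, cb) {
  db.query('SELECT max(round) AS current FROM mem_round', [], function (err, res) {
    if (err) { return cb(err); }
    var current = res[0].current;

    db.query('SELECT round, count(*) AS actions FROM mem_round GROUP BY round HAVING round < $1', [current], function (err, rows) {
      if (err) { return cb(err); }
      if (rows.length > 0) {
        // Unapplied rounds detected: raise an error so the node can roll back
        // to a safe point (e.g. the end of the last fully applied round).
        return cb('Detected unapplied rounds: ' + rows.map(function (r) { return r.round; }).join(', '));
      }
      return cb(null);
    });
  });
}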

Next steps: I will verify all blocks in the current snapshot, replace the current one at downloads.lisk.io, and then later implement the required checking for the next release. Any snapshots I take in the meantime I will manually check for "unapplied" rounds.

fix commented 8 years ago

Interestingly, there were many forks when active delegates were changing (like yesterday). Provided that the numerous forks were due to this issue (which we don't know for sure), that would explain why many nodes are miscalculating the delegate votes.

As far as I understood, the mem_* tables are a shortcut to simplify and accelerate blockchain computation. I guess we should expect, over time, regular desynchronisation between nodes on these tables (for instance when a SIGKILL occurs).

If a fork cause 3 happens (or really whenever an error check on the mem_* tables fails), recomputing these tables from scratch could be a good strategy.

2mdoty commented 8 years ago

This would explain why Crypti would be unstable for a while after a major change in voting from large transactions, shuffling delegate ranks, though it eventually resolves itself. The time syncing has also been a chronic issue.

fix commented 8 years ago

Another idea: remove the mem_round table and perform the computation in memory for the current round.

Isabello commented 8 years ago

I believe we found definitive proof of fork cause 3 occurring because of delegates shifting.

Here is an error message from earlier:

error 2016-05-09 08:07:33 Expected generator: e63756373d8bab04d6234851f4fc643ba77d7e8462aa14e89aa864ed0a9d3a4f Received generator: 6beaa7c569c1000f4fcef4ce3133b18609aea52adf95d5992970ea5e0cedda87
info 2016-05-09 08:07:33 Fork { delegate: '6beaa7c569c1000f4fcef4ce3133b18609aea52adf95d5992970ea5e0cedda87',
  block: 
   { id: '10723986915118145364',
     timestamp: 34243170,
     height: 75386,
     previousBlock: '4610472515399194013' },
  cause: 3 }

Below we can see two nodes, one healthy, the other not, and on the right we can see the two delegates' public keys, which can be compared to the two wallets. It's clearly showing that the delegates swapped places, and if we look at the public keys from above, it's the same two delegates mentioned.

(image: side-by-side comparison of the delegate lists on the two nodes)

I'm not quite sure what can be done to alleviate this problem, but one solution that I am proposing is the following:

At the beginning of the round, delegate votes are fixed in place. These votes will be collected among all nodes until the end of the round. On the final block of the round these votes would be tallied and signed onto that block. It may be helpful for the next block afterwards to pause for longer than the normal block time (if that's possible) to allow this final block to propagate through the network. In this way, the shifting issue is resolved and every node is allowed a brief period to catch up and get fully in sync to generate the newest delegate list for the next block.
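
A rough sketch of only the first part of this idea (fixing the list for the duration of a round); the caching approach and names here are hypothetical, and the tallying/signing of votes onto the final block is not shown:

// Rough sketch, hypothetical: compute the vote-sorted delegate list once per
// round and reuse it for every slot in that round, so mid-round vote changes
// cannot reorder the list.
var cachedRound = null;
var cachedList = null;

function getDelegateListForRound(height, cb) {
  var round = modules.round.calc(height);
  if (cachedRound === round && cachedList) {
    return cb(null, cachedList); // list fixed at the start of the round
  }
  self.generateDelegateList(height, function (err, list) {
    if (err) { return cb(err); }
    cachedRound = round;
    cachedList = list;
    return cb(null, list);
  });
}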

fix commented 8 years ago

I agree, the nodes don't have the same information about the votes, especially if votes occur in the last blocks at the end of each round. So they generate an erroneous list of active delegates if the order changes at the end.

Either we prevent this fork from occurring, or we make it recoverable.

I expect @Isabello's solution to be heavy to implement because it introduces a new type of block into the blockchain. However, waiting 60s to start the next round (i.e. there would be 60s between two cross-round blocks) could be enough to launch the mainnet and prevent enough cause 3 forks. We can then work on a better solution later.

mrv777 commented 8 years ago

I'll admit to not understanding every aspect of the code, but the solution mentioned sounds good to me: a new temporary table in the DB that is a static snapshot of the delegate list at the beginning of the round and gets replaced at the end of each round. All the other code and structure can remain the same; just for forging, the temporary/static table of delegates would be used instead of the current data.

I was just talking to rabid about it. I could be missing something, but how much overhead can there be with one more table of 101 rows and 3 columns: rank, publicKey, and forging slot (since this is randomized)?

Edit: GreXX's suggestion of just adding another column to the current table, containing the previous round's rank as of that round's last block, seems more efficient than a whole new table.

Edit 2: Looking at the code, I wonder if it would be easier to wait to apply votes until the last or first block in a round, instead of recording the previous round's ranks.

2mdoty commented 8 years ago

I'm thinking there should be 2 blockchains. One is the complete history (transactions of a complete forging round, voting, and delegate rank) in a block, with a block cycle of once every forging round; the other holds the blocks forged by each delegate, going back a number of forging rounds for redundancy.

The permanent blockchain is verified in the background by the delegates, as a task separate from forging on the short-term temporary chain.

Voting and changes of delegate ranking seem to cause network instability, and snapshots are coming out invalid. Having a permanent blockchain with a long block time allows better validation and prevents partial, truncated, invalid data from corrupting the main chain. A working short-term chain going back several cycles for forging gives some redundancy and can be processed faster.

jfeldis commented 8 years ago

Namecoin and Blockstack both use a 2-step process for registering names on a blockchain. The first step is a "pre-order" that records the user's request on the blockchain but does not officially register it. After 6 blocks (BTC/NMC), the user must send a registration transaction to complete it. If the pre-order is not followed by the registration within Y blocks, the pre-order expires and is discarded. Similarly, a registration without a prior pre-order is ignored as invalid.

Perhaps a similar system for casting votes would help?

  1. Step 1: a "record vote" transaction
  2. A minimum delay of X blocks; the record expires after Y blocks
  3. Step 2: a "cast vote" transaction

Both the "record" and "cast" transactions must be present, with the correct spacing and sequence, to be valid.
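
A rough sketch of how such a two-step check could be expressed; the names, structure, and window values here are purely hypothetical placeholders for X and Y, not Lisk code:

// Purely hypothetical sketch of the two-step vote idea.
var MIN_DELAY = 6;    // X: minimum blocks between "record vote" and "cast vote"
var MAX_DELAY = 100;  // Y: the "record vote" expires after this many blocks

// recordedVotes: map of voteId -> height at which the "record vote" tx was included
function isCastVoteValid(voteId, currentHeight, recordedVotes) {
  var recordedAt = recordedVotes[voteId];
  if (recordedAt === undefined) {
    return false; // no prior "record vote": ignored as invalid
  }
  var age = currentHeight - recordedAt;
  return age >= MIN_DELAY && age <= MAX_DELAY; // valid only inside the window
}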

fix commented 8 years ago

There is something similar built into Lisk, but that you don't see:

What can happen, though very unlikely, is that two usernames get registered in different areas of the network. After a few blocks, although the user saw his name registered successfully, it will disappear (and his fees be returned) because there was no consensus.

fix commented 8 years ago

We had a deep discussion recently. What came out:

jfeldis commented 8 years ago

Do nodes do any type of P2P timestamp comparison with each other, like in Bitcoin? https://en.bitcoin.it/wiki/Block_timestamp

ByronAP commented 8 years ago

Why not drop back a few blocks instead of relying on the current block?

qtipwax commented 8 years ago

jfeldis wrote:

namecoin and blockstack both use a 2 step process for registering names on a blockchain...

I could see this being used with multisig also. A reject should be an option for multisig, not only approval, in my opinion. A message field for the reason might be nice also.

ghost commented 8 years ago

As the issue has been closed a couple of times, I assume that some of these fixes have been applied in 0.2.3, but the issue still occurs. I'm aware it's in progress now, but maybe this will be helpful.

log 2016-05-22 14:07:39 GET /api/multisignatures/pending?publicKey=d912ed2548ad2428d7aa274ed5af3c4feab73ac0cc69502163f8348b5ec454dd from 10.0.0.2 
info 2016-05-22 14:07:40 Loading 1439 blocks from 99.198.173.42:7000
error 2016-05-22 14:07:40 Expected generator: 80455bf1ac7dc974452b1880f4fe4f4af4c74d26ac060586d1ca42b63b82cb70 Received generator: 2e4d19a7eb30bc047a29d661c884672b5a4eaee1c212ef5469d5b4cad0cb6d64 
info 2016-05-22 14:07:40 Fork { delegate: '2e4d19a7eb30bc047a29d661c884672b5a4eaee1c212ef5469d5b4cad0cb6d64',
  block: 
   { id: '12418910332365484318',
     timestamp: 35369160,
     height: 27076,
     previousBlock: '5350555175686575766' },
  cause: 3 }
warn 2016-05-22 14:07:40 Block 12418910332365484318 is not valid, ban 60 min 99.198.173.42:7000
warn 2016-05-22 14:07:40 Failed to verify slot: 3536916 
warn 2016-05-22 14:07:40 Failed to load blocks, ban 60 min 99.198.173.42:7000
log 2016-05-22 14:07:41 POST /peer/blocks from 108.61.123.223 
log 2016-05-22 14:07:41 POST /peer/blocks from 51.255.87.64 
ghost commented 8 years ago

As far as I heard it's fixed, but I have just reproduced it with 0.3.0.

error 2016-05-25 08:14:09 Expected generator: 366cfff7cc7f96e16c512b9df04f5bc05b694cb253a46e52f1ac8d79be223b06 Received generator: 326544de0b5ae1d73c5b7c233f873de57b02fbec890c90847aa46e621a164b3d 
info 2016-05-25 08:14:09 Fork { delegate: '326544de0b5ae1d73c5b7c233f873de57b02fbec890c90847aa46e621a164b3d',
  block: 
   { id: '15487781969400448893',
     timestamp: 45810,
     height: 3243,
     previousBlock: '14749820397961569831' },
  cause: 3 }
fix commented 8 years ago

Here I have reproduced it twice, because of low internet connectivity. It sounds like lagging could trigger fork 3 now.

ghost commented 8 years ago

Yes, I agree. I couldn't reproduce it on a machine with stable connectivity, but one with a poor internet connection enters forks very often. How about implementing a mechanism to remove the last block and its data when a node can't move on with the blockchain?

fix commented 8 years ago

It is already done, but this case is much more complicated and involves a wrong database state in most cases.


fix commented 8 years ago

@karmacoma @MaxKK can we close this bug? There is some internal documentation on improving the reliability of the network to prevent such forks. Maybe we could open up a discussion somewhere in the wiki instead, because it is difficult to keep track of this kind of issue in GitHub issues.

karmacoma commented 7 years ago

@MaxKK Agree, we will close here and focus on improving network reliability on a case by case basis. For example: https://github.com/LiskHQ/lisk/issues/213. We should also improve our fork documentation here: https://github.com/LiskHQ/lisk-docs/blob/development/Troubleshooting.md or a wiki.

lichuan commented 6 years ago

As far as I know, DPoS consensus is not BFT, so the solution is to improve the consensus layer. In bad network conditions, DPoS consensus will cause many nodes to see different new blocks (same height, different hash) being produced by different delegate nodes. Sometimes, bad network conditions can even prevent the new block produced by a delegate from ever being confirmed, because the last irreversible block (LIB) needs (n-1)*2/3 confirmations by witness nodes, so in this condition the block will be suspended forever. So DPoS is not safe and is not BFT; Lisk, BitShares, and Steemit have the same problem.

lichuan commented 6 years ago

@MaxKK @karmacoma @fix