alexdovzhanyn / elixium_core

A privacy-preserving decentralized application network
https://www.elixiumnetwork.org/
281 stars 37 forks source link

Networking: Blocks received in wrong order #55

Closed alexdovzhanyn closed 5 years ago

alexdovzhanyn commented 6 years ago

When test mining, I was able to process about 900 blocks before encountering something weird: my node received blocks out of order (from one miner).

Not sure what caused this, but its possibly the order in which we parse messages that come in within the same request.

Output on validator node:

22:39:58.867 [info]  Block 944 valid.

22:39:58.884 [info]  Not enough bytes! Waiting for 500 more bytes...
%{
  hash: "0577C545226A8D4BA329E7F203399FD6CF5628FF763BF087286ABD056E8D712D",
  index: 946,
  merkle_root: "C0C2ADADC6F78731DF59FC478E8A61E42BC2DC37970CF3A66D339C4398ABB928",
  nonce: 1,
  previous_hash: "040AA6DF61FAFE3EFD545AA1773AAA0911D85C2AFA4308288D502FCEF8F0094C",
  timestamp: "2018-09-13 02:39:58.874836Z"
}
%{
  hash: "040AA6DF61FAFE3EFD545AA1773AAA0911D85C2AFA4308288D502FCEF8F0094C",
  index: 945,
  merkle_root: "C2BF83FA312A3C9F64B94D674D3E8C849B6B17DB50A7D51E8896F52EC299CBE8",
  nonce: 3,
  previous_hash: "036AA9E4A3363CC622157ED9C9CC9472F49BD6D093D39A75872AD4696A0BCB93",
  timestamp: "2018-09-13 02:39:58.861104Z"
}

22:39:58.884 [info]  Got the 500 bytes! Constructing message

22:39:58.884 [info]  Accepted message from 127.0.0.1

22:39:58.884 [info]  Time 1536806398884

22:39:58.888 [info]  Block 946 invalid!

22:39:58.888 [info]  Block 945 valid.

Output from miner:


22:39:58.874 [info]  Sending data to peer: 127.0.0.1

22:39:58.874 [info]  Time 1536806398874
Index: 946        Hash: 0577C545226A8D4BA329E7F203399FD6CF5628FF763BF087286ABD056E8D712D Nonce: 1          Elapsed (s): 9.5e-5     Hashrate (H/s): 10526

22:39:58.883 [info]  Sending data to peer: 127.0.0.1
alexdovzhanyn commented 6 years ago

Seems to happen semi-randomly; I was able to recreate this after mining ~8000 blocks.

Output from validator node:

%{
  hash: "02B84C029D7EC099ABFD6F8CC5E88AFA99A8811A6CD7E84AB8C5F41E8310B7C0",
  index: 8177,
  merkle_root: "8A29A048696FAE51CF7B6CDC3EC90FD0017DD03B48F206D516BFE8D636CB8919",
  nonce: 9,
  previous_hash: "049A30EA13A12FEABCC2D39C624EDF3AB3298C682FEB64D4478610F8278C78E1",
  timestamp: "2018-09-13 02:53:41.987911Z"
}

22:53:42.003 [info]  Block 8177 valid.
%{
  hash: "0270136E689EE309ED2206F70CAE734BD5EA633A556E1C0FA17E44CD5F3265A4",
  index: 8179,
  merkle_root: "C078724E2B20EFA0877715A3608D168482F9DE8840E204FA2C1F4B91B4B66B98",
  nonce: 12,
  previous_hash: "029606C7001EA7A4421690B5DB56CAEB398C87D0C8B2C371333F8381C11DCE0C",
  timestamp: "2018-09-13 02:53:42.016551Z"
}
%{
  hash: "029606C7001EA7A4421690B5DB56CAEB398C87D0C8B2C371333F8381C11DCE0C",
  index: 8178,
  merkle_root: "E01EA468322A515BC89A51C617F119E906694F835228532A067756B76E71E573",
  nonce: 19,
  previous_hash: "02B84C029D7EC099ABFD6F8CC5E88AFA99A8811A6CD7E84AB8C5F41E8310B7C0",
  timestamp: "2018-09-13 02:53:42.000941Z"
}

22:53:42.026 [info]  Not enough bytes! Waiting for 500 more bytes...

22:53:42.026 [info]  Got the 500 bytes! Constructing message

22:53:42.026 [info]  Accepted message from 127.0.0.1

22:53:42.026 [info]  Time 1536807222026

22:53:42.026 [info]  Block 8179 invalid!

22:53:42.027 [info]  Block 8178 valid.

Output from miner:

22:53:42.000 [info]  Time 1536807222000
Index: 8178       Hash: 029606C7001EA7A4421690B5DB56CAEB398C87D0C8B2C371333F8381C11DCE0C Nonce: 19         Elapsed (s): 2.52e-4    Hashrate (H/s): 75397

22:53:42.016 [info]  Sending data to peer: 127.0.0.1

22:53:42.016 [info]  Time 1536807222016
Index: 8179       Hash: 0270136E689EE309ED2206F70CAE734BD5EA633A556E1C0FA17E44CD5F3265A4 Nonce: 12         Elapsed (s): 2.07e-4    Hashrate (H/s): 57971

22:53:42.026 [info]  Sending data to peer: 127.0.0.1
fantypants commented 6 years ago

The time stamps seem almost identical, look at: 8179: 2018-09-13 02:53:42.016551Z 8178: 2018-09-13 02:53:42.000941Z

How are they being structured for the return?

alexdovzhanyn commented 6 years ago

IIRC, the timestamps are being sent as strings. It likely is a race condition as you suggest, which means its likely less of a block-ordering issue than it is of a block acceptance issue. We should probably store blocks with an index we haven't reached yet (like what's happening here) somewhere, so that we can figure out somewhere along the line whether or not the block is valid once we do reach that index.

On Thu, Sep 27, 2018 at 2:38 PM Matthew notifications@github.com wrote:

The time stamps seem almost identical, look at: 8179: 2018-09-13 02:53:42.016551Z 8178: 2018-09-13 02:53:42.000941Z

How are they being structured for the return?

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ElixiumNetwork/elixium_core/issues/55#issuecomment-425199475, or mute the thread https://github.com/notifications/unsubscribe-auth/AI3b4mDIU0YeBePi5pmrc_0kAk5MWVYuks5ufRsLgaJpZM4Wme1D .

fantypants commented 6 years ago

What about either a linked list or DFS search for them? if we had a linked list you wouldn't necessarily need to know the next block, it could be null until valid and then it would be consecutive the entire time i.e: initial: Block 0(known)-Block 1(waiting/null)-Block 2(known) 1: Block 1(Known)-Block 2(known)-Block3(unknown);

Basically using like a 2-3 element linked list

alexdovzhanyn commented 6 years ago

The issue with that is that we can't validate block 2 until we know block 1, since they're directly dependent on one another. Might be easier to just store it in some sort of block pool that we check every time we receive a new block

On Thu, Sep 27, 2018 at 2:51 PM Matthew notifications@github.com wrote:

What about either a linked list or DFS search for them? if we had a linked list you wouldn't necessarily need to know the next block, it could be null until valid and then it would be consecutive the entire time i.e: initial: Block 0(known)-Block 1(waiting/null)-Block 2(known) 1: Block 1(Known)-Block 2(known)-Block3(unknown);

Basically using like a 2-3 element linked list

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ElixiumNetwork/elixium_core/issues/55#issuecomment-425203384, or mute the thread https://github.com/notifications/unsubscribe-auth/AI3b4i3msGbKeary4S0P1rjaxK4tzJNDks5ufR4hgaJpZM4Wme1D .

fantypants commented 6 years ago

ah true, for the block pool we could validate in pairs instead of singly for the returns, so if we get index 2 before index 1 store in table and move on as soon as index 1 comes in check table, compare and validate blocks.

If we get block 2 before block 1 we cant check if either is essentially valid until we can check the hash from block 1 against block 0, so it may work by having the system condition check to pair validation on those special edge cases.

alexdovzhanyn commented 6 years ago

Yeah thats perfect. This sounds really similar to what I'm doing with forking now, so (and these changes are probably going to have to happen on the node and miner repos as opposed to core), so we can probably bundle this stuff together. I'm using ETS tables for pools since they're only relevant in-memory. I'll push up the rest of the fork code tonight.

On Thu, Sep 27, 2018 at 3:00 PM Matthew notifications@github.com wrote:

ah true, for the block pool we could validate in pairs instead of singly for the returns, so if we get index 2 before index 1 store in table and move on as soon as index 1 comes in check table, compare and validate blocks.

If we get block 2 before block 1 we cant check if either is essentially valid until we can check the hash from block 1 against block 0, so it may work by having the system condition check to pair validation on those special edge cases.

— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/ElixiumNetwork/elixium_core/issues/55#issuecomment-425206270, or mute the thread https://github.com/notifications/unsubscribe-auth/AI3b4poKH8BztG00jh0_6sBLgFSJJcj2ks5ufSBNgaJpZM4Wme1D .

fantypants commented 6 years ago

Awesome, I'll have a look when it's pushed and write start writing some tests

fantypants commented 5 years ago

I'm assuming we have solved this issue haha