HydraChain / hydrachain

Permissioned Distributed Ledger based on Ethereum
MIT License
358 stars 105 forks source link

runmultiple: occasional startup discovery problems #73

Open konradkonrad opened 8 years ago

konradkonrad commented 8 years ago

Every once in a blue moon, runmultiple doesn't manage to connect the whole network. The log shows excessive ready chatter:

$ hydrachain -d $(mktemp -d) -l :info,hdc.chainservice:DEBUG,eth.chainservice:DEBUG,contracts:debug,eth.chain:debug -c p2p.listen_host="127.0.0.1" -c discovery.listen_host="127.0.0.1" -c jsonrpc.corsdomain='http://localhost:8080' --log-file=hydrachain.log runmultiple

INFO:hdc.chainservice   initializing chain 
INFO:eth.chain  Initializing new chain 
INFO:eth.chain  new genesis difficulty=131072 genesis_hash=<Block(#0 0cd786a2)>
DEBUG:eth.chain updating head 
DEBUG:eth.chain set new head head=<CachedBlock(#0 0cd786a2)>
DEBUG:eth.chain updating head candidate head=<CachedBlock(#0 0cd786a2)>
DEBUG:eth.chain chain @ head_hash=<CachedBlock(#0 0cd786a2)>
DEBUG:eth.chain got genesis nonce=00006d6f7264656e difficulty=131072
DEBUG:eth.chain updating head candidate head=<CachedBlock(#0 0cd786a2)>
INFO:hdc.chainservice   chain at number=0
INFO:app    registering service service=chain
INFO:app    registering service service=jsonrpc
INFO:app    starting 
DEBUG:hdc.chainservice  setting up alarm 
INFO:p2p.peermgr    starting peermanager 
INFO:p2p.peermgr    starting listener addr=('127.0.0.1', 29872)
INFO:p2p.discovery  starting discovery 
INFO:p2p.discovery  starting listener host=127.0.0.1 port=29872
INFO:jsonrpc    starting JSONRPCServer port=4002
DEBUG:hdc.chainservice  broadcasting origin=None obj=<Ready(n:0)>
INFO:p2p.peermgr    waiting for bootstrap 
DEBUG:hdc.chainservice  broadcasting origin=None obj=<Ready(n:1)>
DEBUG:hdc.chainservice  broadcasting origin=None obj=<Ready(n:1)>
DEBUG:hdc.chainservice  broadcasting origin=None obj=<Ready(n:1)>
DEBUG:hdc.chainservice  broadcasting origin=None obj=<Ready(n:2)>
INFO:p2p.peer   received hello client_version=NODE1 version=55 capabilities=(('hdc', 1), ('p2p', 4))
DEBUG:hdc.chainservice  ---------------------------------- 
DEBUG:hdc.chainservice  on_wire_protocol_start proto=<HDCProtocol <Peer('127.0.0.1', 29871) NODE1>>
INFO:p2p.peer   received hello client_version=NODE0 version=55 capabilities=(('hdc', 1), ('p2p', 4))
DEBUG:hdc.chainservice  ---------------------------------- 
DEBUG:hdc.chainservice  on_wire_protocol_start proto=<HDCProtocol <Peer('127.0.0.1', 45021) NODE0>>
DEBUG:hdc.chainservice  ---------------------------------- 
DEBUG:hdc.chainservice  status received eth_version=1 proto=<HDCProtocol <Peer('127.0.0.1', 29871) NODE1>>
DEBUG:eth.chain get_transactions called on=<Block(#1 bb18ab76)>
DEBUG:hdc.chainservice  ---------------------------------- 
DEBUG:hdc.chainservice  status received eth_version=1 proto=<HDCProtocol <Peer('127.0.0.1', 45021) NODE0>>
DEBUG:eth.chain get_transactions called on=<Block(#1 63909d6d)>
DEBUG:hdc.chainservice  broadcasting origin=None obj=<Ready(n:2)>
DEBUG:hdc.chainservice  ---------------------------------- 
DEBUG:hdc.chainservice  recv ready ready=<Ready(n:2)> remote_id=<HDCProtocol <Peer('127.0.0.1', 29871) NODE1>>
DEBUG:hdc.chainservice  broadcasting origin=None obj=<Ready(n:3)>
DEBUG:hdc.chainservice  broadcasting origin=<HDCProtocol <Peer('127.0.0.1', 29871) NODE1>> obj=<Ready(n:2)>
DEBUG:hdc.chainservice  ---------------------------------- 
DEBUG:hdc.chainservice  recv ready ready=<Ready(n:3)> remote_id=<HDCProtocol <Peer('127.0.0.1', 45021) NODE0>>
DEBUG:hdc.chainservice  broadcasting origin=None obj=<Ready(n:3)>
DEBUG:hdc.chainservice  broadcasting origin=<HDCProtocol <Peer('127.0.0.1', 45021) NODE0>> obj=<Ready(n:3)>
...
DEBUG:hdc.chainservice  recv ready ready=<Ready(n:8)> remote_id=<HDCProtocol <Peer('127.0.0.1', 45021) NODE0>>
DEBUG:hdc.chainservice  broadcasting origin=None obj=<Ready(n:9)>
DEBUG:hdc.chainservice  broadcasting origin=<HDCProtocol <Peer('127.0.0.1', 45021) NODE0>> obj=<Ready(n:8)>
DEBUG:hdc.chainservice  broadcasting origin=None obj=<Ready(n:3)>
DEBUG:hdc.chainservice  ---------------------------------- 
...
DEBUG:hdc.chainservice  ---------------------------------- 
DEBUG:hdc.chainservice  recv ready ready=<Ready(n:130)> remote_id=<HDCProtocol <Peer('127.0.0.1', 45021) NODE0>>
DEBUG:hdc.chainservice  broadcasting origin=None obj=<Ready(n:144)>
DEBUG:hdc.chainservice  broadcasting origin=<HDCProtocol <Peer('127.0.0.1', 45021) NODE0>> obj=<Ready(n:130)>
DEBUG:hdc.chainservice  ---------------------------------- 
DEBUG:hdc.chainservice  recv ready ready=<Ready(n:131)> remote_id=<HDCProtocol <Peer('127.0.0.1', 45021) NODE0>>
DEBUG:hdc.chainservice  broadcasting origin=None obj=<Ready(n:145)>
DEBUG:hdc.chainservice  broadcasting origin=<HDCProtocol <Peer('127.0.0.1', 45021) NODE0>> obj=<Ready(n:131)>
DEBUG:hdc.chainservice  ---------------------------------- 
DEBUG:hdc.chainservice  recv ready ready=<Ready(n:132)> remote_id=<HDCProtocol <Peer('127.0.0.1', 45021) NODE0>>
DEBUG:hdc.chainservice  broadcasting origin=None obj=<Ready(n:146)>
DEBUG:hdc.chainservice  broadcasting origin=<HDCProtocol <Peer('127.0.0.1', 45021) NODE0>> obj=<Ready(n:132)>
DEBUG:hdc.chainservice  ---------------------------------- 

One thing to note is, that NODE2 doesn't get ready:

$ cat hydrachain.log|grep NODE2
2016-06-02 08:25:03,929 INFO:p2p.peer   received hello client_version=NODE2 version=55 capabilities=(('hdc', 1), ('p2p', 4))
2016-06-02 08:25:03,929 DEBUG:hdc.chainservice  on_wire_protocol_start proto=<HDCProtocol <Peer('127.0.0.1', 29872) NODE2>>
2016-06-02 08:25:04,336 INFO:p2p.peer   received hello client_version=NODE2 version=55 capabilities=(('hdc', 1), ('p2p', 4))
2016-06-02 08:25:04,337 DEBUG:hdc.chainservice  on_wire_protocol_start proto=<HDCProtocol <Peer('127.0.0.1', 29872) NODE2>>
2016-06-02 08:25:04,723 DEBUG:hdc.chainservice  on_wire_protocol_stop proto=<HDCProtocol <Peer('not ready',) NODE2>>
2016-06-02 08:25:05,238 DEBUG:hdc.chainservice  on_wire_protocol_stop proto=<HDCProtocol <Peer('not ready',) NODE2>>
2016-06-02 08:25:10,928 INFO:p2p.peer   received hello client_version=NODE2 version=55 capabilities=(('hdc', 1), ('p2p', 4))
2016-06-02 08:25:10,929 DEBUG:hdc.chainservice  on_wire_protocol_start proto=<HDCProtocol <Peer('127.0.0.1', 29872) NODE2>>
2016-06-02 08:25:12,002 INFO:p2p.peer   received hello client_version=NODE2 version=55 capabilities=(('hdc', 1), ('p2p', 4))
2016-06-02 08:25:12,002 DEBUG:hdc.chainservice  on_wire_protocol_start proto=<HDCProtocol <Peer('127.0.0.1', 29872) NODE2>>
2016-06-02 08:25:13,084 DEBUG:hdc.chainservice  on_wire_protocol_stop proto=<HDCProtocol <Peer('not ready',) NODE2>>
2016-06-02 08:25:14,344 DEBUG:hdc.chainservice  on_wire_protocol_stop proto=<HDCProtocol <Peer('not ready',) NODE2>>
ulope commented 8 years ago

After a long day of debugging @konradkonrad and I discovered a (IMO critical) bug in pydevp2p that caused neighbour discovery to be subtly broken. This has been fixed in ethereum/pydevp2p@c492b009978ef142c826f5940944aacc873ad602.

Unfortunately this only partially fixes this issue. So more investigation is needed.

konradkonrad commented 8 years ago

I can confirm the occasional fails still persist - even with the pydevp2p fix :( One thing to add here: unfortunately we missed https://github.com/ethereum/pydevp2p/pull/43 which found the issue far too long ago, sorry!