frozeman / pandora-execution-engine

Smart contract execution engine, based on go-ethereum, to work in concert with the LUKSO orchestrator.
https://lukso.network
GNU Lesser General Public License v3.0
5 stars 3 forks source link

Block sync failed for archive node or new node #70

Closed atif-konasl closed 3 years ago

atif-konasl commented 3 years ago

Expected behaviour

Any new node or archive node should sync up with existing running network.

Actual behaviour

When i spin up any archive node and add peer with existing network nodes, it gets warning log like this :

peer=853fb9225a2df78c3f0dc9f3a0362eaa237eaa5b94f8c6ced51587e97071a2d5 count=0   dropped=false timeout=false
WARN [06-22|20:22:40.921] Node data write error                    err="trie node cb739a…5d845a failed with all peers (1 tries, 1 peers)"
DEBUG[06-22|20:22:40.920] Fetching batch of headers                id=853fb9225a2df78c conn=staticdial      count=192 fromnum=1   skip=0   reverse=false
INFO [06-22|20:22:40.921] Downloader queue stats                   receiptTasks=0 blockTasks=0 itemSize=0.00B throttle=8192
TRACE[06-22|20:22:40.921] State sync spinning down                 active=0 finished=0
DEBUG[06-22|20:22:40.921] Block body download terminated           err="syncing canceled (requested)"
DEBUG[06-22|20:22:40.921] Transaction receipt download terminated  err="syncing canceled (requested)"
DEBUG[06-22|20:22:40.921] Header download terminated               peer=853fb922
DEBUG[06-22|20:22:40.921] Synchronisation terminated               elapsed=2.646s
WARN [06-22|20:22:40.921] Synchronisation failed, retrying         err="trie node cb739a…5d845a failed with all peers (1 tries, 1 peers)"

Steps to reproduce the behaviour

make geth
cd ./build/bin
./geth --datadir ./pandora init ./genesis.json
./geth --datadir ./pandora --ethstats dev-pandora-anando-0:VIyf7EjWlR49@catalyst.silesiacoin.com  --bootnodes enode://967db4f56ad0a1a35e3d30632fa600565329a23aff50c9762181810166f3c15b078cca522f930d1a2747778893232336bffd1ea5d2ca60543f1801d4360ea63a@10.0.6.3:0?discport=30301 --networkid 4004181 --rpcaddr 0.0.0.0 --rpcapi admin,eth,net --rpc --rpccorsdomain "*" --miner.etherbase 0xb46d14ef42ac9bb01303ba1842ea784e2460c7e7 --mine --verbosity 10

Open geth json console using:
geth attach http://127.0.0.1:8545
admin.addPeer("enode")

Backtrace

TRACE[06-22|20:22:39.587] Filtering headers                        peer=853fb9225a2df78c3f0dc9f3a0362eaa237eaa5b94f8c6ced51587e97071a2d5 headers=1
DEBUG[06-22|20:22:39.587] Fetching batch of headers                id=853fb9225a2df78c conn=staticdial      count=1  fromnum=11 skip=0  reverse=false
TRACE[06-22|20:22:39.844] Filtering headers                        peer=853fb9225a2df78c3f0dc9f3a0362eaa237eaa5b94f8c6ced51587e97071a2d5 headers=1
DEBUG[06-22|20:22:39.844] Fetching batch of headers                id=853fb9225a2df78c conn=staticdial      count=1  fromnum=5  skip=0  reverse=false
TRACE[06-22|20:22:40.101] Filtering headers                        peer=853fb9225a2df78c3f0dc9f3a0362eaa237eaa5b94f8c6ced51587e97071a2d5 headers=1
DEBUG[06-22|20:22:40.101] Fetching batch of headers                id=853fb9225a2df78c conn=staticdial      count=1  fromnum=2  skip=0  reverse=false
TRACE[06-22|20:22:40.365] Filtering headers                        peer=853fb9225a2df78c3f0dc9f3a0362eaa237eaa5b94f8c6ced51587e97071a2d5 headers=1
DEBUG[06-22|20:22:40.365] Fetching batch of headers                id=853fb9225a2df78c conn=staticdial      count=1  fromnum=1  skip=0  reverse=false
DEBUG[06-22|20:22:40.577] Served admin_peers                       conn=127.0.0.1:51132 reqid=5 t="222.306µs"
TRACE[06-22|20:22:40.620] Filtering headers                        peer=853fb9225a2df78c3f0dc9f3a0362eaa237eaa5b94f8c6ced51587e97071a2d5 headers=1
DEBUG[06-22|20:22:40.620] Found common ancestor                    peer=853fb922 number=0   hash="000000…000000"
DEBUG[06-22|20:22:40.620] Downloading block bodies                 origin=1
DEBUG[06-22|20:22:40.620] Downloading transaction receipts         origin=1
DEBUG[06-22|20:22:40.620] Directing header downloads               peer=853fb922 origin=1
TRACE[06-22|20:22:40.620] Fetching skeleton headers                peer=853fb922 count=192 from=1
DEBUG[06-22|20:22:40.620] Fetching batch of headers                id=853fb9225a2df78c conn=staticdial      count=128 fromnum=192 skip=191 reverse=false
TRACE[06-22|20:22:40.621] State sync starting                      root="cb739a…5d845a"
TRACE[06-22|20:22:40.621] Requesting batch of state data           peer=853fb922 nodes=1 codes=0 root="cb739a…5d845a"
DEBUG[06-22|20:22:40.621] Fetching batch of state data             id=853fb9225a2df78c conn=staticdial      count=1
TRACE[06-22|20:22:40.920] Fetching full headers                    peer=853fb922 count=192 from=1
TRACE[06-22|20:22:40.920] Received node data response              peer=853fb9225a2df78c3f0dc9f3a0362eaa237eaa5b94f8c6ced51587e97071a2d5 count=0   dropped=false timeout=false
WARN [06-22|20:22:40.921] Node data write error                    err="trie node cb739a…5d845a failed with all peers (1 tries, 1 peers)"
DEBUG[06-22|20:22:40.920] Fetching batch of headers                id=853fb9225a2df78c conn=staticdial      count=192 fromnum=1   skip=0   reverse=false
INFO [06-22|20:22:40.921] Downloader queue stats                   receiptTasks=0 blockTasks=0 itemSize=0.00B throttle=8192
TRACE[06-22|20:22:40.921] State sync spinning down                 active=0 finished=0
DEBUG[06-22|20:22:40.921] Block body download terminated           err="syncing canceled (requested)"
DEBUG[06-22|20:22:40.921] Transaction receipt download terminated  err="syncing canceled (requested)"
DEBUG[06-22|20:22:40.921] Header download terminated               peer=853fb922
DEBUG[06-22|20:22:40.921] Synchronisation terminated               elapsed=2.646s
WARN [06-22|20:22:40.921] Synchronisation failed, retrying         err="trie node cb739a…5d845a failed with all peers (1 tries, 1 peers)"
blazejkrzak commented 3 years ago

This happens on archive node when downloading blocks WARN [06-23|12:29:38.421] Node data write error err="trie node 3b9272…09a1b2 failed with all peers (1 tries, 1 peers)"

atif-konasl commented 3 years ago

@blazejkrzak not only archive node but also for any new full node too. No new node can join existing pandora network now.

blazejkrzak commented 3 years ago

There is no slot progression on validator side also: error="rpc error: code = Unavailable desc = Syncing to latest head, not ready to respond I think pandora gives status as syncing so no next block production happen

blazejkrzak commented 3 years ago
DEBUG[06-28|17:38:45.518] Message handling failed in `snap`        peer=ad9cb7b0 err=EOF
DEBUG[06-28|17:38:45.518] Message handling failed in `eth`         id=ad9cb7b04d56282b conn=inbound         err=EOF
DEBUG[06-28|17:38:45.519] Removing Ethereum peer                   peer=ad9cb7b0 snap=true
DEBUG[06-28|17:38:45.519] Removing p2p peer                        peercount=3 id=ad9cb7b04d56282b duration=9.744s        req=true  err="useless peer"
blazejkrzak commented 3 years ago

Possible suspect: StateBloom. When peer requests node info it gets empty [][]byte in response. It looks like it occurs when no transactions were present (Not 100% sure, must confirm it at respin)

At this very moment I am able to download blocks for each of the pods on the staging Only difference is that there was 1 transaction (so maybe state has changed and was migrated from cache to db) Altough staging pods have forked each other due to invalid signature at block no 56

blazejkrzak commented 3 years ago

I close this issue, because of:

  1. StateBloom should be resolved when at least 1 tx is present
  2. Cache size was extended for minimalConsensusInfo (MCI) to 2^20. Last thing to do is to give maxInt32||maxInt64 based on architecture