geth never stop syncing on rinkeby testnet

cloudsthere commented 6 years ago

When the currentBlock gets close to the highestBlock, it stops growing and the highestBlock begins to grow. A while later, the currentBlock begins growing again.

I run geth with command geth --rinkeby --fast.

the highestBlock on my geth is very close to the actual number on https://www.rinkeby.io/#faucet.

> eth.syncing
{
  currentBlock: 2401750,
  highestBlock: 2401826,
  knownStates: 14219701,
  pulledStates: 14205841,
  startingBlock: 2401554
}
> eth.blockNumber
0

logs below, seems normal:

INFO [06-04|15:34:52] Imported new state entries               count=621  elapsed=4.093ms   processed=14288823 pending=12362 retry=0   duplicate=6543 unexpected=9538
INFO [06-04|15:34:56] Imported new block headers               count=1    elapsed=713.868µs number=2401841 hash=db818c…70c969 ignored=0
INFO [06-04|15:34:57] Imported new state entries               count=1388 elapsed=9.091ms   processed=14290211 pending=12354 retry=0   duplicate=6543 unexpected=9538
INFO [06-04|15:35:00] Imported new state entries               count=768  elapsed=9.649ms   processed=14290979 pending=11944 retry=0   duplicate=6543 unexpected=9538
INFO [06-04|15:35:02] Imported new state entries               count=607  elapsed=4.707ms   processed=14291586 pending=11757 retry=0   duplicate=6543 unexpected=9538
INFO [06-04|15:35:05] Imported new state entries               count=768  elapsed=5.867ms   processed=14292354 pending=11629 retry=0   duplicate=6543 unexpected=9538
INFO [06-04|15:35:07] Imported new state entries               count=601  elapsed=4.242ms   processed=14292955 pending=11759 retry=0   duplicate=6543 unexpected=9538
INFO [06-04|15:35:09] Imported new state entries               count=601  elapsed=4.924ms   processed=14293556 pending=11479 retry=0   duplicate=6543 unexpected=9538
INFO [06-04|15:35:09] Imported new block headers               count=1    elapsed=711.566µs number=2401842 hash=39a2d8…5318ec ignored=0
INFO [06-04|15:35:10] Imported new state entries               count=384  elapsed=3.093ms   processed=14293940 pending=11375 retry=0   duplicate=6543 unexpected=9538
INFO [06-04|15:35:11] Imported new state entries               count=384  elapsed=2.660ms   processed=14294324 pending=11365 retry=0   duplicate=6543 unexpected=9538
INFO [06-04|15:35:13] Imported new state entries               count=601  elapsed=5.337ms   processed=14294925 pending=11094 retry=0   duplicate=6543 unexpected=9538
INFO [06-04|15:35:17] Imported new state entries               count=985  elapsed=6.948ms   processed=14295910 pending=11024 retry=0   duplicate=6543 unexpected=9538
INFO [06-04|15:35:20] Imported new state entries               count=602  elapsed=4.317ms   processed=14296512 pending=10940 retry=0   duplicate=6543 unexpected=9538
INFO [06-04|15:35:25] Imported new state entries               count=602  elapsed=4.380ms   processed=14297114 pending=10973 retry=0   duplicate=6543 unexpected=9538
INFO [06-04|15:35:25] Imported new block headers               count=1    elapsed=469.834µs number=2401843 hash=e8d3a7…152487 ignored=0
INFO [06-04|15:35:25] Imported new state entries               count=384  elapsed=2.758ms   processed=14297498 pending=11062 retry=0   duplicate=6543 unexpected=9538
INFO [06-04|15:35:28] Imported new state entries               count=592  elapsed=5.524ms   processed=14298090 pending=11015 retry=0   duplicate=6543 unexpected=9538
INFO [06-04|15:35:31] Imported new state entries               count=1210 elapsed=203.329ms processed=14299300 pending=10477 retry=0   duplicate=6543 unexpected=9538
INFO [06-04|15:35:37] Imported new state entries               count=1033 elapsed=1.656ms   processed=14300333 pending=10590 retry=0   duplicate=6543 unexpected=9538
I think I've waited long enough, but the currentBlock just can't reach the highestBlock, even though they are very close.

version:

Geth
Version: 1.8.10-stable
Git Commit: eae63c511ceafab14b92e274c1b18bf1700e2d3d
Architecture: amd64
Protocol Versions: [63 62]
Network Id: 1
Go Version: go1.10
Operating System: linux
GOPATH=
GOROOT=/usr/lib/go-1.10

Is this common?

karalabe commented 6 years ago

Syncing Ethereum is a pain point for many people, so I'll try to detail what's happening behind the scenes so there might be a bit less confusion.

The current default mode of sync for Geth is called fast sync. Instead of starting from the genesis block and reprocessing all the transactions that ever occurred (which could take weeks), fast sync downloads the blocks, and only verifies the associated proof-of-works. Downloading all the blocks is a straightforward and fast procedure and will relatively quickly reassemble the entire chain.

Many people falsely assume that because they have the blocks, they are in sync. Unfortunately this is not the case: since no transaction was executed, we do not have any account state available (i.e. balances, nonces, smart contract code and data). These need to be downloaded separately and cross-checked with the latest blocks. This phase is called the state trie download and it actually runs concurrently with the block downloads; alas, it takes a lot longer nowadays than downloading the blocks.

So, what's the state trie? In the Ethereum mainnet, there are a ton of accounts already, which track the balance, nonce, etc. of each user/contract. The accounts themselves are however insufficient to run a node; they need to be cryptographically linked to each block so that nodes can actually verify that the accounts have not been tampered with. This cryptographic linking is done by creating a tree data structure above the accounts, each level aggregating the layer below it into an ever smaller layer, until you reach the single root. This gigantic data structure containing all the accounts and the intermediate cryptographic proofs is called the state trie.
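To make the "each level aggregating the layer below it" idea concrete, here is a minimal sketch of a hash tree over some made-up account records. It is a simplification, not Geth's data structure: SHA-256 stands in for the hash Ethereum uses, the tree is a plain fixed fan-out tree rather than a Merkle Patricia trie, and the account strings and the merkle_root helper are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    # Assumption: SHA-256 is used here only as a convenient stand-in hash.
    return hashlib.sha256(data).digest()


def merkle_root(leaves, fanout=16):
    """Fold leaf records into a single root hash, `fanout` children per parent."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        # Each parent is the hash of the concatenated hashes of its children.
        level = [h(b"".join(level[i:i + fanout]))
                 for i in range(0, len(level), fanout)]
    return level[0]


# Hypothetical account records, purely for illustration.
accounts = [f"account-{i}:balance={i * 10}:nonce=0".encode() for i in range(100)]
root = merkle_root(accounts)

# Changing a single account changes the root, which is what lets a node
# detect tampering by comparing against the root stored in a block.
tampered = list(accounts)
tampered[42] = b"account-42:balance=999999:nonce=0"
print(root.hex() != merkle_root(tampered).hex())
```

The point of the sketch is only that one small root commits to every record underneath it; the real trie also has to support lookups by key and cheap updates.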

Ok, so why does this pose a problem? This trie data structure is an intricate interlink of hundreds of millions of tiny cryptographic proofs (trie nodes). To truly have a synchronized node, you need to download all the account data, as well as all the tiny cryptographic proofs to verify that no one in the network is trying to cheat you. This itself is already a crazy number of data items. The part where it gets even messier is that this data is constantly morphing: at every block (every 15s), about 1000 nodes are deleted from this trie and about 2000 new ones are added. This means your node needs to synchronize a dataset that is changing 200 times per second. The worst part is that while you are synchronizing, the network is moving forward, and state that you began to download might disappear while you're downloading, so your node needs to constantly follow the network while trying to gather all the recent data. But until you actually do gather all the data, your local node is not usable since it cannot cryptographically prove anything about any accounts.
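The "200 times per second" figure follows directly from the per-block numbers quoted above. A small worked calculation, using only those approximate figures (the constant names and the one-hour example are illustrative):

```python
# Approximate figures taken from the explanation above.
BLOCK_TIME_S = 15
NODES_DELETED_PER_BLOCK = 1000
NODES_ADDED_PER_BLOCK = 2000

changes_per_block = NODES_DELETED_PER_BLOCK + NODES_ADDED_PER_BLOCK
changes_per_second = changes_per_block / BLOCK_TIME_S
print(changes_per_second)  # 200.0


def nodes_changed_during(sync_seconds: float) -> float:
    """Rough count of trie node changes that happen while a sync is running."""
    return changes_per_second * sync_seconds


# Illustrative only: trie churn during one hour of syncing.
print(nodes_changed_during(3600))  # 720000.0
```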

If you see that you are 64 blocks behind mainnet, you aren't yet synchronized, not even close. You are just done with the block download phase and are still running the state downloads. You can see this yourself via the seemingly endless stream of Imported new state entries [...] logs. You'll need to wait that out too before your node comes truly online.
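If you want to follow that stream of logs programmatically, the key=value fields are easy to pull out. The sketch below parses one of the log lines from the original post; the parse_fields helper is hypothetical, and the log layout is assumed to match the lines shown in this thread (other Geth versions may format it differently).

```python
import re

# A log line copied from the original post in this thread.
LINE = ("INFO [06-04|15:34:52] Imported new state entries               "
        "count=621  elapsed=4.093ms   processed=14288823 pending=12362 "
        "retry=0   duplicate=6543 unexpected=9538")

# Matches key=value pairs such as processed=14288823.
FIELD = re.compile(r"(\w+)=(\S+)")


def parse_fields(line: str) -> dict:
    """Return the key=value pairs of a log line as a dict of strings."""
    return dict(FIELD.findall(line))


fields = parse_fields(LINE)
processed = int(fields["processed"])
pending = int(fields["pending"])
print(processed, pending)
```

Note that pending only counts the nodes the sync currently knows it still has to fetch; it is not the total amount of work left.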

Q: The node just hangs on importing state entries?!

A: The node doesn't hang, it just doesn't know how large the state trie is in advance so it keeps on going and going and going until it discovers and downloads the entire thing.

The reason is that a block in Ethereum only contains the state root, a single hash of the root node. When the node begins synchronizing, it knows about exactly 1 node and tries to download it. That node can refer to up to 16 new nodes, so in the next step, we'll know about 16 new nodes and try to download those. As we go along the download, most of the nodes will reference new ones that we didn't know about until then. This is why you might be tempted to think it's stuck on the same numbers. It is not; rather, it's discovering and downloading the trie as it goes along.
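The discovery process described above can be simulated with a toy tree. This is a sketch under simplifying assumptions: the tree is random, each node has between 1 and 16 children, and nodes are fetched in plain breadth-first order, which is not necessarily the order Geth uses. The helper names are made up for the example.

```python
import random
from collections import deque


def build_toy_trie(depth, rng):
    """Return {node_id: [child_ids]} for a random tree, 1 to 16 children per node."""
    children = {0: []}
    frontier = [0]
    next_id = 1
    for _ in range(depth):
        new_frontier = []
        for node in frontier:
            for _ in range(rng.randint(1, 16)):
                children[node].append(next_id)
                children[next_id] = []
                new_frontier.append(next_id)
                next_id += 1
        frontier = new_frontier
    return children


def sync(children, root=0):
    """Download nodes one by one; yield (pulled, known) after each download."""
    known = 1          # at the start only the root hash is known
    pulled = 0
    queue = deque([root])
    while queue:
        node = queue.popleft()
        pulled += 1
        # A node's children are only discovered once the node is downloaded.
        for child in children[node]:
            queue.append(child)
            known += 1
        yield pulled, known


trie = build_toy_trie(depth=3, rng=random.Random(1))
steps = list(sync(trie))
for pulled, known in steps[:5]:
    print(f"pulled={pulled} known={known}")
print("final:", steps[-1], "total nodes:", len(trie))
```

While the sync runs, known keeps running ahead of pulled, much like knownStates and pulledStates in eth.syncing; the two only meet once the last node has been downloaded.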

Q: I'm stuck at 64 blocks behind mainnet?!

A: As explained above, you are not stuck, just finished with the block download phase and waiting for the state download phase to complete too. This latter phase nowadays takes a lot longer than just getting the blocks.

Q: Why does downloading the state take so long, I have good bandwidth?

A: State sync is mostly limited by disk IO, not bandwidth.

The state trie in Ethereum contains hundreds of millions of nodes, most of which take the form of a single hash referencing up to 16 other hashes. This is a horrible way to store data on a disk, because there's almost no structure in it, just random numbers referencing even more random numbers. This makes any underlying database weep, as it cannot optimize storing and looking up the data in any meaningful way.

Not only is storing the data very suboptimal, but due to the 200 modifications per second and the pruning of past data, we cannot even download it in a properly pre-processed way that would make it import faster without the underlying database shuffling it around too much. The end result is that even a fast sync nowadays incurs a huge disk IO cost, which is too much for a mechanical hard drive.
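A small illustration of why hash-keyed data has "almost no structure": keys that are neighbours before hashing end up scattered once they are ordered by their hash. This is only a sketch, with SHA-256 as a stand-in hash and made-up key names; it says nothing about how Geth's database lays out data internally.

```python
import hashlib

# Hypothetical keys that are nicely ordered before hashing.
keys = [f"account-{i:04d}".encode() for i in range(1000)]


def digest(key: bytes) -> bytes:
    # Assumption: SHA-256 as a stand-in for the hash used by Ethereum.
    return hashlib.sha256(key).digest()


# Order the same keys by their hash, as a hash-keyed store would see them.
by_hash = sorted(keys, key=digest)

for key in by_hash[:5]:
    print(digest(key).hex()[:12], key.decode())
```

Neighbouring accounts land far apart in hash order, so reads and writes that are logically related become random accesses on disk, which is exactly the access pattern a mechanical hard drive handles worst.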

Q: Wait, so I can't run a full node on an HDD?

A: Unfortunately not. Doing a fast sync on an HDD will take more time than you're willing to wait with the current data schema. Even if you do wait it out, an HDD will not be able to keep up with the read/write requirements of transaction processing on mainnet.

You however should be able to run a light client on an HDD with minimal impact on system resources. If you wish to run a full node however, an SSD is your only option.

cloudsthere commented 6 years ago

Thanks, your reply helps a lot. Syncing has finished by now.

sebasgoldberg commented 6 years ago

Great explanation. All it took was using an SSD instead of an HDD. Sync finished in minutes. Thanks a lot!

tksavov commented 6 years ago

@karalabe Thank you for the explanation, it was the first exhaustive one I've seen thus far. So according to your comment, at any given point in time there should be a "rough number" of state trie entries. Is there any place I can check what it is, at least in the millions (e.g. approx. 150 million, or 250 million)? Thank you in advance!

kwangc commented 6 years ago

Oh, so that's what "imported new state entries" meant. I should've waited longer for the states to download too, haha. Thanks for the clarification!!

hayorov commented 5 years ago

I wrote a tiny Python script to monitor the process. It's here: https://github.com/hayorov/ethereum-sync-mertics

My output:

2019-05-06 01:00:32 avg: 1827 max: 1938 min: 1378 states/s  remain: 136604075 states     4 peers    eta@ 20:46:28.165828
2019-05-06 01:00:37 avg: 1864 max: 1938 min: 1378 states/s  remain: 136595500 states     3 peers    eta@ 20:21:14.951050
2019-05-06 01:00:42 avg: 1791 max: 1938 min: 1378 states/s  remain: 136583359 states     3 peers    eta@ 21:11:16.481006
2019-05-06 01:00:48 avg: 1742 max: 1938 min: 1378 states/s  remain: 136580287 states     3 peers    eta@ 21:46:35.797305
2019-05-06 01:00:53 avg: 1721 max: 1938 min: 1378 states/s  remain: 136575694 states     3 peers    eta@ 22:03:01.154434
2019-05-06 01:00:58 avg: 1682 max: 1938 min: 1378 states/s  remain: 136569043 states     4 peers    eta@ 22:33:15.402442
2019-05-06 01:01:03 avg: 1698 max: 1938 min: 1378 states/s  remain: 136564293 states     3 peers    eta@ 22:20:27.458747
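For readers wondering how an ETA like the one in this output can be derived, the simplest estimate is remaining states divided by the average rate. The sketch below is an assumption about the approach, not a copy of the linked script; the eta function is hypothetical and the input numbers are taken from the first output line above.

```python
from datetime import timedelta


def eta(remaining_states: int, avg_states_per_s: float) -> timedelta:
    """Estimate the time left, assuming the average rate stays constant."""
    if avg_states_per_s <= 0:
        raise ValueError("rate must be positive")
    return timedelta(seconds=remaining_states / avg_states_per_s)


# Values from the first output line: remain=136604075 states, avg=1827 states/s.
first = eta(136604075, 1827)
print(first)
```

This comes out at roughly 20 hours 46 minutes, in line with the eta on that first line. Keep in mind that such an estimate is optimistic by construction: the number of remaining states keeps changing as new trie nodes are discovered during the sync.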

imprakrut commented 4 years ago

You will have to be patient to sync a node.

It took me 60 hours to sync Rinkeby in fast mode. There were 125M state entries and the folder size was 38GB after synchronization. With time, both these numbers will grow.

You can type eth.syncing in the Geth console. If you get false as output, it means that syncing is finished. Otherwise you'll get various details about the blocks and the states.

By typing eth.blockNumber you will get the current block number. If the output is 0 then the syncing is not yet complete.
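The two checks above can be summarized in a few lines. This is a sketch only: describe_sync is a hypothetical helper, the status dict is hand-copied from the eth.syncing output in the original post, and a Python False stands in for the false printed by the console.

```python
def describe_sync(status):
    """Summarize an eth.syncing-style result (False, or a dict of counters)."""
    if status is False:
        return "not syncing"
    blocks_behind = status["highestBlock"] - status["currentBlock"]
    states_pending = status["knownStates"] - status["pulledStates"]
    return (f"{blocks_behind} blocks behind, "
            f"{states_pending} known states still to pull")


# Values copied from the eth.syncing output in the original post.
status = {
    "currentBlock": 2401750,
    "highestBlock": 2401826,
    "knownStates": 14219701,
    "pulledStates": 14205841,
    "startingBlock": 2401554,
}
print(describe_sync(status))  # 76 blocks behind, 13860 known states still to pull
print(describe_sync(False))   # not syncing
```

The small gap between knownStates and pulledStates is not the remaining work; as explained earlier in the thread, more states keep being discovered until the whole trie has been downloaded.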

Here is a screenshot from when the state sync was complete (all the states were pulled) and I started downloading the chain segments. [Image: Rinkeby Sync]

sirnicolas21 commented 4 years ago

Just to help someone reading this in the future: Deallocated fast sync bloom items=226987492. Sync completed today. You will have more by the time you read this.

kutysam commented 4 years ago

Rinkeby testnet. Geth v1.9.19.

INFO [08-24|19:01:37.297] Deallocated fast sync bloom items=143507946 errorrate=0.000

Total time: About 13 hrs on a dual-core CPU.

Total space: 45.2G

ethereum / go-ethereum

geth never stop syncing on rinkeby testnet #16875