ethereum / go-ethereum

Go implementation of the Ethereum protocol
https://geth.ethereum.org
GNU Lesser General Public License v3.0

geth node is consistently behind the mainnet #16218

Closed cezekwe closed 5 years ago

cezekwe commented 6 years ago

Hi there,

System information

Geth version: 1.8.1
OS & Version: CoreOS (running geth in Docker container)
Hardware: m5.large

Expected behaviour

In the article for Iceberg's release (https://blog.ethereum.org/2018/02/14/geth-1-8-iceberg%C2%B9/), the author said that they were able to complete a fast sync in ~3 hours using an m4.2xlarge instance. Could you guys (devs) tell me what command line options you used for that test?

Actual behaviour

I am running on similar hardware and my geth node has been 65 blocks behind the mainnet for the past 2 days.

Steps to reproduce the behaviour

Launch the geth container on CoreOS. Geth should run with these command line options: --verbosity 4 --metrics --maxpeers=50 --ipcdisable --v5disc --rpc --rpcvhosts=* --port=30303 --syncmode fast --cache=2048 --rpcaddr 0.0.0.0 --rpcport 8545 --rpcapi db,eth,net,web3,personal,admin,txpool,debug

Backtrace

geth.log

Fargusson commented 6 years ago

#16202

karalabe commented 6 years ago

I'd recommend against running with verbosity 4 all the time, that's too much. We just ran with --cache=2048.

bobbieltd commented 6 years ago

Dedicated instance with 4 CPUs, 8 GB RAM, Ubuntu 16.04: Geth/v1.8.1-stable-1e67410e/linux-amd64/go1.9.4. Command line: geth --port=30303 --syncmode fast --cache=1024. Geth keeps chasing the chain head, 65 blocks behind, just like the original poster.

> eth.syncing
{
  currentBlock: 5178990,
  highestBlock: 5179055,
  knownStates: 70135573,
  pulledStates: 70120348,
  startingBlock: 5178266
}

This has been going on for two days. (I tried running on multiple servers: three hit db errors, and the others ended up chasing 65 blocks behind.)
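To make the numbers in the eth.syncing output above easier to read, here is a minimal Python sketch that computes the two gaps it reports: blocks still to import, and state entries that are known but not yet pulled. The function name `sync_gaps` is hypothetical; the field names and values are copied from the output above.

```python
def sync_gaps(syncing):
    """Return (blocks_behind, states_pending) from an eth.syncing-style dict."""
    blocks_behind = syncing["highestBlock"] - syncing["currentBlock"]
    states_pending = syncing["knownStates"] - syncing["pulledStates"]
    return blocks_behind, states_pending


# Values copied from the eth.syncing output in this comment.
status = {
    "currentBlock": 5178990,
    "highestBlock": 5179055,
    "knownStates": 70135573,
    "pulledStates": 70120348,
    "startingBlock": 5178266,
}

blocks_behind, states_pending = sync_gaps(status)
print(blocks_behind, states_pending)
```

A small block gap combined with a nonzero state gap is what this comment describes as "chasing": the block count looks almost done while state entries are still being fetched.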

b-f- commented 6 years ago

Same here. I run few or no apps on the machine, maybe some browser instances. Win 10, Core i7 CPU, 10 GB RAM, 1 TB HDD with 64 MB cache, used for the eth blockchain only. geth 1.8.1, Mist 0.9.3, 100 mbit eth -> optic internet connection.

Used geth --fast --cache=1024 for the initial sync. Caught up in about 10h. Now running with --cache=1048 --maxpeers=150. Geth catches up quickly, but never completes the last n < 100 blocks. It seems to work a bit faster and causes less noise on the HDD than geth 1.7.2, where it couldn't complete the last n < 200 blocks. When I run Mist, geth seems to stop working after a while. It just does nothing at all: no messages in the console, no CPU usage. Batch job termination works, so it's not frozen. But it's not synced, at least that's what Mist thinks. [image] After several hours of running geth (height=5182550): [image]

@karalabe I tried using --cache=2048 and even 4096. Same result as above with --cache=1024. With 4096 the OS starts swapping after a while for some reason, even though not all memory is being used (memory leaks?). Note: "Disk storage enabled for ethash DAGs" in the console points to a nonexistent path, while "Disk storage enabled for ethash caches" points to a correct custom path.

Sample log dump indicates only the speed of block acquisition. Geth is currently behind one day:

INFO [03-02|01:27:46] Imported new state entries count=444 elapsed=0s processed=10470 pending=21570 retry=0 duplicate=1 unexpected=165
INFO [03-02|01:27:50] Imported new block headers count=0 elapsed=0s number=5111195 hash=edc498…bb5428 ignored=1
INFO [03-02|01:28:00] Imported new block headers count=0 elapsed=0s number=5111197 hash=36486e…d94fc8 ignored=2
INFO [03-02|01:28:06] Imported new state entries count=273 elapsed=0s processed=10743 pending=4369 retry=0 duplicate=1 unexpected=165
INFO [03-02|01:28:12] Imported new block headers count=0 elapsed=0s number=5111198 hash=fa04b8…263d76 ignored=1
INFO [03-02|01:28:15] Imported new block headers count=0 elapsed=0s number=5111199 hash=b3e461…d9db76 ignored=1
INFO [03-02|01:28:24] Imported new block headers count=0 elapsed=0s number=5111200 hash=da08e9…74d79d ignored=1
INFO [03-02|01:28:40] Imported new state entries count=384 elapsed=0s processed=11127 pending=10356 retry=384 duplicate=1 unexpected=165
WARN [03-02|01:28:53] Synchronisation failed, retrying err="header processing canceled (requested)"

It's a bit strange, though, that I can see my address's state on etherscan, no outgoing transactions, but this is not shown in the wallet: "If your balance doesn't seem updated, make sure that you are in sync with the network." Not what I expect as an occasional end user. The wallet is telling me it doesn't know if it's synced or not?

bobbieltd commented 6 years ago

In my case, after playing “Tom and Jerry” for quite a long time, it is synced now (eth.syncing = false). Just keep it running and chasing (don’t turn it off).

karalabe commented 6 years ago

@b-f- Geth requires an SSD currently. If you only have an HDD, please use the light client.

riceke commented 6 years ago

Hello!

I am having the exact same issue as cue0083.

I am running Geth 1.8.1 on a Raspberry Pi 3 (yes I know people are saying that you cannot run a full Ethereum node on a Raspberry any more but I have not read any convincing arguments why it should not work to do a fast sync).

The CPU utilization (checking with top) is around 10%-20% average, so that does not seem to be an issue. The RAM is more or less fully allocated (which you would expect with only 1G) but the 2G swap file is barely used, so memory does not seem to be an issue either really.

The I/O utilization (checking with iotop) does not really show anything exceptional either and the 128GB SD card (SanDisk) has some 50GB still free (while trimming/erasing now and then to minimize the amount of garbage that the wear levelling mechanism potentially has to move around).

Network bandwidth utilization is not at any alarming level either (as reported by my router), although I can see that there is more or less constant communication going on between the node and peer(s).

I am running a fast sync that appears to stop (i.e. 'eth.syncing' starts returning 'false') when it is 65 blocks behind the highest block and then just "hangs" making no apparent further progress. When I restart Geth it will relatively quickly catch up to 65 blocks behind the new highest block and do some state downloading until it eventually "stops" again and 'eth.syncing' starts returning 'false'.

Now, the curious thing is that when I check the stack trace after the syncing has "stopped" (i.e. 'eth.syncing' starts returning 'false'), I can see that there appears to still be syncing going on with a single peer: '(*Downloader).spawnSync' is still waiting on a channel (and has been for 1029 minutes), while the various goroutines attached to the channel ('fetchHeaders', 'fetchReceipts', 'fetchBodies' and so on) do NOT appear to "hang" on their selects but rather seem to be busy downloading data (as indicated by 'fetchParts' apparently being called repeatedly).

So, the question in my head is now: why does Geth appear to still be in syncing mode while 'eth.syncing' is returning 'false' and what is my node doing being stuck on the same peer for hours downloading data that then appears to be just thrown away, or what?

I have attached a stack trace and also ten minutes worth of Go trace if that could be of any use.

Thanks!

debug-info.zip

karalabe commented 6 years ago

Syncing Ethereum is a pain point for many people, so I'll try to detail what's happening behind the scenes so there might be a bit less confusion.

The current default mode of sync for Geth is called fast sync. Instead of starting from the genesis block and reprocessing all the transactions that ever occurred (which could take weeks), fast sync downloads the blocks, and only verifies the associated proof-of-works. Downloading all the blocks is a straightforward and fast procedure and will relatively quickly reassemble the entire chain.

Many people falsely assume that because they have the blocks, they are in sync. Unfortunately this is not the case, since no transaction was executed, so we do not have any account state available (i.e. balances, nonces, smart contract code and data). These need to be downloaded separately and cross checked with the latest blocks. This phase is called the state trie download and it actually runs concurrently with the block downloads; alas it takes a lot longer nowadays than downloading the blocks.

So, what's the state trie? In the Ethereum mainnet, there are a ton of accounts already, which track the balance, nonce, etc. of each user/contract. The accounts themselves are however insufficient to run a node; they need to be cryptographically linked to each block so that nodes can actually verify that the accounts are not tampered with. This cryptographic linking is done by creating a tree data structure above the accounts, each level aggregating the layer below it into an ever smaller layer, until you reach the single root. This gigantic data structure containing all the accounts and the intermediate cryptographic proofs is called the state trie.
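The "each level aggregating the layer below it" idea can be sketched in a few lines of Python. This is a simplification under stated assumptions, not Geth's data structure: it builds a plain Merkle tree with a fan-out of 16 and uses SHA-256 as a stand-in hash, whereas Ethereum uses a Merkle Patricia trie with Keccak-256. The account strings and the `merkle_root` helper are hypothetical.

```python
import hashlib


def h(data: bytes) -> bytes:
    # Stand-in hash; Ethereum itself uses Keccak-256.
    return hashlib.sha256(data).digest()


def merkle_root(leaves, fanout=16):
    """Hash the leaves, then repeatedly hash groups of `fanout` nodes
    into a smaller layer until a single root remains."""
    if not leaves:
        raise ValueError("need at least one leaf")
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        level = [
            h(b"".join(level[i:i + fanout]))
            for i in range(0, len(level), fanout)
        ]
    return level[0]


# Hypothetical account records, for illustration only.
accounts = [f"account{i}:balance={i * 10}".encode() for i in range(40)]
root = merkle_root(accounts)

# Changing a single account changes the root, which is how tampering is detected.
tampered = list(accounts)
tampered[7] = b"account7:balance=999999"
print(root != merkle_root(tampered))
```

Because a block commits to the root, anyone holding the root can check that a downloaded set of accounts and intermediate nodes hashes back to it.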

Ok, so why does this pose a problem? This trie data structure is an intricate interlink of hundreds of millions of tiny cryptographic proofs (trie nodes). To truly have a synchronized node, you need to download all the account data, as well as all the tiny cryptographic proofs to verify that no one in the network is trying to cheat you. This itself is already a crazy number of data items. The part where it gets even messier is that this data is constantly morphing: at every block (15s), about 1000 nodes are deleted from this trie and about 2000 new ones are added. This means your node needs to synchronize a dataset that is changing 200 times per second. The worst part is that while you are synchronizing, the network is moving forward, and state that you began to download might disappear while you're downloading, so your node needs to constantly follow the network while trying to gather all the recent data. But until you actually do gather all the data, your local node is not usable since it cannot cryptographically prove anything about any accounts.
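The "200 times per second" figure follows directly from the per-block numbers quoted in this comment. A quick check in Python, using only those approximate figures (the constant names are illustrative):

```python
# Approximate figures quoted in the comment above.
BLOCK_TIME_S = 15
NODES_DELETED_PER_BLOCK = 1000
NODES_ADDED_PER_BLOCK = 2000

changes_per_block = NODES_DELETED_PER_BLOCK + NODES_ADDED_PER_BLOCK
changes_per_second = changes_per_block / BLOCK_TIME_S
print(changes_per_block, changes_per_second)
```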

If you see that you are 64 blocks behind mainnet, you aren't yet synchronized, not even close. You are just done with the block download phase and still running the state downloads. You can see this yourself via the seemingly endless Imported state entries [...] stream of logs. You'll need to wait that out too before your node comes truly online.


Q: The node just hangs on importing state entries?!

A: The node doesn't hang, it just doesn't know how large the state trie is in advance so it keeps on going and going and going until it discovers and downloads the entire thing.

The reason is that a block in Ethereum only contains the state root, a single hash of the root node. When the node begins synchronizing, it knows about exactly 1 node and tries to download it. That node can refer to up to 16 new nodes, so in the next step, we'll know about 16 new nodes and try to download those. As we go along the download, most of the nodes will reference new ones that we didn't know about until then. This is why you might be tempted to think it's stuck on the same numbers. It is not, rather it's discovering and downloading the trie as it goes along.
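The discovery process described here can be illustrated with a toy simulation. This is a sketch under simplifying assumptions, not Geth's downloader: the trie is a hypothetical, fixed, fully populated tree with 16 children per node and three levels below the root, nodes are fetched one at a time, and nothing changes during the download. The names `download_trie` and `children_of` are made up for the example.

```python
from collections import deque


def download_trie(root, children_of):
    """Breadth-first 'download': only the root is known up front, and a node's
    children are discovered when that node is fetched."""
    known = 1          # nodes we have heard about so far
    pulled = 0         # nodes actually fetched
    pending = deque([root])
    history = []
    while pending:
        node = pending.popleft()
        pulled += 1
        kids = children_of(node)
        known += len(kids)
        pending.extend(kids)
        history.append((known, pulled, len(pending)))
    return history


def children_of(node, depth=3, fanout=16):
    # Hypothetical toy trie: a node is a path tuple; every node above
    # `depth` has exactly `fanout` children, deeper nodes are leaves.
    if len(node) < depth:
        return [node + (i,) for i in range(fanout)]
    return []


history = download_trie((), children_of)
print(history[0])    # (known, pulled, pending) after fetching the root
print(history[-1])   # the same counters once nothing is pending
```

In this toy run the "known" counter stays ahead of "pulled" until the very last node, so the total size only becomes clear when the download finishes.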

Q: I'm stuck at 64 blocks behind mainnet?!

A: As explained above, you are not stuck, just finished with the block download phase, waiting for the state download phase to complete too. This latter phase nowadays takes a lot longer than just getting the blocks.

Q: Why does downloading the state take so long, I have good bandwidth?

A: State sync is mostly limited by disk IO, not bandwidth.

The state trie in Ethereum contains hundreds of millions of nodes, most of which take the form of a single hash referencing up to 16 other hashes. This is a horrible way to store data on a disk, because there's almost no structure in it, just random numbers referencing even more random numbers. This makes any underlying database weep, as it cannot optimize storing and looking up the data in any meaningful way.

Not only is storing the data very suboptimal, but due to the 200 modifications / second and pruning of past data, we cannot even download it in a properly pre-processed way to make it import faster without the underlying database shuffling it around too much. The end result is that even a fast sync nowadays incurs a huge disk IO cost, which is too much for a mechanical hard drive.

Q: Wait, so I can't run a full node on an HDD?

A: Unfortunately not. Doing a fast sync on an HDD will take more time than you're willing to wait with the current data schema. Even if you do wait it out, an HDD will not be able to keep up with the read/write requirements of transaction processing on mainnet.

You however should be able to run a light client on an HDD with minimal impact on system resources. If you wish to run a full node however, an SSD is your only option.

bobbieltd commented 6 years ago

Thank you for the very thorough explanation. You should put the fast sync requirements in the readme (a fast IO SSD) so people don’t lose time.

riceke commented 6 years ago

Hello!

Yes, I realize that there are tons of states hanging under each block (header) but the thing that was puzzling me was that while geth appears to still be syncing (as the stack trace seems to indicate) the 'eth.syncing' call is returning 'false'. I think this is what causes people to feel that geth "hangs", because it claims it is not syncing any more.

Please see the stack trace and other debug info in my previous post.

Thanks!

nuliknol commented 6 years ago

A: Unfortunately not. Doing a fast sync on an HDD will take more time than you're willing to wait with the current data schema. Even if you do wait it out, an HDD will not be able to keep up with the read/write requirements of transaction processing on mainnet

This is a problem not of an HDD, but of the design of geth and the choice of using a NoSQL database like LevelDB. SQL databases have a transaction log, and every change is written to this log; later the changes are replayed on the corresponding sector/cylinder/head. So, an SQL database does 2 writes: the first write goes to the WAL (Write Ahead Log) and the second is the good write, since it can combine many records in a single operation. I suppose (since it is very light) LevelDB is not designed to manage a high volume of transactions by accessing the disk every time it needs some data, and this is why you face this problem. SQL databases like Postgres can manage large volumes of INSERTs and UPDATEs because their engines were designed for such speed. Now, Ethereum has a rate of 15 transactions per second, which is a ridiculously low volume to be limited by an IO bottleneck. If you would drop LevelDB and write blocks directly through kernel system calls you would achieve much better performance, because you would be able to organize adjacent data and store it in a single write(), for example.

But anyway, you can solve current problem by replacing LevelDB with another more powerful NoSQL database engine, or increasing the cache size of LevelDB. Because LevelDB doesn't have to send SEEKs to the HDD every time it has to read something, if you cache the entire state, the READ operations would not be necessary at all. And for WRITE operations you can combine the requests for the same physical block.

So, in theory you should be able to run an Ethereum node on an HDD, because it is not a hardware limitation, it is a limitation of design. The average SEEK time of an HDD is 9 milliseconds; doing some basic math, you can execute 111 WRITE or READ operations per second without a bottleneck. Ethereum has a rate of 15 transactions per second, so if every transaction does 4 READs and 3 WRITEs you would be able to use an HDD. Now, this is if no cache mechanisms are used, but with a cache the performance would accelerate dramatically, like a 500x or 1000x speedup, depending on the RAM you would be willing to consume.
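The "basic math" in this comment can be written out as follows. The sketch only reproduces the commenter's own estimate: the 9 ms seek time, the 15 transactions per second, and the 4 reads plus 3 writes per transaction are all assumptions taken from the comment, not measurements of Geth.

```python
# All inputs are the assumptions stated in the comment above.
SEEK_TIME_MS = 9
TX_PER_SECOND = 15
READS_PER_TX = 4
WRITES_PER_TX = 3

# One random disk operation per seek: about 111 operations per second.
seeks_per_second = 1000 // SEEK_TIME_MS

# Disk operations the assumed transaction load would need per second.
ops_needed = TX_PER_SECOND * (READS_PER_TX + WRITES_PER_TX)

print(seeks_per_second, ops_needed, ops_needed <= seeks_per_second)
```

Under these assumptions the required 105 operations per second fit just inside the 111 available, which is the whole basis of the claim; the result is only as good as the assumed number of disk operations per transaction.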

Today's programmers write software without knowing much about hardware, that's why users are suffering.

karalabe commented 6 years ago

Today's programmers write software without knowing much about hardware, that's why users are suffering.

Apparently today's internet users post comments without knowing much about the issue they comment on.

karalabe commented 6 years ago

Oh, and an Ethereum block currently contains about 175 txs, and it's processed in about 200ms, so the true throughput is 875 TPS, but the PoW mining and block propagation prevents pushing all hardware to its limits.
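The 875 TPS figure is the per-block processing rate, which can be checked against the rate the network actually sustains. A small sketch using the numbers from this comment; the 15 second block interval is taken from earlier in the thread and is an approximation.

```python
TXS_PER_BLOCK = 175        # from this comment
PROCESSING_TIME_MS = 200   # from this comment
BLOCK_INTERVAL_S = 15      # approximate block time quoted earlier in the thread

# Rate while a block is actually being processed.
processing_tps = TXS_PER_BLOCK * 1000 / PROCESSING_TIME_MS

# Rate the network sustains, limited by one block per interval.
network_tps = TXS_PER_BLOCK / BLOCK_INTERVAL_S

print(processing_tps, round(network_tps, 1))
```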

chelinho139 commented 6 years ago

We are having the same issue: syncing for more than 24 hours. How long until the state download finishes? eth.blockNumber=0

> eth.syncing
{
  currentBlock: 5434969,
  highestBlock: 5435034,
  knownStates: 87544536,
  pulledStates: 87468665,
  startingBlock: 5434968
}

Geth
Version: 1.8.2-stable
Git Commit: b8b9f7f4476a30a0aaf6077daade6ae77f969960
Architecture: amd64
Protocol Versions: [63 62]
Network Id: 1
Go Version: go1.9.4
Operating System: linux
GOPATH=
GOROOT=/usr/lib/go-1.9

Any suggestions?

kingjerod commented 6 years ago

Getting this issue with v1.8.4-stable-2423ae01. Running on an AWS m5.large (2 vCPU, 8 gigs of RAM) with a 300GB general purpose SSD attached.

Command (initially used "fast" for the first day): geth --syncmode "full" --ws --wsapi "web3,net,rpc,admin,eth" --wsport 5000 --wsaddr "0.0.0.0" --wsorigins "*" --cache 4096 --verbosity 5

Watching the logs, it seems like it just continually imports new state entries. I've had it running for about 4 days, had to restart a few times because the logs showed no activity for hours. It always seems to be "almost" caught up:

> eth.syncing
{
  currentBlock: 5483568,
  highestBlock: 5483618,
  knownStates: 60628953,
  pulledStates: 60623056,
  startingBlock: 5482684
}

What's the best way to figure out the issue? I can't imagine it should take 4 days to sync up.

Edit: Opened up port 30303 TCP and went from max 8 peers to 24 now.

ldeffenb commented 6 years ago

So, if I understand the INFO logs about importing new state entries, I'll actually be done syncing the trie state when the pending number quits bouncing up and down and eventually gets to zero? I'm assuming (a dangerous thing, I know) that the pending count is the number of entries that are linked to, but not yet loaded. Mine is currently fluctuating between 7,000 and 9,000 on TestNet. I've processed 28,527,973 so far.
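For anyone who wants to track the pending count over time rather than eyeball it, here is a minimal Python sketch that pulls the key=value fields out of an "Imported new state entries" line. The sample line is copied from a log dump earlier in this thread; the helper name `parse_state_log` is hypothetical. Whether pending reaching zero marks the end of the state download is the assumption made in this comment, not something the sketch verifies.

```python
import re

# Sample line copied from the log dump earlier in this thread.
LINE = (
    "INFO [03-02|01:28:06] Imported new state entries count=273 elapsed=0s "
    "processed=10743 pending=4369 retry=0 duplicate=1 unexpected=165"
)


def parse_state_log(line):
    """Return the key=value fields of an 'Imported new state entries' line,
    or None for any other log line."""
    if "Imported new state entries" not in line:
        return None
    fields = dict(re.findall(r"(\w+)=(\S+)", line))
    # Convert purely numeric values to int; leave values like '0s' as strings.
    return {k: int(v) if v.isdigit() else v for k, v in fields.items()}


fields = parse_state_log(LINE)
print(fields["processed"], fields["pending"])
```

Feeding each log line through this helper and plotting the pending field would show whether the count is trending down or just bouncing around.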

kingjerod commented 6 years ago

After upgrading to version 1.8.6 my chain synced up after about a day (I wasn't watching it closely so might have been faster).

bobbieltd commented 6 years ago

@kingjerod: Can you provide some hardware info (especially SSD/HDD)?

kingjerod commented 6 years ago

Running it on an AWS m5.large with Ubuntu 16.04, has 2 virtual CPUs and 8 gigs of ram. https://aws.amazon.com/ec2/instance-types/m5/

Running it with an SSD (general purpose) volume provisioned to 300GB. Currently synced and it's using 90GB.

I think the latest version still needs some work, because it's the only process running on the server and it's almost maxed out both cores (CPU at 193%) and memory is at 6GB after running for 45 minutes.

If you're trying to run this on a laptop, be prepared to have your computer crawl.

bobbieltd commented 6 years ago

Thanks for info. It seems ETH syncing is still a hard task.

kingjerod commented 6 years ago

@bobbieltd If you just need the http rpc I would recommend https://infura.io

omar-saadoun commented 6 years ago

There is no way to get synced with an HDD; an HDD is only enough to sync with Ropsten. @kingjerod's recommendation is a good one for RPC.

CryptoKiddies commented 6 years ago

@karalabe do you think when relying on shared services like AWS, dedicated hardware tenancy can reduce network slowdown? My "box" can handle 3000 IOPS, but there are still days with frequent sync in and out phasing. I'm wondering if this is an issue of my cloud service neighbors hogging up bandwidth. I'll contact my cloud service as well to figure this out.

dformdotdk commented 6 years ago

How much space would I need to run a full node - I've read somewhere that the size of the blockchain was about 1TB, but others talk about 100GB - big difference...

kingjerod commented 6 years ago

@dyvel It depends on whether you do a full sync or a fast sync. If you do a full sync it might fit inside 1TB. If you don't care about the history of transactions, a fast sync will work and I imagine it might fit in under 100GB.

holiman commented 5 years ago

As this ticket mostly concerns the performance, and is not directly a flaw in the code to be fixed (other than, make it faster), I'm going to close this ticket. Feel free to open a new ticket if there's something I missed.

vindard commented 5 years ago

Many people falsely assume that because they have the blocks, they are in sync. Unfortunately this is not the case, since no transaction was executed, so we do not have any account state available (ie. balances, nonces, smart contract code and data). These need to be downloaded separately and cross checked with the latest blocks. This phase is called the state trie download and it actually runs concurrently with the block downloads

Question for @karalabe or anyone else who knows: what is it exactly that's in a block? What do you actually have from the network when you "have a block"?

As far as the tries go, I assume the state trie is separate, but are the transaction and transaction receipt tries included in the "block download", or are they also separate somehow? I assume that at the very least the transaction trie is included, but I'm unsure whether the transaction receipt trie is also downloaded or whether it's derived by playing back transactions from the transaction trie.

paul-arssov commented 5 years ago

Very informative messages! Can a user make a successful RPC call to eth_sendRawTransaction or eth_call when geth is running but not yet in sync?

sssubik commented 4 years ago

Hey, I mistakenly set my cache to 1024 at the start of the sync and it has already been a couple of days. Can I somehow change the cache size to 4096 by stopping the sync and starting again?

nuliknol commented 3 years ago

This is a problem not of an HDD, but the design of geth and the choice of using a NoSQL database like Level

Today's programmers write software without knowing much about hardware, that's why users are suffering.

I am glad that two and a half years later, you finally understood what I said. I am reading the Geth 1.9 release notes and they say:

The discovery and optimization of a quadratic CPU and disk IO complexity, originating from the Go implementation of LevelDB. This caused Geth to be starved and stalled, exponentially getting worse as the database grew. Huge shoutout to Gary Rong for his relentless efforts, especially as his work is beneficial to the entire Go community.

There is still a lot of room to improve performance of trie storage starting from the design.

paul-arssov commented 3 years ago

Hi,

Thank you for the update. As a system level 'C' programmer I dislike 'popular' languages like go, and the packaged 'goods' with it. As the design of the block does not change there is probably no need to use a standard database in the first place.

Happy New Year!

Paul.

willrnch commented 3 years ago

@nuliknol @karalabe

I'm trying to sync a full node on a server with 10TB+ of HDD and 448GB of SSD. The data dir is on an HDD. If I mount my SSD as a swap partition and increase the cache (using the --cache option), will that work? Because if I understand correctly, storing the DB on an HDD is not bad per se. It's just LevelDB that requires an SSD.