Open ernopp opened 3 years ago
Unfortunately there's no way to say in advance how many trie nodes there are, it's an explorative procedure that keeps expanding until it is done. Geth v1.10.0 will ship snap sync that does have support for progress monitoring (for the most part).
One thing that stands out in your post is the 2000 peer limit. Having that many peers is quite detrimental to sync: your node keeps requesting data from all of them, and if a large number of them are "junk" peers, they just waste resources without providing useful data. I don't think there's a meaningful reason to raise the peer count above the defaults.
AFAIK GCP e2-standard-2 doesn't have a local SSD attached. Using an HDD will definitely bleed performance. Attaching a remote SSD (if that's possible?) will probably come with a large IO latency and an IOPS limitation, so you might end up with fairly low performance.
Thanks @karalabe for the great suggestion.
Are the monitoring metrics below what you would expect during syncing? We are now 2+ weeks in, don't seem to be CPU/disk/RAM bound, and can't find the bottleneck in our sync.
Is it expected for disk writes to be this low? We're also concerned that the states count is incrementing so slowly (730m -> 760m in the last week).
(admittedly this is a little skinny)
GCP e2-standard-2 (2 vCPUs, 8 GB memory) 500GB SSD drive attached (15,000 IOPS / 240MB/s read/write)
/usr/bin/sudo /usr/bin/geth --rpc --rpcaddr localhost --rpcport 8545 --datadir /ssd/ethereum --nousb --cache=12278 --syncmode fast --metrics --metrics.influxdb --metrics.influxdb.endpoint "http://0.0.0.0:8086" --metrics.influxdb.username "geth" --metrics.influxdb.password "XXX"
> web3.eth.syncing
{
currentBlock: 11971299,
highestBlock: 11971404,
knownStates: 760303367,
pulledStates: 760174176,
startingBlock: 11959099
}
> web3.net.peerCount
50
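To make the `eth.syncing` numbers above a bit more concrete, here is a small sketch that derives the block gap, the state backlog, and the pulled/known ratio from that exact output. Note the caveat from earlier in this thread: `knownStates` keeps growing as the trie is explored, so the ratio is not a true completion percentage, only a measure of how far the download lags behind what is currently known.

```python
# Rough fast-sync progress figures from the eth.syncing output above.
# Caveat: knownStates grows during sync, so pulled/known is NOT an
# overall completion percentage, just the lag behind known work.

syncing = {
    "currentBlock": 11971299,
    "highestBlock": 11971404,
    "knownStates": 760303367,
    "pulledStates": 760174176,
    "startingBlock": 11959099,
}

block_gap = syncing["highestBlock"] - syncing["currentBlock"]
state_backlog = syncing["knownStates"] - syncing["pulledStates"]
pulled_ratio = syncing["pulledStates"] / syncing["knownStates"]

print(f"blocks behind head: {block_gap}")
print(f"state entries still queued: {state_backlog}")
print(f"pulled/known ratio: {pulled_ratio:.4%}")
```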
Read
>sudo hdparm -Ttv /dev/sdb1
/dev/sdb1:
SG_IO: bad/missing sense data, sb[]: 70 00 05 00 00 00 00 0a 00 00 00 00 20 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Timing cached reads: 13376 MB in 1.99 seconds = 6726.19 MB/sec
Timing buffered disk reads: 726 MB in 3.00 seconds = 241.67 MB/sec
Latency
sudo ioping /dev/sdb1
4 KiB from /dev/sdb1 (block device 500.0 GiB): request=1 time=316 us
4 KiB from /dev/sdb1 (block device 500.0 GiB): request=2 time=214 us
4 KiB from /dev/sdb1 (block device 500.0 GiB): request=3 time=1.02 ms
4 KiB from /dev/sdb1 (block device 500.0 GiB): request=4 time=528 us
4 KiB from /dev/sdb1 (block device 500.0 GiB): request=5 time=215 us
4 KiB from /dev/sdb1 (block device 500.0 GiB): request=6 time=575 us
4 KiB from /dev/sdb1 (block device 500.0 GiB): request=7 time=611 us
4 KiB from /dev/sdb1 (block device 500.0 GiB): request=8 time=761 us
4 KiB from /dev/sdb1 (block device 500.0 GiB): request=9 time=145 us
^C
--- /dev/sdb1 (block device 500.0 GiB) ioping statistics ---
9 requests completed in 8.96 s, 2.05 k iops, 8.02 MiB/s
min/avg/max/mdev = 145 us / 486 us / 1.02 ms / 273 us
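As a quick sanity check on the ioping summary: with a queue depth of 1, the reported IOPS should be roughly the inverse of the average request latency, and the throughput should follow from the 4 KiB request size. A minimal arithmetic check:

```python
# Consistency check on the ioping stats above: at queue depth 1,
# IOPS ~= 1 / avg latency, and throughput = IOPS * request size.

avg_latency_us = 486                         # reported avg latency
implied_iops = 1_000_000 / avg_latency_us    # ~2058, matching "2.05 k iops"
implied_mib_s = implied_iops * 4 / 1024      # 4 KiB requests, ~8.0 MiB/s

print(f"~{implied_iops:.0f} iops, ~{implied_mib_s:.1f} MiB/s")
```

This matches the reported "2.05 k iops, 8.02 MiB/s", so the latency and throughput numbers are internally consistent.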
IOPS
sudo fio --name=random-write --ioengine=posixaio --rw=randwrite --bs=4k --numjobs=1 --size=4g --iodepth=1 --runtime=60 --time_based --end_fsync=1
random-write: (g=0): rw=randwrite, bs=4K-4K/4K-4K/4K-4K, ioengine=posixaio, iodepth=1
fio-2.2.10
Starting 1 process
random-write: Laying out IO file(s) (1 file(s) / 4096MB)
Jobs: 1 (f=1): [F(1)] [100.0% done] [0KB/0KB/0KB /s] [0/0/0 iops] [eta 00m:00s]
random-write: (groupid=0, jobs=1): err= 0: pid=4773: Thu Mar 4 11:47:15 2021
write: io=928692KB, bw=9784.5KB/s, iops=2446, runt= 94919msec
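The fio summary line can be cross-checked the same way: bandwidth should equal the total write volume divided by the actual runtime, and IOPS should be bandwidth divided by the 4 KiB block size.

```python
# Sanity-checking the fio summary: bw and iops should be consistent
# with the total io volume, the runtime, and --bs=4k.

io_kb = 928_692          # total write volume (KB)
runtime_ms = 94_919      # actual runtime (msec)
bs_kb = 4                # --bs=4k

bw_kb_s = io_kb / (runtime_ms / 1000)   # ~9784, matching bw=9784.5KB/s
iops = bw_kb_s / bs_kb                  # ~2446, matching iops=2446

print(f"bw={bw_kb_s:.1f} KB/s iops={iops:.0f}")
```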
Mar 04 12:07:27 eth-node sudo[5509]: INFO [03-04|12:07:27.000] Imported new state entries count=384 elapsed="4.42µs" processed=760466124 pending=174556 trieretry=0 coderetry=0 duplicate=0 unexpected=20952
Mar 04 12:07:32 eth-node sudo[5509]: INFO [03-04|12:07:32.398] Imported new state entries count=384 elapsed="248.761µs" processed=760466508 pending=175844 trieretry=0 coderetry=0 duplicate=0 unexpected=20952
Mar 04 12:07:34 eth-node sudo[5509]: INFO [03-04|12:07:34.541] Imported new block headers count=1 elapsed=13.565ms number=11971679 hash="83ab70…9b1d76"
Mar 04 12:07:37 eth-node sudo[5509]: INFO [03-04|12:07:37.288] Imported new state entries count=384 elapsed="535.156µs" processed=760466892 pending=176850 trieretry=0 coderetry=0 duplicate=0 unexpected=20952
Mar 04 12:07:41 eth-node sudo[5509]: INFO [03-04|12:07:41.833] Imported new state entries count=384 elapsed="282.606µs" processed=760467276 pending=178562 trieretry=0 coderetry=0 duplicate=0 unexpected=20952
Mar 04 12:07:43 eth-node sudo[5509]: INFO [03-04|12:07:43.709] Imported new block headers count=2 elapsed=10.507ms number=11971681 hash="25fd63…23e1ee"
Mar 04 12:07:47 eth-node sudo[5509]: INFO [03-04|12:07:47.739] Imported new state entries count=384 elapsed="256.671µs" processed=760467660 pending=180108 trieretry=0 coderetry=0 duplicate=0 unexpected=20952
Mar 04 12:07:47 eth-node sudo[5509]: INFO [03-04|12:07:47.742] Imported new state entries count=0 elapsed="15.281µs" processed=760467660 pending=180108 trieretry=6 coderetry=0 duplicate=0 unexpected=20954
Mar 04 12:07:51 eth-node sudo[5509]: INFO [03-04|12:07:51.617] Imported new state entries count=384 elapsed="366.64µs" processed=760468044 pending=182114 trieretry=2 coderetry=0 duplicate=0 unexpected=20954
Mar 04 12:07:56 eth-node sudo[5509]: INFO [03-04|12:07:56.210] Imported new state entries count=384 elapsed="88.725µs" processed=760468428 pending=183708 trieretry=4 coderetry=0 duplicate=0 unexpected=20954
Mar 04 12:08:01 eth-node sudo[5509]: INFO [03-04|12:08:01.004] Imported new state entries count=384 elapsed="405.227µs" processed=760468812 pending=185378 trieretry=0 coderetry=0 duplicate=0 unexpected=20954
Mar 04 12:08:04 eth-node sudo[5509]: INFO [03-04|12:08:04.631] Imported new state entries count=384 elapsed="195.561µs" processed=760469196 pending=187062 trieretry=2 coderetry=0 duplicate=0 unexpected=20954
Mar 04 12:08:09 eth-node sudo[5509]: INFO [03-04|12:08:09.812] Imported new state entries count=384 elapsed="121.964µs" processed=760469580 pending=188486 trieretry=0 coderetry=0 duplicate=0 unexpected=20954
Asking here as well, since there's been no answer in the other threads yet ... I've read the FAQ and I understand that the node doesn't know by itself how far it has advanced in the sync process (except for the initial state sync phase, where it shows a percentage completion indicator).
However, can one of the metrics from the log files be used to estimate overall progress if one also has the log files of a different node that recently completed syncing? Is there some metric that could be compared across the logs?
Would this differ between Snap Sync and Fast Sync?
This would help people understand if Snap Sync is broken (which it seems to be for many here), or if it's just super slow for whatever reason.
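One metric that does appear in every fast-sync log sample in this thread is the cumulative `processed=` counter on the "Imported new state entries" lines. A rough sketch of the comparison idea: extract that counter with timestamps and compute the import rate, then compare the counter against the final value from a node that recently finished syncing. It only gives a lower-bound estimate, since the total keeps growing, and snap sync logs different lines entirely.

```python
import re
from datetime import datetime

# Sketch: pull the cumulative "processed" counter out of fast-sync log
# lines and compute the state-entry import rate. The year is not in the
# log timestamps, so it is supplied as an assumption.

LINE_RE = re.compile(
    r"INFO \[(\d{2})-(\d{2})\|(\d{2}:\d{2}:\d{2})\.\d+\] "
    r"Imported new state entries .*?processed=(\d+)"
)

def parse(lines, year=2021):
    samples = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            month, day, clock, processed = m.groups()
            ts = datetime.strptime(f"{year}-{month}-{day} {clock}",
                                   "%Y-%m-%d %H:%M:%S")
            samples.append((ts, int(processed)))
    return samples

# First and last state-entry lines from the log sample in this thread.
logs = [
    'INFO [03-04|12:07:27.000] Imported new state entries count=384 elapsed="4.42µs" processed=760466124 pending=174556',
    'INFO [03-04|12:08:09.812] Imported new state entries count=384 elapsed="121.964µs" processed=760469580 pending=188486',
]

samples = parse(logs)
(start_ts, start_n), (end_ts, end_n) = samples[0], samples[-1]
rate = (end_n - start_n) / (end_ts - start_ts).total_seconds()
print(f"~{rate:.0f} state entries/s")
```

At the ~82 entries/s shown in this log sample, even a few million remaining entries translate to days, which is consistent with the slow weekly progress reported above.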
Hey @karalabe,
now that Geth v1.10.x is released and snap sync is available, could you point us to the metric that answers this issue? I went to the metrics page, but it's many hundreds of lines long and I couldn't find it.
I know of at least two exporters that call the RPC API in order to provide syncing metrics:
Why the Geth metrics do not provide a metric for currentBlock and highestBlock from eth_syncing is beyond me.
It seems like such an obvious thing to add to the available metrics.
I personally use this one.
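For reference, the core of what such an exporter does is small. The sketch below maps an `eth_syncing` JSON-RPC result onto flat gauge values; in practice the payload would come from a POST of `{"method": "eth_syncing", ...}` to the node's HTTP endpoint (commonly http://localhost:8545), but a sample response is inlined here so the transformation is self-contained. The `geth_*` metric names are made up for illustration.

```python
# Hedged sketch of an eth_syncing -> metrics mapping, the piece the
# external exporters in this thread provide. Raw JSON-RPC returns hex
# quantity strings; a synced node returns false.

def syncing_metrics(result):
    """Map an eth_syncing JSON-RPC result to flat exporter-style gauges."""
    if result is False:  # node reports fully synced
        return {"geth_syncing": 0}
    metrics = {"geth_syncing": 1}
    for field in ("startingBlock", "currentBlock", "highestBlock",
                  "knownStates", "pulledStates"):
        if field in result:
            metrics[f"geth_{field}"] = int(result[field], 16)
    return metrics

# Sample response (hex-encoded block numbers from earlier in this thread).
sample = {
    "startingBlock": "0xb67b3b",
    "currentBlock": "0xb6aae3",
    "highestBlock": "0xb6ab4c",
}
print(syncing_metrics(sample))
```

Exposing the resulting dict via any metrics endpoint would give exactly the currentBlock/highestBlock gauges asked for above.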
This issue still persists; fixing it together with #25043 would be a great improvement to the monitoring of Geth nodes.
I work at a node-as-a-service provider, and monitoring of Geth processes is by far the biggest part of our in-house tooling. Better out-of-the-box monitoring is long overdue.
Hey everyone,
First time setting up a full node with Geth and was hoping someone could help me better track progress.
It has been running for about 8 days, and I believe geth is currently doing the "state sync".
1) Based on the details and performance below, do you think the sync will ever complete? Or do I need to change some settings or upgrade the machine specs?
2) Do you have any recommendations on metrics to track the progress of the sync? I also have the InfluxDB+Grafana setup described here
Many thanks
Geth Settings
Machine Specs
GCP e2-standard-2 (2 vCPUs, 8 GB memory)
500GB SSD drive
Machine Performance
eth.syncing output
Data Stored
Latest logs sample