ethereum / go-ethereum

Go implementation of the Ethereum protocol
https://geth.ethereum.org
GNU Lesser General Public License v3.0

How can I check the latest healedTrienodes height of the main network? #25703

Closed litcc closed 2 years ago

litcc commented 2 years ago

This should only be used in very rare cases e.g. if you are not 100% sure if something is a bug or asking a question that leads to improving the documentation. For general questions please use discord or the Ethereum stack exchange at https://ethereum.stackexchange.com.

How can I check the latest healedTrienodes height of the main network?

eth.syncing
{
currentBlock: 15487937,
healedBytecodeBytes: 380132516,
healedBytecodes: 51962,
healedTrienodeBytes: 111404821735,
healedTrienodes: 318328110,
healingBytecode: 0,
healingTrienodes: 3197,
highestBlock: 15488031,
startingBlock: 15470245,
syncedAccountBytes: 38722876923,
syncedAccounts: 158033944,
syncedBytecodeBytes: 2816834450,
syncedBytecodes: 476103,
syncedStorage: 569302144,
syncedStorageBytes: 122028748654
}
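
For scripting, the same data can also be pulled over plain JSON-RPC via the standard eth_syncing method (assuming the HTTP API is exposed on localhost:8545, as in the docker command further down; it returns false once the node is in sync):

# Query the sync status over JSON-RPC
curl -s -X POST -H 'Content-Type: application/json' \
  --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' \
  http://localhost:8545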

The node has been syncing for a few months now, and progress has been very slow. I don't know where the problem is: healedTrienodes only rises by about 50 million per day, and I can't tell when the sync will finish.

MariusVanDerWijden commented 2 years ago

The healed trie node count strongly depends on your snap sync speed. Basically, you first sync an old state of the network and then need to update it to the newest state. The longer snap sync takes, the older and more outdated your state is, and the longer you will need to heal. If you are taking multiple days to heal, you might want to consider stopping the node, deleting the state and starting a fresh snap sync. Usually a state heal is done in < 15h
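
A quick way to tell which phase the node is in is to watch the sync progress lines in the client logs. This is just a sketch against the docker setup shown below (container name eth, as in the start command); the exact log wording can differ between geth versions:

# Follow the node's logs and filter for the snap sync progress messages
# ("State sync in progress" while downloading, "State heal in progress"
# while healing); geth logs to stderr, hence the 2>&1
docker logs -f eth 2>&1 | grep -E "State (sync|heal) in progress"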

litcc commented 2 years ago

What do I need to do to properly delete the state without damaging the already downloaded block data?

The current start command is :

docker run -d --restart unless-stopped --name=eth -p 8545:8545 -p 8546:8546 -p 30303:30303  -v /mnt/ethdatabase:/root/.ethereum ethereum/client-go:v1.10.21 --mainnet --syncmode snap --http --http.api web3,eth,net,admin,personal,debug --http.addr 0.0.0.0 --http.port 8545 --ws --ws.addr 0.0.0.0 --ws.port 8546 --ws.origins=* --maxpeers 1000

MariusVanDerWijden commented 2 years ago

Use geth removedb y n. This will only remove the state and leave the history intact.
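
For the docker setup above, a possible sequence is sketched below (assuming the same image and volume as in the start command; the y and n answer geth's two interactive prompts, removing the state database while keeping the ancient/block data; prompt wording may vary by version):

# Stop the running node, then run removedb against the same data directory
docker stop eth
docker run --rm -it -v /mnt/ethdatabase:/root/.ethereum \
  ethereum/client-go:v1.10.21 removedb
# answer y to removing the state database, n to removing the ancient database
docker start eth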

litcc commented 2 years ago

Do I need to back up the data first? Will it still be slow to sync after the deletion, starting again from 0?

MariusVanDerWijden commented 2 years ago

You don't need to back up anything. It might still be slow to sync if your network connection is bad or your disk is too slow, but it might be faster than before, since it doesn't need to download the old blocks again.

litcc commented 2 years ago

Is this log normal? I see ERROR level errors in it too.

eth-node_log.txt

MariusVanDerWijden commented 2 years ago

That looks okay to me; you are state syncing. The eta=74h56m7.909s doesn't look too great though. Is your network perhaps slow?

litcc commented 2 years ago

Oh yes, my network here is rather poor.

This is the latest log. If eta is the remaining sync time, the current speed is acceptable; after all, we have already waited more than 2 months. 😂

State sync in progress                   synced=88.10% state=171.35GiB accounts=168,001,945@36.86GiB slots=655,186,574@130.92GiB codes=592,749@3.57GiB    eta=3h24m35.358s

litcc commented 2 years ago

Is this log normal? I see ERROR level errors in it too.

eth-node_log-1.txt

Sync information when state is not deleted:
{
currentBlock: 15487937,
healedBytecodeBytes: 380132516,
healedBytecodes: 51962,
healedTrienodeBytes: 111404821735,
healedTrienodes: 318328110,
healingBytecode: 0,
healingTrienodes: 3197,
highestBlock: 15488031,
startingBlock: 15470245,
syncedAccountBytes: 38722876923,
syncedAccounts: 158033944,
syncedBytecodeBytes: 2816834450,
syncedBytecodes: 476103,
syncedStorage: 569302144,
syncedStorageBytes: 122028748654
}

Sync information after deleting the state:
{
currentBlock: 15501384,
healedBytecodeBytes: 0,
healedBytecodes: 0,
healedTrienodeBytes: 3390436,
healedTrienodes: 6373,
healingBytecode: 0,
healingTrienodes: 1108,
highestBlock: 15501449,
startingBlock: 14960010,
syncedAccountBytes: 44923080753,
syncedAccounts: 190132727,
syncedBytecodeBytes: 4305582024,
syncedBytecodes: 661006,
syncedStorage: 769913336,
syncedStorageBytes: 165063448160
}

Hello, according to the data returned by eth.syncing, the node has started synchronizing healedTrienodes again from 0 after deleting the state. I don't know when the synchronization will finish, and the speed is very slow.
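
One way to gauge whether healing is making headway is to sample the counter over a fixed interval from the geth console (a sketch; admin.sleep is a console-only helper, and eth.syncing returns false once the node is fully synced, so this only works mid-sync):

// In `geth attach`: sample healedTrienodes twice, 60 seconds apart,
// to estimate the current healing rate
var a = eth.syncing.healedTrienodes
admin.sleep(60)
var b = eth.syncing.healedTrienodes
console.log("healed trienodes per minute:", b - a)

Note that healing chases a moving target: new blocks keep dirtying the state, so the rate only tells you whether the node is keeping up, not a fixed distance to the finish line.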

karalabe commented 2 years ago

We've merged a lot of updates onto master that make the heal part faster, a lot faster. You might want to update to that.

litcc commented 2 years ago

> We've merged a lot of updates onto master that make the heal part faster, a lot faster. You might want to update to that.

I've already used the latest release, ethereum/client-go:v1.10.23. Isn't this the latest version?

suna4 commented 2 years ago

Maybe you should use ethereum/client-go:latest until v1.11.0 is released.

litcc commented 2 years ago

I used the version: ethereum/client-go@sha256:3e0a278e6a27fb80ac1e91c45f5ee9dffa32d4b3d5fddf454ec40a1342a9148d

At present, synchronization is still slow and I cannot estimate when it will complete. The pending value grows large and then shrinks again; I'm not sure whether that is normal. The logs are as follows.

eth-node_log-2.txt

mutazmq1 commented 2 years ago

If your healing is taking forever, this is how I solved it after weeks of trying:

litcc commented 2 years ago

I'm not sure if it's caused by the hard drive's read/write speed.

cat@cat:/mnt/ethdatabase/test$ sudo fio -filename=/mnt/ethdatabase/test/test_randread -direct=1 -iodepth 1 -thread -rw=randread -ioengine=psync -bs=16k -size=2G -numjobs=10 -runtime=60 -group_reporting -name=mytest
mytest: (g=0): rw=randread, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
...
fio-3.1
Starting 10 threads
mytest: Laying out IO file (1 file / 2048MiB)
Jobs: 10 (f=10): [r(10)][34.2%][r=0KiB/s,w=0KiB/s][r=0,w=0 IOPS][eta 02m:01s]
mytest: (groupid=0, jobs=10): err= 0: pid=31067: Wed Sep 14 11:05:29 2022
   read: IOPS=7049, BW=110MiB/s (116MB/s)(6933MiB/62938msec)
    clat (usec): min=67, max=13098k, avg=1413.32, stdev=85185.32
     lat (usec): min=67, max=13098k, avg=1413.79, stdev=85185.32
    clat percentiles (usec):
     |  1.00th=[  147],  5.00th=[  482], 10.00th=[  537], 20.00th=[  627],
     | 30.00th=[  693], 40.00th=[  758], 50.00th=[  824], 60.00th=[  881],
     | 70.00th=[  947], 80.00th=[ 1012], 90.00th=[ 1106], 95.00th=[ 1188],
     | 99.00th=[ 1401], 99.50th=[ 2704], 99.90th=[ 4047], 99.95th=[ 4359],
     | 99.99th=[34866]
   bw (  KiB/s): min= 1504, max=20512, per=16.35%, avg=18439.90, stdev=2688.99, samples=770
   iops        : min=   94, max= 1282, avg=1152.49, stdev=168.06, samples=770
  lat (usec)   : 100=0.10%, 250=2.01%, 500=4.17%, 750=31.99%, 1000=39.70%
  lat (msec)   : 2=21.44%, 4=0.49%, 10=0.09%, 20=0.01%, 50=0.01%
  lat (msec)   : 100=0.01%, 500=0.01%, >=2000=0.01%
  cpu          : usr=0.76%, sys=3.29%, ctx=443842, majf=0, minf=1046
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=443699,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=110MiB/s (116MB/s), 110MiB/s-110MiB/s (116MB/s-116MB/s), io=6933MiB (7270MB), run=62938-62938msec

Disk stats (read/write):
    dm-0: ios=443699/9, merge=0/0, ticks=478060/66820, in_queue=678800, util=100.00%, aggrios=221849/3, aggrmerge=0/2, aggrticks=304204/14322, aggrin_queue=319260, aggrutil=100.00%
  sdg: ios=0/2, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  sdf: ios=443699/5, merge=0/4, ticks=608408/28644, in_queue=638520, util=100.00%
cat@cat:/mnt/ethdatabase/test$ sudo fio -filename=/mnt/ethdatabase/test/test_randrea2 -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=psync -bs=16k -size=2G -numjobs=10 -runtime=60 -group_reporting -name=mytest
mytest: (g=0): rw=randwrite, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
...
fio-3.1
Starting 10 threads
mytest: Laying out IO file (1 file / 2048MiB)
Jobs: 10 (f=10): [w(10)][100.0%][r=0KiB/s,w=42.1MiB/s][r=0,w=2694 IOPS][eta 00m:00s]
mytest: (groupid=0, jobs=10): err= 0: pid=4587: Wed Sep 14 11:17:08 2022
  write: IOPS=2124, BW=33.2MiB/s (34.8MB/s)(1992MiB/60003msec)
    clat (usec): min=320, max=321827, avg=4700.51, stdev=6167.38
     lat (usec): min=321, max=321828, avg=4701.78, stdev=6167.38
    clat percentiles (usec):
     |  1.00th=[  1057],  5.00th=[  1254], 10.00th=[  1418], 20.00th=[  2008],
     | 30.00th=[  2540], 40.00th=[  3032], 50.00th=[  3425], 60.00th=[  3851],
     | 70.00th=[  4359], 80.00th=[  4883], 90.00th=[  6718], 95.00th=[ 15795],
     | 99.00th=[ 28705], 99.50th=[ 32113], 99.90th=[ 49546], 99.95th=[ 98042],
     | 99.99th=[173016]
   bw (  KiB/s): min=  160, max= 4800, per=10.00%, avg=3400.08, stdev=1375.21, samples=1200
   iops        : min=   10, max=  300, avg=212.45, stdev=85.92, samples=1200
  lat (usec)   : 500=0.12%, 750=0.54%, 1000=0.12%
  lat (msec)   : 2=19.06%, 4=42.89%, 10=29.56%, 20=4.60%, 50=3.01%
  lat (msec)   : 100=0.06%, 250=0.03%, 500=0.01%
  cpu          : usr=0.20%, sys=1.82%, ctx=128874, majf=0, minf=144
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=0,127479,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: bw=33.2MiB/s (34.8MB/s), 33.2MiB/s-33.2MiB/s (34.8MB/s-34.8MB/s), io=1992MiB (2089MB), run=60003-60003msec

Disk stats (read/write):
    dm-0: ios=0/127341, merge=0/0, ticks=0/587604, in_queue=613852, util=100.00%, aggrios=0/63915, aggrmerge=0/0, aggrticks=0/294464, aggrin_queue=294738, aggrutil=99.92%
  sdg: ios=0/232, merge=0/0, ticks=0/92, in_queue=92, util=0.03%
  sdf: ios=0/127598, merge=0/1, ticks=0/588836, in_queue=589384, util=99.92%
cat@cat:/mnt/ethdatabase/test$ sudo fio -filename=/mnt/ethdatabase/test/test_randrea3 -direct=1 -iodepth 1 -thread -rw=randrw -rwmixread=70 -ioengine=psync -bs=16k -size=2G -numjobs=10 -runtime=60 -group_reporting -name=mytest -ioscheduler=noop
mytest: (g=0): rw=randrw, bs=(R) 16.0KiB-16.0KiB, (W) 16.0KiB-16.0KiB, (T) 16.0KiB-16.0KiB, ioengine=psync, iodepth=1
...
fio-3.1
Starting 10 threads
mytest: Laying out IO file (1 file / 2048MiB)
fio: io scheduler is not tunable
fio: io scheduler is not tunable
fio: io scheduler is not tunable
fio: io scheduler is not tunable
fio: io scheduler is not tunable
fio: io scheduler is not tunable
fio: io scheduler is not tunable
fio: io scheduler is not tunable
fio: io scheduler is not tunable
fio: io scheduler is not tunable
Jobs: 10 (f=10): [m(10)][100.0%][r=23.7MiB/s,w=9689KiB/s][r=1519,w=605 IOPS][eta 00m:00s]
mytest: (groupid=0, jobs=10): err= 0: pid=7102: Wed Sep 14 11:22:08 2022
   read: IOPS=1394, BW=21.8MiB/s (22.9MB/s)(1308MiB/60003msec)
    clat (usec): min=83, max=296123, avg=5296.99, stdev=4347.98
     lat (usec): min=83, max=296124, avg=5297.50, stdev=4347.98
    clat percentiles (msec):
     |  1.00th=[    3],  5.00th=[    4], 10.00th=[    4], 20.00th=[    4],
     | 30.00th=[    5], 40.00th=[    5], 50.00th=[    5], 60.00th=[    5],
     | 70.00th=[    6], 80.00th=[    7], 90.00th=[    9], 95.00th=[   10],
     | 99.00th=[   16], 99.50th=[   18], 99.90th=[   27], 99.95th=[   68],
     | 99.99th=[  228]
   bw (  KiB/s): min=   64, max= 3142, per=10.00%, avg=2232.38, stdev=434.59, samples=1198
   iops        : min=    4, max=  196, avg=139.47, stdev=27.15, samples=1198
  write: IOPS=602, BW=9644KiB/s (9876kB/s)(565MiB/60003msec)
    clat (usec): min=232, max=228994, avg=4312.24, stdev=3227.87
     lat (usec): min=233, max=228996, avg=4313.42, stdev=3227.87
    clat percentiles (msec):
     |  1.00th=[    3],  5.00th=[    3], 10.00th=[    4], 20.00th=[    4],
     | 30.00th=[    4], 40.00th=[    4], 50.00th=[    4], 60.00th=[    5],
     | 70.00th=[    5], 80.00th=[    5], 90.00th=[    6], 95.00th=[    7],
     | 99.00th=[   13], 99.50th=[   16], 99.90th=[   24], 99.95th=[   68],
     | 99.99th=[  140]
   bw (  KiB/s): min=   32, max= 1603, per=10.00%, avg=964.64, stdev=227.13, samples=1198
   iops        : min=    2, max=  100, avg=60.27, stdev=14.19, samples=1198
  lat (usec)   : 100=0.02%, 250=0.28%, 500=0.04%, 750=0.02%, 1000=0.03%
  lat (msec)   : 2=0.46%, 4=33.04%, 10=62.26%, 20=3.56%, 50=0.22%
  lat (msec)   : 100=0.04%, 250=0.03%, 500=0.01%
  cpu          : usr=0.20%, sys=0.80%, ctx=120904, majf=0, minf=1
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=83699,36167,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=21.8MiB/s (22.9MB/s), 21.8MiB/s-21.8MiB/s (22.9MB/s-22.9MB/s), io=1308MiB (1371MB), run=60003-60003msec
  WRITE: bw=9644KiB/s (9876kB/s), 9644KiB/s-9644KiB/s (9876kB/s-9876kB/s), io=565MiB (593MB), run=60003-60003msec

Disk stats (read/write):
    dm-0: ios=83659/36158, merge=0/0, ticks=440292/153992, in_queue=597572, util=100.00%, aggrios=41849/18087, aggrmerge=0/0, aggrticks=220124/77008, aggrin_queue=297178, aggrutil=99.89%
  sdg: ios=0/4, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
  sdf: ios=83699/36171, merge=0/1, ticks=440248/154016, in_queue=594356, util=99.89%

These are 3 sets of disk performance test results.
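
Since geth's database access is dominated by small random reads, a deeper-queue, small-block read test may be more telling than the iodepth=1 runs above. A sketch (parameters are illustrative and assume libaio is available; this is not an official geth benchmark):

sudo fio --name=geth-like --filename=/mnt/ethdatabase/test/test_qd32 \
  --rw=randread --bs=4k --iodepth=32 --ioengine=libaio --direct=1 \
  --size=2G --runtime=60 --group_reporting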

mutazmq1 commented 2 years ago

Your IOPS are quite low. You need to increase both the IOPS and the bandwidth; they are critical for healing. Upgrading the disk alone will not solve the problem unless you do geth removedb y n first.

litcc commented 2 years ago

OK. The disks were previously in a software RAID array, which kept the read/write speed low. After switching to a hardware RAID array, the read/write speed was fully sufficient, and the node has now finished synchronizing! Thank you for your replies!