diglos / userpatches

Tool used to create an Ethereum full node based on Armbian distro
GNU General Public License v2.0

Slow sync on NanoPC-T4 with NVMe SSD #16

Open gomes7997 opened 3 years ago

gomes7997 commented 3 years ago

I have a NanoPC T4 with 1TB NVMe SSD which has been syncing for about 6 weeks. I'm running the EthArmbian build available here.

I noticed here that a Raspberry Pi user was able to sync in only about 9 days. Given that this is similar hardware, I'm wondering if anyone can help me troubleshoot my setup. My node has managed to process 773755887 states in 6 weeks:

Nov 01 21:34:40 ethnode-bc89073e geth[16112]: INFO [11-01|21:34:40.721] Downloader queue stats receiptTasks=0 blockTasks=0 itemSize=187.24KiB throttle=351
Nov 01 21:34:42 ethnode-bc89073e geth[16112]: INFO [11-01|21:34:42.779] Imported new block headers count=1 elapsed=34.322ms number=11173619 hash="67cad5…f81212"
Nov 01 21:34:45 ethnode-bc89073e geth[16112]: INFO [11-01|21:34:45.971] Imported new state entries count=384 elapsed="15.75µs" processed=773754623 pending=193816 trieretry=0 coderetry=0 duplicate=0 unexpected=393
Nov 01 21:35:00 ethnode-bc89073e geth[16112]: INFO [11-01|21:35:00.800] Imported new state entries count=384 elapsed="15.167µs" processed=773755007 pending=194380 trieretry=0 coderetry=0 duplicate=0 unexpected=393
Nov 01 21:35:05 ethnode-bc89073e geth[16112]: INFO [11-01|21:35:05.286] Imported new block headers count=1 elapsed=446.894ms number=11173620 hash="885043…71fd63"
Nov 01 21:35:17 ethnode-bc89073e geth[16112]: INFO [11-01|21:35:17.935] Imported new block headers count=1 elapsed=419.383ms number=11173621 hash="4da9b0…851a0f"
Nov 01 21:35:20 ethnode-bc89073e geth[16112]: INFO [11-01|21:35:20.856] Imported new state entries count=496 elapsed="694.163µs" processed=773755503 pending=195065 trieretry=1 coderetry=0 duplicate=0 unexpected=393
Nov 01 21:35:30 ethnode-bc89073e geth[16112]: INFO [11-01|21:35:30.835] Imported new block headers count=1 elapsed=554.866ms number=11173622 hash="a85c36…87c3b3"
Nov 01 21:35:33 ethnode-bc89073e geth[16112]: INFO [11-01|21:35:33.590] Imported new block headers count=2 elapsed=39.649ms number=11173624 hash="0e4464…0ca9ed"
Nov 01 21:35:36 ethnode-bc89073e geth[16112]: INFO [11-01|21:35:36.124] Imported new state entries count=384 elapsed=3.030ms processed=773755887 pending=195608 trieretry=0 coderetry=0 duplicate=0 unexpected=393
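To put a number on the pace shown in the log excerpt above, here is a rough Python sketch that reads the "Imported new state entries" lines and divides the change in `processed` by the elapsed time between the first and last sample. The function name `state_rate` and the regular expression are my own illustration, not part of Geth or EthArmbian, and the sketch assumes the log timestamps look exactly like the ones pasted here.

```python
import re
from datetime import datetime

# Matches the Geth timestamp "[MM-DD|HH:MM:SS.mmm]" followed by a state-import
# message, and captures the running "processed" counter from the same entry.
STATE_RE = re.compile(
    r"\[(\d{2}-\d{2}\|\d{2}:\d{2}:\d{2}\.\d{3})\]\s+"
    r"Imported new state entries.*?processed=(\d+)"
)


def state_rate(log_text):
    """Return state entries processed per second, or None if it cannot be computed."""
    samples = []
    for match in STATE_RE.finditer(log_text):
        # The log carries no year, so strptime falls back to its default year;
        # only the difference between timestamps is used here.
        ts = datetime.strptime(match.group(1), "%m-%d|%H:%M:%S.%f")
        samples.append((ts, int(match.group(2))))
    if len(samples) < 2:
        return None
    (t0, p0), (t1, p1) = samples[0], samples[-1]
    seconds = (t1 - t0).total_seconds()
    if seconds <= 0:
        return None
    return (p1 - p0) / seconds
```

Fed the first and last state-entry lines from my excerpt, this works out to 1264 entries over about 50 seconds, i.e. roughly 25 entries per second. A longer window (for example the output of `journalctl` for the geth unit, if that is how the service logs on your image) would give a more stable figure.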

Is there a way to determine whether my sync is slow because of disk or network IO? I have run iostat on the device during sync and get the following for SSD throughput (nvme0n1):

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          14.34    0.00   11.51    7.27    0.00   66.87

Device             tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
nvme0n1        2927.80     32683.20       342.40     163416       1712
mmcblk1           0.00         0.00         0.00          0          0
mmcblk1rpmb       0.00         0.00         0.00          0          0
mmcblk1boot1      0.00         0.00         0.00          0          0
mmcblk1boot0      0.00         0.00         0.00          0          0
mmcblk0          65.80      1172.00         2.40       5860         12
zram0             0.00         0.00         0.00          0          0
zram1           318.60      1273.60         0.80       6368          4
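One thing that can be read out of these iostat numbers is the average size of each transfer, which hints at whether the disk is doing large sequential reads or many small ones. The sketch below is plain arithmetic on the columns above; `avg_transfer_kb` is a name I made up for illustration.

```python
def avg_transfer_kb(tps, kb_read_s, kb_wrtn_s):
    """Average kB moved per I/O request, from iostat's tps and kB/s columns."""
    if tps == 0:
        return 0.0  # idle device: nothing to average
    return (kb_read_s + kb_wrtn_s) / tps


# Values copied from the nvme0n1 row of the iostat output.
nvme = avg_transfer_kb(tps=2927.80, kb_read_s=32683.20, kb_wrtn_s=342.40)
print(f"nvme0n1: {nvme:.2f} kB per transfer")
```

For nvme0n1 this comes to about 11 kB per transfer, almost all of it reads. That would be consistent with a workload made of many small reads rather than streaming, but a single sample like this cannot by itself tell whether the SSD or the network is the limit.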

I have run most of the sync without port forwarding enabled, but one week ago I enabled port forwarding to the device for UDP & TCP port 30303.

Can anyone suggest monitoring or other tests I can perform to understand where the bottleneck is?
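As one possible way to monitor this, the node can be polled over JSON-RPC for its sync status and peer count. The sketch below assumes Geth's HTTP RPC endpoint is enabled and reachable at `http://127.0.0.1:8545`, which nothing in this thread confirms for the EthArmbian image. `eth_syncing` and `net_peerCount` are standard Ethereum JSON-RPC methods; the helper names `rpc_call` and `summarize_sync` are hypothetical, and which fields `eth_syncing` returns depends on the client version.

```python
import json
import urllib.request


def rpc_call(method, url="http://127.0.0.1:8545"):
    """POST a parameterless JSON-RPC request and return its "result" field.

    The URL is an assumption: it only works if HTTP RPC is enabled on the node.
    """
    payload = json.dumps(
        {"jsonrpc": "2.0", "id": 1, "method": method, "params": []}
    ).encode()
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)["result"]


def summarize_sync(syncing):
    """Convert the hex-string fields of an eth_syncing result to integers.

    eth_syncing returns False when the node is not syncing; map that to None.
    """
    if syncing is False:
        return None
    return {
        key: int(value, 16)
        for key, value in syncing.items()
        if isinstance(value, str) and value.startswith("0x")
    }


if __name__ == "__main__":
    print("peers:", int(rpc_call("net_peerCount"), 16))
    print("sync:", summarize_sync(rpc_call("eth_syncing")))
```

Running this every few minutes and logging the output next to iostat samples would at least show whether the counters stall while the disk is busy or while it is idle.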

diglos commented 3 years ago

Ummh, this is really odd. The device should have been switched to Full sync before reaching 700M State entries.

Which Geth version are you running? Can you try our last image? (from 2 weeks ago, see Readme on this Repo).

gomes7997 commented 3 years ago

Thanks for the reply. I upgraded to Geth v.1.9.23 two days ago by running 'update-ethereum', but I did not otherwise modify the image on my boot SD card. I did not clear the DB on the SSD drive and therefore most of it was built using an older version of Geth running for several weeks. I don't remember which version this was, but I was using the image from this repo downloaded around 09/19/2020.

After the update I haven't noticed much difference in behavior: the number of processed state entries appears to be increasing at about the same pace as before. Do you recommend wiping the DB and rebooting from an SD card flashed with the latest image? I am a little worried that starting from scratch will set me back several weeks of waiting time.

diglos commented 3 years ago

Yes, please, do a resync from scratch with the new image. The sync should take no longer than 9-10 days.

I just started a Geth sync on my nano as well to check if I can reproduce this.

gomes7997 commented 3 years ago

After restarting from scratch with the latest image, my device has now run for about 10 days. Currently it's at 402M processed states and still syncing. Was your Nano able to sync? If so I'm wondering if this is a network issue or slow SSD on my end. The state processing was very fast up to about 250M states and then there was a dramatic slowdown in processing rate. My peer count (from Geth Javascript console) shows consistently around 99 connected peers.

3cl1ps commented 3 years ago

hey, experiencing slow sync on nanopc too, like 20+ days, when it comes to importing state entries. i just keep an export on the lan network.