bnb-chain / bsc

A BNB Smart Chain client based on the go-ethereum fork
GNU Lesser General Public License v3.0
2.63k stars 1.52k forks

Global critical problem. Slow sync! #545

Closed. TehnobitSystems closed this issue 1 year ago.

TehnobitSystems commented 2 years ago

Hi everyone! A full BSC node cannot fully sync. Ever. The same problem is affecting many people here. Many posts have already been created on GitHub and elsewhere, but no solution can be found anywhere, and the common recommendations do not help. This has become a global problem, but the developers are not responding. How do we get a reaction from the BSC engineering team? If you know someone from the BSC team or others who can help, let's mention the engineers here so that they see this.

AwesomeMylaugh commented 2 years ago

+1, I have the same problem and have tried every way I can find; still doesn't work!!!!! Developers, help us ASAP plz!!!!

elnem0 commented 2 years ago

Confirming, same issue. Tried the snapshot, syncing from genesis, all syncmodes, and all the tips from threads on GitHub.

civa commented 2 years ago

Same here.

FeurJak commented 2 years ago

Same.

FeurJak commented 2 years ago

Running i3en.6xlarge on AWS, node gets out of sync too frequently.

holiman commented 2 years ago

@TehnobitSystems out of the devs that you pinged, not a single one has anything to do with BSC. We are/were all developers of the ethereum Geth client. BSC has reused the codebase, and still imports commits from the upstream project go-ethereum, but that does not mean that we are affiliated with BSC. Please close this to avoid spam for the drive-by mentioned ethereum-devs, and possibly open a new one where you instead ping the BSC-devs.

xpkore commented 2 years ago

Running i3en.6xlarge on AWS, node gets out of sync too frequently.

INFO [11-15|22:17:49.357] Imported new chain segment               blocks=1  txs=410   mgas=58.632   elapsed=215.999ms   mgasps=271.446 number=12,667,979 hash=a6765e..d83fee dirty=1.54GiB
INFO [11-15|22:17:51.562] Imported new chain segment               blocks=1  txs=430   mgas=70.805   elapsed=570.162ms   mgasps=124.184 number=12,667,980 hash=98425b..098070 dirty=1.55GiB
INFO [11-15|22:17:51.563] Unindexed transactions                   blocks=1  txs=144   tail=10,317,981 elapsed="936.038µs"
INFO [11-15|22:17:52.737] Imported new chain segment               blocks=1  txs=432   mgas=72.417   elapsed=214.234ms   mgasps=338.028 number=12,667,980 hash=95a0e9..eb2341 dirty=1.55GiB
INFO [11-15|22:17:55.723] Imported new chain segment               blocks=1  txs=501   mgas=84.467   elapsed=717.816ms   mgasps=117.672 number=12,667,981 hash=14c271..330563 dirty=1.56GiB
INFO [11-15|22:17:55.725] Unindexed transactions                   blocks=1  txs=155   tail=10,317,982 elapsed=1.860ms
INFO [11-15|22:17:55.990] Imported new chain segment               blocks=1  txs=460   mgas=79.286   elapsed=267.143ms   mgasps=296.790 number=12,667,981 hash=a5089d..fbbafa dirty=1.55GiB
INFO [11-15|22:17:58.867] Imported new chain segment               blocks=1  txs=468   mgas=68.842   elapsed=684.311ms   mgasps=100.601 number=12,667,982 hash=3fa955..c8a142 dirty=1.56GiB
INFO [11-15|22:17:58.868] Unindexed transactions                   blocks=1  txs=166   tail=10,317,983 elapsed=1.137ms
INFO [11-15|22:18:00.393] Imported new chain segment               blocks=1  txs=579   mgas=62.841   elapsed=550.767ms   mgasps=114.098 number=12,667,983 hash=ac9ccc..a8f940 dirty=1.55GiB
INFO [11-15|22:18:00.393] Unindexed transactions                   blocks=1  txs=82    tail=10,317,984 elapsed="586.839µs"
INFO [11-15|22:18:01.860] Imported new chain segment               blocks=1  txs=573   mgas=60.998   elapsed=212.207ms   mgasps=287.446 number=12,667,983 hash=ea4f37..4fb347 dirty=1.55GiB
INFO [11-15|22:18:04.175] Imported new chain segment               blocks=1  txs=458   mgas=64.906   elapsed=693.264ms   mgasps=93.623  number=12,667,984 hash=1fe456..6579ae dirty=1.56GiB
INFO [11-15|22:18:04.176] Unindexed transactions                   blocks=1  txs=166   tail=10,317,985 elapsed=1.310ms
INFO [11-15|22:18:06.401] Imported new chain segment               blocks=1  txs=381   mgas=48.030   elapsed=498.179ms   mgasps=96.411  number=12,667,985 hash=df6df2..a6380e dirty=1.56GiB
INFO [11-15|22:18:06.402] Unindexed transactions                   blocks=1  txs=85    tail=10,317,986 elapsed="773.789µs"
INFO [11-15|22:18:09.775] Imported new chain segment               blocks=1  txs=488   mgas=69.494   elapsed=645.502ms   mgasps=107.658 number=12,667,986 hash=e71a88..3486d1 dirty=1.56GiB
INFO [11-15|22:18:09.776] Unindexed transactions                   blocks=1  txs=131   tail=10,317,987 elapsed=1.114ms
INFO [11-15|22:18:10.191] Deep froze chain segment                 blocks=20 elapsed=55.777ms    number=12,577,986 hash=294251..212829
INFO [11-15|22:18:14.534] Imported new chain segment               blocks=1  txs=544   mgas=70.510   elapsed=1.812s      mgasps=38.906  number=12,667,987 hash=056cb6..50cd92 dirty=1.56GiB
INFO [11-15|22:18:14.542] Unindexed transactions                   blocks=1  txs=123   tail=10,317,988 elapsed=3.547ms
INFO [11-15|22:18:15.230] Imported new chain segment               blocks=1  txs=408   mgas=57.024   elapsed=694.916ms   mgasps=82.059  number=12,667,988 hash=09059d..942ace dirty=1.56GiB
INFO [11-15|22:18:15.230] Unindexed transactions                   blocks=1  txs=117   tail=10,317,989 elapsed="772.14µs"
INFO [11-15|22:18:16.907] Imported new chain segment               blocks=1  txs=439   mgas=58.774   elapsed=527.387ms   mgasps=111.444 number=12,667,989 hash=16005e..20009e dirty=1.56GiB
INFO [11-15|22:18:16.909] Unindexed transactions                   blocks=1  txs=194   tail=10,317,990 elapsed=1.286ms

i3en.2xlarge, US-EAST, synced from the 11/11 snapshot. Looks okay, but slower than I remember from a few weeks ago. 30 peers.

izidorit commented 2 years ago

Running i3en.6xlarge on AWS, node gets out of sync too frequently.

[quoted logs trimmed; identical to the log excerpt in xpkore's comment above]

i3en.2xlarge, US-EAST, synced from the 11/11 snapshot. Looks okay, but slower than I remember from a few weeks ago. 30 peers.

Where and what cache size did you set to reach "dirty=1.56GiB"? I use:

[Eth]
DatabaseCache = 102400

But the used cache size is:

Imported new chain segment blocks=1 txs=470 mgas=76.762 elapsed=1.188s mgasps=64.594 number=12,668,158 hash=f75603..8df542 dirty=1.01GiB
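For what it's worth, and as an assumption on my part (this is the upstream geth cache layout, which BSC inherits): DatabaseCache only sizes the LevelDB read cache, while the "dirty=" figure in the import logs comes from the dirty trie cache, which has its own knob. A sketch of the relevant [Eth] fields, values in MB:

```toml
[Eth]
# LevelDB read cache only (MB); raising it does not raise the "dirty=" figure
DatabaseCache  = 102400
# Pool for in-memory dirty trie nodes; this is what the "dirty=" log column
# draws from (MB). Example value, tune to your RAM.
TrieDirtyCache = 4096
```

On the CLI, a single --cache N is split between these pools, which may explain why nodes started with different flags report different dirty sizes.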

kwkr commented 2 years ago

@philzeh could you share your config: the .toml plus the command you use to run the node? Also, did you do anything special when setting up the EC2 instance? I tried the setup you are mentioning but couldn't sync anyway.

chevoisiatesalvati commented 2 years ago

Hi guys, I've been trying to get a full node synced for days without success. First I tried from a snapshot (using VirtualBox with 12 cores, 35 GB RAM, and a Samsung 980 PRO NVMe SSD), but it synced slower than the chain grows, so I deleted everything. Then I tried running from scratch: it downloaded all the blocks fairly quickly (around a day), but then it started to "import new state entries" seemingly forever (I got past 500M knownStates). So I did everything again using VMware (I also had some trouble with VirtualBox, which is why), from a new snapshot (11/11/2021), but again it's too slow. This time it is faster than the chain grows, but I gain only 1 minute of chain time per 20 seconds of real time. So to catch up on 4 days of chain, I need almost 2 days at this speed. That seems very strange to me, doesn't it?

hbtj123 commented 2 years ago

same

botfi-finance commented 2 years ago

same prob.

voron commented 2 years ago

Let me describe our experience getting a synced node without issues. The key point is:

  • Use low-latency disks (or terabytes of RAM for cache), 1.5TB+. This is really important. A generic cloud SSD will not do the job. See below for details.

Generic requirements:

  • Use at least 8 CPU cores; 16 may suit better, especially when you need to serve RPC besides node sync-up
  • Use at least 32GB of RAM; 48-64GB may suit a bit better
  • Use the latest bsc (1.1.4 at the time of writing) and pay attention to the following settings:

    • enable diffsync via the --diffsync CLI argument and
    [Eth]
    ...
    DisablePeerTxBroadcast = true
    • disable the snapshot with --snapshot=false, as the snapshot may kill IO performance, and IO is the key to a synced node
    • You may (or may not) need 300-500 peers to keep your node synced during BSC hard times, when there are a lot of stale peers
    [Node.P2P]
    MaxPeers = 300

    A lot of peers requires significantly more CPU and network bandwidth; watch your cloud bill

  • Use the latest pruned snapshot as a bootstrap (https://github.com/binance-chain/bsc-snapshots); syncing from scratch is almost impossible these days on generic HW

Sync-up speed is around 2x the chain's generation speed, so if your node lags 10 hours, you'll need around 10 more hours to catch up (the chain keeps advancing underneath you), assuming your p2p peers are fine all the time.

Disks details:

  • BSC, as a blockchain node, uses storage in a sequential fashion from an [almost] single thread during sync. This means, for example, that to get 10k IOPS from a single thread we need at most 0.1ms disk latency (1 second / 10,000 IOPS = 0.1ms). Most cloud disks (AWS EBS, for example) are actually network disks, and such low latencies usually cannot be achieved over a network today. Hundreds of thousands of cloud-SSD IOPS will not help here, as they can only be utilized by a bunch of IO threads.
  • I would say 0.1ms disk latency is a good start; less is better. You may use fio with iodepth=1 to measure it.
  • We use local SSDs with GCP (plus RAID/LVM to get the required capacity, not to increase speed). io2 Block Express from AWS may work (they declare sub-millisecond latency), but we didn't test it. We also tried GCP extreme disks, but had to switch to local SSD.
  • Bare-metal servers with server-grade NVMe disks should work too; Intel Optane suits best here due to its low latency, but Optane is certainly not a hard requirement. A SATA SSD may sometimes work too; check the latency.
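A minimal sketch of both halves of the disk advice: the latency-budget arithmetic, and a single-threaded fio run to measure your disk. The fio flags are an example invocation, assuming fio is installed; the test file path is a placeholder.

```shell
# Latency budget for single-threaded IOPS: 1 second / target IOPS.
awk 'BEGIN { printf "%.1f ms per IO for 10k IOPS at queue depth 1\n", 1000 / 10000 }'

# Example measurement run: iodepth=1 forces the single-threaded access pattern
# a syncing node produces (uncomment and point --filename at the disk under test):
# fio --name=qd1-randread --filename=/data/fio.test --size=4G --bs=4k \
#     --rw=randread --iodepth=1 --direct=1 --ioengine=psync \
#     --runtime=30 --time_based
```

If the reported average completion latency is well above 0.1 ms, single-threaded sync throughput will be capped regardless of the disk's headline IOPS.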

FeurJak commented 2 years ago

Let me describe our experience getting a synced node without issues. The key point is:

  • Use low-latency disks (or terabytes of RAM for cache), 1.5TB+. This is really important. A generic cloud SSD will not do the job. See below for details.

Generic requirements:

  • Use at least 8 CPU cores; 16 may suit better, especially when you need to serve RPC besides node sync-up
  • Use at least 32GB of RAM; 48-64GB may suit a bit better
  • Use the latest bsc (1.1.4 at the time of writing) and pay attention to the following settings:

    • enable diffsync via the --diffsync CLI argument and
    [Eth]
    ...
    DisablePeerTxBroadcast = true
    • disable the snapshot with --snapshot=false, as the snapshot may kill IO performance, and IO is the key to a synced node
    • You may (or may not) need 300-500 peers to keep your node synced during BSC hard times, when there are a lot of stale peers
    [Node.P2P]
    MaxPeers = 300

    A lot of peers requires significantly more CPU and network bandwidth; watch your cloud bill

  • Use the latest pruned snapshot as a bootstrap (https://github.com/binance-chain/bsc-snapshots); syncing from scratch is almost impossible these days on generic HW

Sync-up speed is around 2x the chain's generation speed, so if your node lags 10 hours, you'll need around 10 more hours to catch up (the chain keeps advancing underneath you), assuming your p2p peers are fine all the time.

Disks details:

  • BSC, as a blockchain node, uses storage in a sequential fashion from an [almost] single thread during sync. This means, for example, that to get 10k IOPS from a single thread we need at most 0.1ms disk latency (1 second / 10,000 IOPS = 0.1ms). Most cloud disks (AWS EBS, for example) are actually network disks, and such low latencies usually cannot be achieved over a network today. Hundreds of thousands of cloud-SSD IOPS will not help here, as they can only be utilized by a bunch of IO threads.
  • I would say 0.1ms disk latency is a good start; less is better. You may use fio with iodepth=1 to measure it.
  • We use local SSDs with GCP (plus RAID/LVM to get the required capacity, not to increase speed). io2 Block Express from AWS may work (they declare sub-millisecond latency), but we didn't test it. We also tried GCP extreme disks, but had to switch to local SSD.
  • Bare-metal servers with server-grade NVMe disks should work too; Intel Optane suits best here due to its low latency, but Optane is certainly not a hard requirement. A SATA SSD may sometimes work too; check the latency.

Thanks for that! What's the instance type you are using on GCP? I might migrate over to GCP from AWS. The AWS i3en.6xlarge instance doesn't seem to cut it with EBS storage in terms of IOPS.

voron commented 2 years ago

@FeurJak Pick any [8+ CPU, 32GB+ RAM] instance on GCP or AWS with ephemeral local NVMe disks and place the BSC datadir on the local NVMe. You may attach additional disks to the instance for backups, as ephemeral NVMes are ephemeral :). We use n2-standard-16 on GCP.

The AWS i3en.6xlarge instance doesn't seem to cut it with EBS storage in terms of IOPS.

You should use instance storage there, not AWS EBS. Pay attention to backups when required, as ephemeral storage is not what you want to use for, e.g., crypto wallets.

izidorit commented 2 years ago

@FeurJak how can I verify how much cache I use?

FeurJak commented 2 years ago

@FeurJak how can I verify how much cache I use?

Not too sure how you can verify it, but I set the cache size in my command:

./build/bin/geth --config ./config.toml --datadir ./node --rpc.allow-unprotected-txs --txlookuplimit 0 --http.api web3,eth,miner,net,txpool,debug --rpc --rpcaddr 0.0.0.0 --rpcport 8545 --rpcapi web3,eth,personal,miner,net,txpool,debug console --ipcpath geth.ipc --syncmode fast --gcmode full --snapshot false --cache.preimages --cache 128000 --diffsync

FeurJak commented 2 years ago

@FeurJak Pick any [8+ CPU, 32GB+ RAM] instance on GCP or AWS with ephemeral local NVMe disks and place the BSC datadir on the local NVMe. You may attach additional disks to the instance for backups, as ephemeral NVMes are ephemeral :). We use n2-standard-16 on GCP.

The AWS i3en.6xlarge instance doesn't seem to cut it with EBS storage in terms of IOPS.

You should use instance storage there, not AWS EBS. Pay attention to backups when required, as ephemeral storage is not what you want to use for, e.g., crypto wallets.

Yeah, sorry, just realised my AWS i3en instance does use ephemeral storage... anyhow, giving GCP a go with an N2D compute-optimized instance: 64 cores, 64 GB memory, 9TB local SSD.

chevoisiatesalvati commented 2 years ago

Let me describe our experience getting a synced node without issues. [rest of the quoted checklist trimmed; see the full version quoted earlier in the thread]

I was syncing slowly, recovering about 1 minute of blockchain time every 20 seconds of real time (3x speed). Then I read your post and figured I was going slow because of the settings you mentioned. I stopped the node, changed the settings as you said, and restarted. Now I'm only getting blocks at blockchain speed, stuck at 2d21h50m behind. LOL. What now? I don't know whether to roll back to the settings I had before, since the problem could be peers... maybe? I'm using a VM with Ubuntu, 12 cores, 35 GB RAM, and a Samsung 980 PRO NVMe SSD.

berktaylan commented 2 years ago

Anyone know how many state entries there are in total currently? lol

xpkore commented 2 years ago

Anyone know how many state entries there are in total currently? lol

eth.syncing
{
  currentBlock: 12681283,
  highestBlock: 12681284,
  knownStates: 297473485,
  pulledStates: 297473485,
  startingBlock: 12681281
}
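As a side note, the block lag in that output is just highestBlock minus currentBlock; a quick sanity check with the numbers quoted above (and keep in mind knownStates keeps growing while you sync, so it is not a fixed target):

```shell
# Blocks behind = highestBlock - currentBlock; 1 block here, i.e. essentially synced.
awk 'BEGIN { print 12681284 - 12681283 }'
```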
berktaylan commented 2 years ago

Anyone know how many state entries there are in total currently? lol

eth.syncing
{
  currentBlock: 12681283,
  highestBlock: 12681284,
  knownStates: 297473485,
  pulledStates: 297473485,
  startingBlock: 12681281
}

How can this be possible?

{
  currentBlock: 12681302,
  highestBlock: 12681387,
  knownStates: 706039145,
  pulledStates: 705885395,
  startingBlock: 12679086
}

voron commented 2 years ago

@chevoisiatesalvati

Now I'm only getting blocks at blockchain speed, stuck at 2d21h50m behind. LOL

diffsync may improve sync performance starting from a lag of ~1h40m or less. 2 days is far too large a lag for diffsync to provide a speed boost, so I don't think the diffsync-related settings change affects you.

What now? I don't know if roll back to the settings I had before, since the problem could be peers...maybe?

It's p2p; you cannot just force it to sync ASAP. Exposing the p2p UDP port on an external IP may speed up peer discovery.

xpkore commented 2 years ago

Anyone know how many state entries there are in total currently? lol

eth.syncing
{
  currentBlock: 12681283,
  highestBlock: 12681284,
  knownStates: 297473485,
  pulledStates: 297473485,
  startingBlock: 12681281
}

How can this be possible?

{
  currentBlock: 12681302,
  highestBlock: 12681387,
  knownStates: 706039145,
  pulledStates: 705885395,
  startingBlock: 12679086
}

My node was run from a pruned snapshot, so maybe that's the difference.

billyadelphia commented 2 years ago

Maybe, just maybe, the BSC network in its current state requires more capable hardware to sync properly, and the minimum requirements won't work anymore. I have an AMD Ryzen 9 3900 (12-core CPU), 128 GB DDR4 ECC memory, 2 x 1.92 TB SSD in RAID 0, and 1Gbps internet, and I have never had any issue with syncing; always fully synced.

NullQubit commented 2 years ago

Maybe, just maybe, the BSC network in its current state requires more capable hardware to sync properly, and the minimum requirements won't work anymore. I have an AMD Ryzen 9 3900 (12-core CPU), 128 GB DDR4 ECC memory, 2 x 1.92 TB SSD in RAID 0, and 1Gbps internet, and I have never had any issue with syncing; always fully synced.

People have syncing problems on much (MUCH) more powerful hardware than what you have. I'm using the most powerful hardware available on Azure, which costs thousands, and I'm lagging behind. Popular RPC providers are lagging behind. The Azure team performed a performance diagnosis and assured me that the hardware is not the bottleneck. Hardware is not the issue, or if it is, then it's much more complicated than "require more capable hardware".

billyadelphia commented 2 years ago

People have syncing problems on much (MUCH) more powerful hardware than what you have. […]

Wow, that's crazy! 3 months ago I just rented a server, downloaded the snapshot, and started syncing; it's been working fine ever since. I also disabled the entire firewall to make sure no connections are blocked (since I'm bad at configuring firewalls, haha).

NullQubit commented 2 years ago

People have syncing problems on much (MUCH) more powerful hardware than what you have. […]

Wow, that's crazy! 3 months ago I just rented a Hetzner server (https://www.hetzner.com/dedicated-rootserver/ax61-nvme, 84 Euro per month), downloaded the snapshot, and started syncing; it's been working fine until today. I also disabled the entire firewall to make sure no connections are blocked (since I'm bad at configuring firewalls, haha).

The only ports I have opened are 30311 for P2P and 8545/8546 for RPC. Do you know if any other port is required?
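For reference, and assuming the stock BSC config.toml layout quoted elsewhere in this thread: the 30311 p2p port is set by ListenAddr under [Node.P2P], and geth uses the same port number for TCP peering and UDP discovery, so a firewall needs both protocols open:

```toml
[Node.P2P]
ListenAddr = ":30311"   # open 30311/tcp (peering) AND 30311/udp (discovery)
MaxPeers = 300
```

If only TCP is open, the node can still hold connections but discovers peers poorly, which matches voron's earlier note about exposing the UDP port.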

billyadelphia commented 2 years ago

The only ports I have opened are 30311 for P2P and 8545/8546 for RPC. Do you know if any other port is required?

That's why I disabled the firewall: I don't know the ports, and I'm too lazy to check them.

uniftyitadmin commented 2 years ago

Hello. I am having the same problem. My HW is a Hetzner dedicated server (8-core AMD Ryzen 3700X, 48GB RAM, SSD disks, ext4 fs). There are no performance peaks on this HW, since there was already a BSC node on the same server with literally the same settings; I just reinstalled it (full-disk problem). It has already been syncing for 8-10 days. I am getting this output from eth.isSyncing:

{
  currentBlock: 12686414,
  highestBlock: 12687209,
  knownStates: 1451780580,
  pulledStates: 1451770187,
  startingBlock: 12684550
}

I just added --diffsync (and raised MaxPeers from 250 to 300), so I will wait and see if that solves the problem.

Also, I/O performance is good, I think:

t=2021-11-16T04:54:37+0100 lvl=info msg="Imported new state entries" count=384 elapsed="40.246µs" processed=1,451,152,720 pending=75350 trieretry=0 coderetry=0 duplicate=0 unexpected=0
t=2021-11-16T04:54:38+0100 lvl=info msg="Imported new block headers" count=1 elapsed="331.456µs" number=12,686,820 hash=0x12b55c1b1b8c3811cd6107153318666c066b1d8b6d49d09111f63d663cb357a6
t=2021-11-16T04:54:38+0100 lvl=info msg="Imported new state entries" count=258 elapsed="3.026µs" processed=1,451,152,978 pending=75628 trieretry=135 coderetry=0 duplicate=0 unexpected=0

voron commented 2 years ago

@NullQubit

People have syncing problems on much (MUCH) more powerful hardware than what you have. I'm using the most powerful hardware available on Azure that costs thousands, and I'm lagging behind

Are you using a local/temporary SSD as the BSC datadir? For example, the Dadsv5 series in Azure with 1800GB+ temp storage. Use any Azure VM that meets the minimum requirements and has a temporary SSD, and use that SSD as the BSC datadir. Pay attention to backups, as it's a temporary SSD.

ib0b commented 2 years ago

Anyone knows how many state entries total currently ? lol

As of this week there should be about 1.5 billion, from what I have seen.

uniftyitadmin commented 2 years ago

The new release (1.1.5) is here. The diffsync option is improved in speed and stability. Maybe this upgrade will solve our problem. https://github.com/binance-chain/bsc/releases/tag/v1.1.5

voron commented 2 years ago

Maybe this upgrade will solve our problem.

Just checked the code: diffsync is still applied only when the lag is less than 2048 blocks, i.e. ~1h40m.
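That wall-clock figure follows from BSC's ~3-second block target:

```shell
# A 2048-block diffsync window at ~3s per BSC block, as wall-clock time.
# Prints "1h42m", matching the ~1h40m figure above.
awk 'BEGIN { s = 2048 * 3; printf "%dh%02dm\n", int(s / 3600), int((s % 3600) / 60) }'
```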

izidorit commented 2 years ago

1.1.5 seems like a big improvement for me. It does not fall out of sync.

JohnsonCaii commented 2 years ago

Same here, even with the diffsync option.

uniftyitadmin commented 2 years ago

Still same for me too.

xpkore commented 2 years ago

ax61-nvme syncs good thanks @billyadelphia

ib0b commented 2 years ago

ax61-nvme syncs good thanks @billyadelphia

Have you synced yet? Did you get lucky and get a gen4 NVMe drive? You can test with:

lsblk -o NAME,FSTYPE,LABEL,MOUNTPOINT,SIZE,MODEL

the gen3 drive has the following model: SAMSUNG MZQLB1T9HAJR-00007
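A hypothetical helper for that check. The prefix mapping (MZQLB is the PM983, gen3; MZQL2 is the PM9A3, gen4) is my assumption from Samsung's part numbering, so verify it against a spec sheet:

```shell
# Substitute the MODEL column from `lsblk -o NAME,MODEL` for your drive.
model="SAMSUNG MZQLB1T9HAJR-00007"

case "$model" in
  *MZQLB*) echo "gen3 (PM983)" ;;
  *MZQL2*) echo "gen4 (PM9A3)" ;;
  *)       echo "unknown model prefix" ;;
esac
```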

xpkore commented 2 years ago

nvme0n1 1.8T SAMSUNG MZQL21T9HCJR-00A07

feelsgoodman

ib0b commented 2 years ago

you got lucky 😂😂

hbtj123 commented 2 years ago

Sync is good until the first hyped launch comes and the blocks can't handle a couple thousand txs. We are still facing the issues no matter how we install or configure the nodes.

xpkore commented 2 years ago

Sync is good until the first hyped launch comes and the blocks can't handle a couple thousand txs. We are still facing the issues no matter how we install or configure the nodes.

They never could handle thousands of txs; blocks usually fill up at around 600 txs. It would just be a bunch of full blocks for a while, which we currently get anyway.

DryDragon10 commented 2 years ago

Maybe, just maybe, at the current state BSC network require more capable hardware to sync properly. The minimum requirement won't work anymore. Because I have AMD Ryzen 9 3900 (12 Core CPU), 128 GB DDR4 ECC Memory, 2 x 1.92 TB SSD with RAID 0, 1Gbps internet, and never had any issue with syncing, always fully synced.

It's not about the hardware. I have exactly the same server from the same provider, and my node can't sync. You think you're always fully synced? Try rebuilding your system and see if you can still sync it.

NullQubit commented 2 years ago

@NullQubit

People have syncing problems on much (MUCH) more powerful hardware than what you have. […]

Are you using a local/temporary SSD as the BSC datadir? For example, the Dadsv5 series in Azure with 1800GB+ temp storage. Use any Azure VM that meets the minimum requirements and has a temporary SSD, and use that SSD as the BSC datadir. Pay attention to backups, as it's a temporary SSD.

I've tried both: a D96ds_v5 using the temp storage disk for my node, and an attached ultra disk LRS (4072 GiB, 160k IOPS, 4000 MB/s max throughput), in two different locations (US Central and France Central).

ib0b commented 2 years ago

I finally synced. It seems the two biggest bottlenecks are CPU and IOPS (actual sync time about 20hrs). Hardware used:

  • Hetzner AX61 (128 GB RAM, AMD Ryzen™ 9 3900 12-core)
  • 2 x 1.92 TB in RAID 0
  • Got unlucky with a gen3 NVMe but still synced

Process:

  • download geth 1.1.5 and make it executable; optionally move it to /usr/local/bin/geth
  • download mainnet.zip and unzip it
  • generate the genesis using the command below; this also creates a mainnet folder for blockchain data: ./geth_linux --datadir mainnet init genesis.json
  • download the 14 Nov 2021 snapshot
  • extract the snapshot
  • move the snapshot data into the mainnet folder:
rm -rf mainnet/geth/chaindata
rm -rf mainnet/geth/triecache
mv server/data-seed/geth/chaindata mainnet/geth/chaindata
mv server/data-seed/geth/triecache mainnet/geth/triecache

Actual sync process:

[Optional] Open config.toml and delete the Node Log section; this is just useful for getting logs straight on the terminal, or just use tail to look at the logs

[Optional] Create a service or use screen to run the command below so it doesn't stop if you are using SSH; I used screen

Run screen, then press enter (anytime you lose connection via ssh, run screen -r to get back the "screen/terminal" where geth was running)

Geth command:

geth --config ./config.toml --datadir ./mainnet --cache 100000 --rpc.allow-unprotected-txs --txlookuplimit 0 --http --maxpeers 100 --ws --syncmode=full --snapshot=false --diffsync

Might be very important:

  • keep maxpeers at around 100
  • syncmode full; I used fast on other servers and they never synced
  • snapshot false means you won't be providing snapshots to other people; you can change it to true once you have synced and have a good server
  • diffsync: not sure if this helped, but it probably did
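The "create a service" alternative to screen could look like this minimal systemd unit; the paths, user, and unit name are placeholders, and the flags are lifted from the geth command in this comment, so adjust everything to your layout:

```ini
# /etc/systemd/system/bsc.service  (hypothetical path and unit name)
[Unit]
Description=BSC full node
After=network-online.target

[Service]
User=bsc
WorkingDirectory=/home/bsc
ExecStart=/usr/local/bin/geth --config /home/bsc/config.toml --datadir /home/bsc/mainnet \
    --cache 100000 --txlookuplimit 0 --maxpeers 100 \
    --syncmode=full --snapshot=false --diffsync
Restart=on-failure
RestartSec=5
LimitNOFILE=65536

[Install]
WantedBy=multi-user.target
```

Enable it with systemctl enable --now bsc and follow the logs with journalctl -u bsc -f; unlike screen, systemd also restarts geth if it crashes.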

chevoisiatesalvati commented 2 years ago

I finally synced. It seems the two biggest bottlenecks are CPU and IOPS (actual sync time about 20hrs). Hardware used:

  • Hetzner AX61 (128 gb ram AMD Ryzen™ 9 3900 12-Core )
  • 2 x 1.92 TB raid 0
  • Got unlucky with gen3 nvme but still synced

Process:

  • download geth 1.1.5 and make it executable , optionally move it /usr/local/bin/geth
  • download mainnet.zip and unzip
  • generate genesis using command below, will also create a mainnet folder for blockchain data ./geth_linux --datadir mainnet init genesis.json
  • download the 14 nov 2021 snapshot
  • extract snapshot
  • move snapshot data to mainnet folder
rm -rf mainnet/geth/chaindata
rm -rf mainnet/geth/triecache
mv server/data-seed/geth/chaindata mainnet/geth/chaindata
mv server/data-seed/geth/triecache mainnet/geth/triecache

Actual sync process:

[Optional] Open config.toml and delete the Node Log section, just useful for getting logs straight on the terminal or just use tail to look at the logs

[Optional] Create a service or use screen to run the command below, so it doesn't stop if you are using SSH, I used screen

Run screen, then press enter (anytime you lose connection via ssh, run screen -r to get back the "screen/terminal" where geth was running)

Geth Command

geth --config ./config.toml --datadir ./mainnet --cache 100000 --rpc.allow-unprotected-txs --txlookuplimit 0 --http --maxpeers 100 --ws --syncmode=full --snapshot=false --diffsync

Might be very Important

  • keep maxpeers at around 100
  • syncmode full, I used fast in other servers, never synced
  • snapshot false, means you won't be providing snapshots to other people, you can change it to true once you have synced and have a good server
  • diffsync not sure if this helped, but probably did

Following the instructions literally, except for the cache size since I only have 32 GB RAM. And I have a better SSD (WD Black SN850 in RAID 0), but I'm still going as slow as ever: I gain 2 hours of chain time in 5 hours of real time. I'm about 2.5 days behind, so at this speed I'd need about 5-6 days to sync, lol. How could you have done it in 20 hrs? Is it the RAM? I don't know...

ib0b commented 2 years ago


Hmm, I am not entirely sure what it could be; it might be CPU or RAM. You can run atop -d to see whether your IOPS are somehow bottlenecking, which I doubt since you have Gen4 NVMe.
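If atop is unfamiliar, a rough per-device IOPS sample can also be taken directly from /proc/diskstats by diffing two snapshots; columns 4 and 8 are reads and writes completed, per the kernel's diskstats format.

```shell
# Rough IOPS sample without atop: diff two /proc/diskstats snapshots
# taken INTERVAL seconds apart.
INTERVAL=2
cat /proc/diskstats > /tmp/ds1
sleep "$INTERVAL"
cat /proc/diskstats > /tmp/ds2
awk -v dt="$INTERVAL" '
  NR==FNR { r[$3] = $4; w[$3] = $8; next }          # first pass: store counters
  { iops = (($4 - r[$3]) + ($8 - w[$3])) / dt       # second pass: diff per device
    if (iops > 0) printf "%-12s %.0f IOPS\n", $3, iops }
' /tmp/ds1 /tmp/ds2
```

During a sync that is IO-bound, the NVMe devices would show sustained tens of thousands of IOPS here; low numbers with a lagging node point the blame elsewhere.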

NullQubit commented 2 years ago


How many peers are connected? Make sure traffic on the P2P port (30311 by default) is allowed.
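A quick way to verify the P2P port is reachable is a plain TCP connect using bash's /dev/tcp, run from a machine outside the host's firewall. The host and port below are placeholders (30311 taken from the comment above).

```shell
# Hedged sketch: check whether the node's P2P port accepts TCP connections.
HOST=127.0.0.1   # replace with the node's public IP, tested from outside
PORT=30311
if timeout 3 bash -c "exec 3<>/dev/tcp/$HOST/$PORT" 2>/dev/null; then
  REACHABLE=yes
else
  REACHABLE=no
fi
echo "port $PORT reachable: $REACHABLE"
```

If this reports unreachable from the outside, inbound peers cannot dial you and you are limited to outbound connections, which tends to keep the peer count low.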

chevoisiatesalvati commented 2 years ago


I don't think the problem is the CPU, since I have a Ryzen 9 5950X, and I don't think it's RAM, since I gave it more than 32 GB and there was no difference. By the way, I ran atop -d, but I'm not able to read the values; I've never used it and don't understand more than the basics, so I don't know whether these are normal values or not. I got a couple of reds, but maybe they're normal, I don't know. I'll paste it here if you can read it.

[image: atop screenshot]

I have 34 peers connected at the moment (with the max at 100), which should be pretty normal, right?
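The live peer count can be read from the node itself over its IPC socket (the path follows from the --datadir used in the commands above). This sketch degrades gracefully when geth or the socket is absent; `net.peerCount` is the standard console property.

```shell
# Hedged sketch: ask a running node for its peer count over IPC.
# Requires geth in PATH and the node running; otherwise reports "unknown".
IPC=./mainnet/geth.ipc
if command -v geth >/dev/null 2>&1 && [ -S "$IPC" ]; then
  PEERS=$(geth attach "ipc:$IPC" --exec "net.peerCount")
else
  PEERS=unknown
fi
echo "connected peers: $PEERS"
```

Watching this number alongside the "Imported new chain segment" log lines shows whether a stalled sync correlates with losing peers.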