bnb-chain / bsc

A BNB Smart Chain client based on the go-ethereum fork
GNU Lesser General Public License v3.0

Tips for running a BSC full node #502

Closed. unclezoro closed this issue 8 months ago.

unclezoro commented 2 years ago

The enhancements below address some of the existing challenges of running a BSC full node:

Binary

All clients are advised to upgrade to the latest release. The latest version is generally more stable and performs better.

Storage

According to our tests, the performance of a full node degrades once the storage size exceeds 1.5 TB. We suggest the full node always keep its storage light by pruning the state.

Following are the steps to prune (a command sketch follows the list):

  1. Stop the BSC node first.
  2. Run nohup geth snapshot prune-state --datadir {the data dir of your bsc node} &. It will take 3-5 hours to finish.
  3. Start the node once it is done.
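
A minimal sketch of that workflow, assuming the node runs under systemd as a service called geth and the data directory is /data/bsc/node (both names are assumptions; adjust to your setup):

  sudo systemctl stop geth                                      # stop the node first (service name is an assumption)
  nohup geth snapshot prune-state --datadir /data/bsc/node &    # offline pruning, typically 3-5 hours
  tail -f nohup.out                                             # follow progress until the prune command exits
  sudo systemctl start geth                                     # restart the node once pruning is done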

Maintainers should always keep a few backup nodes so that traffic can be switched to a backup while one of the nodes is pruning.

The hardware is also important. Make sure the disk meets: 2 TB of free disk space, solid-state drive (SSD), gp3, 8k IOPS, 250 MB/s throughput, read latency < 1 ms.

Light Storage

When the node crashes or is force-killed, it will resync from a block that is a few minutes or even a few hours old. This happens because the in-memory state is not persisted to the database in real time, so the node has to replay blocks from the last checkpoint. The replay time depends on the TrieTimeout setting in config.toml. We suggest raising it if you can tolerate a longer replay time, so the node can keep its storage light.
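
As a rough sketch of where that lives: in a dumped config.toml the setting sits under [Eth], with the duration expressed as an integer in nanoseconds (the exact value below is only an example, not a recommendation):

  [Eth]
  # Assumed format: nanoseconds; 3600000000000 is one hour.
  # Larger values keep storage lighter but lengthen the replay after a crash or force kill.
  TrieTimeout = 3600000000000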

Performance Tuning

In the logs, mgasps (million gas per second) indicates the block-processing throughput of the full node; make sure the value stays above 50.

The node can enable profiling with the --pprof flag.

Capture a profile with curl -sK -v http://127.0.0.1:6060/debug/pprof/profile?seconds=60 > profile_60s.out, and the dev community can help analyze the performance.
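
Putting the two together, a profiling session might look like the sketch below (paths are examples and port 6060 is the default pprof port; go tool pprof is only needed if you want to inspect the file locally):

  # start the node with the pprof endpoint enabled (add --pprof to your usual start command)
  geth --config config.toml --datadir /data/bsc/node --pprof
  # from another shell, capture a 60-second CPU profile
  curl -sK -v http://127.0.0.1:6060/debug/pprof/profile?seconds=60 > profile_60s.out
  # optional: summarize the hottest functions locally
  go tool pprof -top profile_60s.out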

New Node

If you are building a new BSC node, please fetch a snapshot from: https://github.com/binance-chain/bsc-snapshots

acswap commented 2 years ago

@acswap Try to use this command: nohup geth snapshot prune-state --datadir /root/node/data. Hopefully, it can help you.

I'll test it out sometime

acswap commented 2 years ago

There was an error during pruning @guagualvcha. This is what I used: nohup geth snapshot prune-state --datadir /root/node/data &

INFO [11-11|09:08:59.898] Allocated cache and file handles  database=/root/node/data/geth/chaindata cache=408.00MiB handles=32767
INFO [11-11|09:09:19.027] Opened ancient database           database=/root/node/data/geth/chaindata/ancient readonly=false
INFO [11-11|09:09:19.132] Deep froze chain segment          blocks=3 elapsed=22.372ms number=12,442,726 hash=dd0c6b..888f7f
WARN [11-11|09:09:19.151] Loaded snapshot journal           diskroot=200dc4..d562f3 diffs=unmatched
ERROR[11-11|09:09:19.151] Failed to open snapshot tree      err="head doesn't match snapshot: have 0x200dc4d6b3233be9009eb32121a53e72530ccc1e83a82ac6540b3d617ed562f3, want 0xe306def5b049f0579ea63f0dfb573bdf44ec522b868d366b5055a776c77d58ea"
head doesn't match snapshot: have 0x200dc4d6b3233be9009eb32121a53e72530ccc1e83a82ac6540b3d617ed562f3, want 0xe306def5b049f0579ea63f0dfb573bdf44ec522b868d366b5055a776c77d58ea

barryz commented 2 years ago

There was an error in the pruning @guagualvcha This is I use nohup geth snapshot prune-state --datadir /root/node/data &

INFO [11-11|09:08:59.898] Allocated cache and file handles database=/root/node/data/geth/chaindata cache=408.00MiB handles=32767 INFO [11-11|09:09:19.027] Opened ancient database database=/root/node/data/geth/chaindata/ancient readonly=false INFO [11-11|09:09:19.132] Deep froze chain segment blocks=3 elapsed=22.372ms number=12,442,726 hash=dd0c6b..888f7f WARN [11-11|09:09:19.151] Loaded snapshot journal diskroot=200dc4..d562f3 diffs=unmatched ERROR[11-11|09:09:19.151] Failed to open snapshot tree err="head doesn't match snapshot: have 0x200dc4d6b3233be9009eb32121a53e72530ccc1e83a82ac6540b3d617ed562f3, want 0xe306def5b049f0579ea63f0dfb573bdf44ec522b868d366b5055a776c77d58ea" head doesn't match snapshot: have 0x200dc4d6b3233be9009eb32121a53e72530ccc1e83a82ac6540b3d617ed562f3, want 0xe306def5b049f0579ea63f0dfb573bdf44ec522b868d366b5055a776c77d58ea

Looks like your snapshot was corrupted; you have to wait for the snapshot regeneration to complete.

acswap commented 2 years ago

There was an error in the pruning @guagualvcha This is I use nohup geth snapshot prune-state --datadir /root/node/data & INFO [11-11|09:08:59.898] Allocated cache and file handles database=/root/node/data/geth/chaindata cache=408.00MiB handles=32767 INFO [11-11|09:09:19.027] Opened ancient database database=/root/node/data/geth/chaindata/ancient readonly=false INFO [11-11|09:09:19.132] Deep froze chain segment blocks=3 elapsed=22.372ms number=12,442,726 hash=dd0c6b..888f7f WARN [11-11|09:09:19.151] Loaded snapshot journal diskroot=200dc4..d562f3 diffs=unmatched ERROR[11-11|09:09:19.151] Failed to open snapshot tree err="head doesn't match snapshot: have 0x200dc4d6b3233be9009eb32121a53e72530ccc1e83a82ac6540b3d617ed562f3, want 0xe306def5b049f0579ea63f0dfb573bdf44ec522b868d366b5055a776c77d58ea" head doesn't match snapshot: have 0x200dc4d6b3233be9009eb32121a53e72530ccc1e83a82ac6540b3d617ed562f3, want 0xe306def5b049f0579ea63f0dfb573bdf44ec522b868d366b5055a776c77d58ea

Looks like your snapshot was corrupted, you have to wait for the snapshot regeneration to be complete.

I used the Asia snapshot. Do I now need to delete all the snapshot data and download it again?

barryz commented 2 years ago

There was an error in the pruning @guagualvcha This is I use nohup geth snapshot prune-state --datadir /root/node/data & INFO [11-11|09:08:59.898] Allocated cache and file handles database=/root/node/data/geth/chaindata cache=408.00MiB handles=32767 INFO [11-11|09:09:19.027] Opened ancient database database=/root/node/data/geth/chaindata/ancient readonly=false INFO [11-11|09:09:19.132] Deep froze chain segment blocks=3 elapsed=22.372ms number=12,442,726 hash=dd0c6b..888f7f WARN [11-11|09:09:19.151] Loaded snapshot journal diskroot=200dc4..d562f3 diffs=unmatched ERROR[11-11|09:09:19.151] Failed to open snapshot tree err="head doesn't match snapshot: have 0x200dc4d6b3233be9009eb32121a53e72530ccc1e83a82ac6540b3d617ed562f3, want 0xe306def5b049f0579ea63f0dfb573bdf44ec522b868d366b5055a776c77d58ea" head doesn't match snapshot: have 0x200dc4d6b3233be9009eb32121a53e72530ccc1e83a82ac6540b3d617ed562f3, want 0xe306def5b049f0579ea63f0dfb573bdf44ec522b868d366b5055a776c77d58ea

Looks like your snapshot was corrupted, you have to wait for the snapshot regeneration to be complete.

I used a snapshot of Asia. Now do I delete all the snapshots and download the new ones again?

Nope, you need to set the flag --snapshot=true in your startup command and wait for the geth snapshot regeneration to complete.

litebarb commented 2 years ago

Guys, after pruning my full node's data folder takes up about 800 GB. I only have around 900 GB of SSD space, which means I have to prune every day or so. Anyone have any ideas?

psdlt commented 2 years ago

@litebarb get a bigger disk, no way around it.

thehood1 commented 2 years ago

Hi guys. Since my SSD is running out of space, I am going to buy a second SSD and set up disk striping (RAID0). Do I need to format both SSDs, set up the striping, and then download the chain data again, or can I set up striping without losing the chain data on my first SSD? Thanks

psdlt commented 2 years ago

@thehood1 buy two more SSDs, stripe them, then copy data from existing SSD.

MhaiRuMhaiShee commented 2 years ago

Guys, after pruning my full node's data folder takes up about 800 GB. I only have around 900 GB of SSD space, which means I have to prune every day or so. Anyone have any ideas?

Prune every day,

get a 2-3 TB NVMe drive,

or move the ancient folder onto another disk (see the sketch below).
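
A hedged sketch of that last option: geth can keep the ancient (freezer) data in a separate location via the --datadir.ancient flag; the paths below are examples only.

  # with the node stopped, move the freezer data to a larger/cheaper disk
  mv /data/bsc/node/geth/chaindata/ancient /mnt/bigdisk/ancient
  # then point geth at the new location on every start
  geth --config config.toml --datadir /data/bsc/node --datadir.ancient /mnt/bigdisk/ancient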

thehood1 commented 2 years ago

@thehood1 buy two more SSDs, stripe them, then copy data from existing SSD.

I have an empty 3 TB HDD. Can I format it with xfs and use it to temporarily store the chain data from the existing SSD until I stripe the two SSDs? It won't mess up the data?

psdlt commented 2 years ago

@thehood1 technically, yes, you can do that. But imagine how long it will take to copy ~700-800 GB of data to an HDD and then back to another SSD. Storage is cheap now; just buy an additional SSD, it will save you a lot of headache.

thehood1 commented 2 years ago

@thehood1 technically, yes, you can do that. But imagine how long it will take to copy ~700-800 GB of data to an HDD and then back to another SSD. Storage is cheap now; just buy an additional SSD, it will save you a lot of headache.

Thank you for the help

Rem456 commented 2 years ago

@Lajoix @jcaffet folks, what kind of file systems do you use on your servers? I've read somewhere that xfs is better than ext4 (sorry, don't remember where; don't have a link). Also, do you use a single disk or RAID? I have one server on AWS (i3en.2xlarge, RAID0, xfs); I've been running it for about half a year and never had the issues you're describing. A few days ago I set up another server on Vultr (also RAID0, also xfs); it fast-synced to the latest block from scratch in under a day.

If you're constantly having sync issues and can't catch up to the network, review your hardware setup; maybe spin up a different instance in a different region, maybe you just got a busy host, who knows.

I'm thinking of running the same instance on Amazon; how much do you pay for data monthly?

thehood1 commented 2 years ago

INFO [11-15|22:44:31.746] Imported new state entries count=917 elapsed="333.295µs" processed=689,520,495 pending=64143 trieretry=0 coderetry=0 duplicate=0 unexpected=0

What is the "processed" number in "Imported new state entries"? Is it the number of transactions on BSC that I have imported? If so, will I be synced once the "processed" number catches up with the total number of transactions that have occurred on BSC? Does anybody know how many transactions have occurred since BSC started?

ghost commented 2 years ago

Don't use snapshot data, it has unknown bugs.

masayil commented 2 years ago

What about archive nodes? My node's data size exceeds 18 TB and its sync speed is hopelessly slow. @guagualvcha When running the full node, I encountered the following error; it seems to be caused by --graphql (screenshot attached).

lich1710 commented 2 years ago

What about archive nodes? My node's data size exceeds 18 TB and its sync speed is hopelessly slow. @guagualvcha When running the full node, I encountered the following error; it seems to be caused by --graphql (screenshot attached).

LevelDB performance is very poor once the data size grows into the terabytes. You need to split your archive node and use a proxy in front to route requests. There's someone already doing that; you can take a look here: https://github.com/allada/bsc-archive-snapshot/

MhaiRuMhaiShee commented 2 years ago

Light Storage

When the node crashes or is force-killed, it will resync from a block that is a few minutes or even a few hours old. This happens because the in-memory state is not persisted to the database in real time, so the node has to replay blocks from the last checkpoint. The replay time depends on the TrieTimeout setting in config.toml. We suggest raising it if you can tolerate a longer replay time, so the node can keep its storage light.

When I run systemctl stop geth and start it again a second later, my node is immediately behind by 4000 blocks. What should I do to prevent this? How should I shut down the node properly so that those 4000 blocks stay saved and it doesn't fall 4000 blocks behind again?

And should I increase TrieTimeout or decrease it?

bostarch commented 2 years ago

How can I use the full 64 GB of RAM on the bare-metal node? The --cache flag or the .toml setting doesn't really seem to do anything; the node just takes the amount of memory it wants (usually around 32.4 GB lately, even though I have it set back to the default 18000).

unclezoro commented 2 years ago

Light Storage

When the node crashes or is force-killed, it will resync from a block that is a few minutes or even a few hours old. This happens because the in-memory state is not persisted to the database in real time, so the node has to replay blocks from the last checkpoint. The replay time depends on the TrieTimeout setting in config.toml. We suggest raising it if you can tolerate a longer replay time, so the node can keep its storage light.

When I run systemctl stop geth and start it again a second later, my node is immediately behind by 4000 blocks. What should I do to prevent this? How should I shut down the node properly so that those 4000 blocks stay saved and it doesn't fall 4000 blocks behind again?

And should I increase TrieTimeout or decrease it?

You can try to gracefully shut down the client with kill $pid; the node will persist its data into the DB, which usually takes one or two minutes to finish. After that, even if you force-kill the node, it can start normally without replaying old blocks. If you have a strong requirement for short replay times even after being force-killed, you can decrease TrieTimeout.
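
A minimal sketch of such a graceful shutdown, assuming the process is named geth (systemctl stop achieves the same effect, since it also sends SIGTERM):

  pid=$(pidof geth)                                    # or take the PID from your process manager
  kill "$pid"                                          # SIGTERM: geth flushes its in-memory state before exiting
  while kill -0 "$pid" 2>/dev/null; do sleep 5; done   # usually completes within a minute or two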

unclezoro commented 2 years ago

How can I use the full 64 GB of RAM on the bare-metal node? The --cache flag or the .toml setting doesn't really seem to do anything; the node just takes the amount of memory it wants (usually around 32.4 GB lately, even though I have it set back to the default 18000).

Using about 50% of memory is a proper setting; don't push it too hard. If your node does not serve many RPC requests, you can set a relatively larger cache (more than 18000) to make fuller use of the memory.
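
For example, on a 64 GB machine that serves little RPC traffic, a cache of roughly half the RAM would be set like this (the --cache value is in MB; a sketch with example paths, not a tuned recommendation):

  geth --config config.toml --datadir /data/bsc/node --cache 32000   # ~32 GB cache on a 64 GB host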

jackyzhujiale commented 2 years ago

@guagualvcha Here's my profile_60s.out file. I'm occasionally falling out of sync roughly every 1-2 minutes. mgasps is around 40-80. Would you kindly help check what the bottleneck is? Thanks. profile_60s.out.zip

unclezoro commented 2 years ago

@guagualvcha Here's my profile_60s.out file. I'm occasionally falling out of sync roughly every 1-2 minutes. mgasps is around 40-80. Would you kindly help check what the bottleneck is? Thanks. profile_60s.out.zip

(Screenshot 2021-11-21 at 17:34:41 attached)

Your node works fine; there is some unnecessary tx-pool CPU cost on it. You can remove this by adding DisablePeerTxBroadcast = true under [Eth]; with this enabled, your node will not receive pending transactions from other peers.
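
In config.toml that is a one-line addition (only the relevant section is shown):

  [Eth]
  # stop receiving pending transactions broadcast by peers; saves tx-pool CPU on nodes that mainly serve RPC
  DisablePeerTxBroadcast = true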

nxdht commented 2 years ago

@guagualvcha I have tried everything I can think of, but my node still syncs very slowly; mgasps is always below 30. Would you please take a look at my profile and give me some advice? Thanks! profile_60s.zip

thehood1 commented 2 years ago

Guys, please help. I am losing my mind over the syncing process.

Hardware: AMD Ryzen 7 3800X 8-core, 32 GB RAM, Kingston M.2 NVMe 2 TB SNVS/2000G NV1 Series (could not find IOPS figures for this disk), 200 Mbps / 80 Mbps internet speed

On this NVMe disk I installed Ubuntu 20.04.3 and downloaded geth_linux 1.1.5 and mainnet.zip 1.1.5. I downloaded the 19th November snapshot, unpacked it, and started syncing 20 hours ago.

Command line: ./geth --config config.toml --datadir /data/bsc/node --cache 16000 --txlookuplimit 0 --http --maxpeers 100 --snapshot=true --diffsync

config.toml: All default values, only added DisablePeerTxBroadcast = false and DiscoveryV5 = true

Hard disk usage: disk read values between 15k and 100k (screenshot attached).

CPU and RAM usage: CPU at 153% - what is this????? (screenshot attached)

Elapsed time is always around 10s; mgasps between 10 and 80, average 40 (screenshot attached).

Very slow syncing, current data size 1.1 TB, and still 2 days behind... will this ever end? Any advice, please? Any way to speed it up?

marianfurdui commented 2 years ago

Hi, does any of you have a working step-by-step guide for successfully running a full node on BSC? I have appropriate hardware (2 TB NVMe, 128 GB RAM, 16-core CPU). A few seconds after I start the process, it is killed without errors. Thanks!

(Screenshot 2021-11-22 at 21:57:23 attached)

miohtama commented 2 years ago

Thank you @guagualvcha for the excellent tips.

I did a Twitter thread on Binance Smart Chain syncing, with some history and background on the geth issues and some tips for future syncers:

https://twitter.com/moo9000/status/1463454127095042050

miohtama commented 2 years ago

after few seconds after I start the process it's killed without errors

Check /var/log/syslog.

tail -n 50 /var/log/syslog

Usually when the system kills a process, the kill reason is written there, depending on the Linux distribution. The likely cause is the OOM (out of memory) killer.
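
Besides syslog, the kernel log usually records OOM kills too; a quick way to check (log paths vary by distribution):

  tail -n 50 /var/log/syslog                          # or /var/log/messages on some distributions
  dmesg | grep -iE "out of memory|killed process"     # kernel-side record of OOM kills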

Hailiebaby16 commented 2 years ago

Diff Sync

The diffsync protocol rolled out as a stable feature in release v1.1.5. Diff sync improves syncing speed by roughly 60%-70% according to our tests. All full nodes are advised to enable it by adding --diffsync to the start command.

How do I restart the node with diffsync?

viwar000 commented 2 years ago

@guagualvcha Here's my profile_60s.out file. I'm occasionally falling out of sync roughly every 1-2 minutes. mgasps is around 40-80. Would you kindly help check what the bottleneck is? Thanks. profile_60s.out.zip

(Screenshot 2021-11-21 at 17:34:41 attached)

Your node works fine; there is some unnecessary tx-pool CPU cost on it. You can remove this by adding DisablePeerTxBroadcast = true under [Eth]; with this enabled, your node will not receive pending transactions from other peers.

Hi, @guagualvcha! I set DisablePeerTxBroadcast = true. It would be very nice of you to check my profile_60s. Thanks. profile_60s.zip

eth.syncing shows { currentBlock: 13039015, highestBlock: 13039110, knownStates: 1197140991, pulledStates: 1197003513, startingBlock: 13036385 } - still out of sync.

config.zip startNodeBSC.zip

viwar000 commented 2 years ago

Some of the enhancements below can address the existing challenges with running a BSC full node:

How do I restart the node with diffsync?

--diffsync
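
That is, stop the node and append the flag to whatever start command you already use; a sketch with example paths:

  geth --config config.toml --datadir /data/bsc/node --diffsync   # same start command as before, plus --diffsync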

rmgitting commented 2 years ago

Hello guys, pruning used to work for me, but after updating to the new geth version (1.1.5) pruning no longer works. The node syncs and everything, but pruning always gives me this error:

406,516 slots=839,908,244 elapsed=18m56.019s eta=3m55.95s
ERROR[11-29|13:05:08.540] Failed to prune state err="invalid subroot(path d3fd9752d539043c3be027c083716a739873c9a310155ca6db7a5e68e541c2ac), want 1056ff97a2dcea00101f34d3fe33cb3d33a135037fdb8278fa76ed5ade7f9f07, have d8f64ef4cd4ec96ebf2ba5ede7ef52b196d62e3b76f92a6dc9a1dc0d9abfcbd5"
invalid subroot(path d3fd9752d539043c3be027c083716a739873c9a310155ca6db7a5e68e541c2ac), want 1056ff97a2dcea00101f34d3fe33cb3d33a135037fdb8278fa76ed5ade7f9f07, have d8f64ef4cd4ec96ebf2ba5ede7ef52b196d62e3b76f92a6dc9a1dc0d9abfcbd5

Before the update I was able to prune just fine; not much has changed really, other than some configuration in the P2P node settings. Any idea what the issue is and how I can prune again?

viwar000 commented 2 years ago
--2021-12-01 01:50:16--  https://tf-dex-prod-public-snapshot-site1.s3-accelerate.amazonaws.com/geth-20211130.tar.gz?AWSAccessKeyId=AKIAYINE6SBQPUZDDRRO
Resolving tf-dex-prod-public-snapshot-site1.s3-accelerate.amazonaws.com (tf-dex-prod-public-snapshot-site1.s3-accelerate.amazonaws.com)... 143.204.99.12
Connecting to tf-dex-prod-public-snapshot-site1.s3-accelerate.amazonaws.com (tf-dex-prod-public-snapshot-site1.s3-accelerate.amazonaws.com)|143.204.99.12|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2021-12-01 01:50:18 ERROR 403: Forbidden.
pinswap commented 2 years ago
--2021-12-01 01:50:16--  https://tf-dex-prod-public-snapshot-site1.s3-accelerate.amazonaws.com/geth-20211130.tar.gz?AWSAccessKeyId=AKIAYINE6SBQPUZDDRRO
Resolving tf-dex-prod-public-snapshot-site1.s3-accelerate.amazonaws.com (tf-dex-prod-public-snapshot-site1.s3-accelerate.amazonaws.com)... 143.204.99.12
Connecting to tf-dex-prod-public-snapshot-site1.s3-accelerate.amazonaws.com (tf-dex-prod-public-snapshot-site1.s3-accelerate.amazonaws.com)|143.204.99.12|:443... connected.
HTTP request sent, awaiting response... 403 Forbidden
2021-12-01 01:50:18 ERROR 403: Forbidden.

You can download it this way, for example:

nohup wget -O geth-data.tar.gz "https://tf-dex-prod-public-snapshot-site1.s3-accelerate.amazonaws.com/geth-20211130.tar.gz?AWSAccessKeyId=AKIAYINE6SBQPUZDDRRO" &

cpecorari commented 2 years ago

Hi, I'm trying to keep a BSC node synced but run into issues every time after pruning, which is now needed roughly every 2 weeks to keep a 2 TB SSD from filling up completely... Isn't there any kind of auto-pruning option available? It seems crazy that we need to stop the node every time. At the least it should be an option for people running fast enough hardware, even if it slowed the node down for an hour or so every day. It would really help people maintaining their nodes... Thanks for any other tips.

Grepsy commented 2 years ago

A tip for people waiting for the geth client to sync. Enter the following command on the console to get continuous updates on the current state of the sync:

setInterval(function() { console.log(new Date(eth.getBlock(eth.blockNumber).timestamp * 1e3)); }, 10000)

To be clear, the printed block timestamp needs to catch up to the present time before the node is fully synced.
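
If you are not already in a console, you can attach one to the running node over its IPC endpoint first (the datadir path below is an example) and then paste the snippet above:

  geth attach ipc:/data/bsc/node/geth.ipc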

mj-dcb commented 2 years ago

Finally, my full node is in-sync. I tried a lot of cloud providers (AWS/DO) to set this up. I managed to get my node in-sync with a dedicated server from Hetzner. It took 6 hours with a snapshot. Ping me if you need the IP address to bootstrap your node, I need to whitelist your IP address though because it is only being used internally now.

mj-dcb commented 2 years ago

Hi, trying to keep a BSC Node sync'ed but got issues every time after pruning, which is now a requirement like every 2 weeks to avoid a 2TB SSD to fill up completely... Isn't there any kind of auto-pruning option available ? Looks crazy that we need to stop the node every time. At least it should be an option for people running fast enough hardware, even if it could slow down like ever day for an hour or so ? It could really be helpful for people maintaining their nodes... Thanks for any other tip.

You can automate this process, but the software needs to be shut down; you cannot avoid that. We plan maintenance on the weekends (a sketch follows below).
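
A hedged sketch of what that weekend automation could look like, assuming a systemd unit named geth and an example datadir; treat it as a starting point, not a tested script:

  #!/bin/bash
  # prune-bsc.sh - run from cron during a maintenance window (service name and path are assumptions)
  set -e
  systemctl stop geth                                  # the node must be offline while pruning
  geth snapshot prune-state --datadir /data/bsc/node   # typically takes several hours
  systemctl start geth                                 # bring the node back once pruning is done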

marianfurdui commented 2 years ago

Finally, my full node is in-sync. I tried a lot of cloud providers (AWS/DO) to set this up. I managed to get my node in-sync with a dedicated server from Hetzner. It took 6 hours with a snapshot. Ping me if you need the IP address to bootstrap your node, I need to whitelist your IP address though because it is only being used internally now.

Hi, I am on Hetzner as well but on a managed server which unfortunately has a lot of restrictions. Can you, please, post the command lines used to sync the snapshot after the download? Thanks!

mj-dcb commented 2 years ago

Finally, my full node is in-sync. I tried a lot of cloud providers (AWS/DO) to set this up. I managed to get my node in-sync with a dedicated server from Hetzner. It took 6 hours with a snapshot. Ping me if you need the IP address to bootstrap your node, I need to whitelist your IP address though because it is only being used internally now.

Hi, I am on Hetzner as well but on a managed server which unfortunately has a lot of restrictions. Can you, please, post the command lines used to sync the snapshot after the download? Thanks!

Hey Marian, do not forget to download the latest geth version and snapshot. I used a Storage box to temporarily store the snapshot which I extracted to the root disk. Anyway, here you go: /bsc/geth_linux --cache 75000 --nat=extip:<external-ip> --port 30413 --datadir /bsc/node --ws --ws.origins '*' --ws.api eth,net,web3 --ws.addr "<external-ip>" --ws.port 8545 --http --http.api net,personal,eth,web3 --http.addr 0.0.0.0 --allow-insecure-unlock --txlookuplimit=0 --config /bsc/config.toml --snapshot=true --maxpeers 100 --rpc.allow-unprotected-txs --diffsync

I use tmux to run it in the background.

This is what I changed in my config.toml: under [Node.P2P], MaxPeers = 100 and DiscoveryV5 = true.

marianfurdui commented 2 years ago

Finally, my full node is in-sync. I tried a lot of cloud providers (AWS/DO) to set this up. I managed to get my node in-sync with a dedicated server from Hetzner. It took 6 hours with a snapshot. Ping me if you need the IP address to bootstrap your node, I need to whitelist your IP address though because it is only being used internally now.

Hi, I am on Hetzner as well but on a managed server which unfortunately has a lot of restrictions. Can you, please, post the command lines used to sync the snapshot after the download? Thanks!

Hey Marian, do not forget to download the latest geth version and snapshot. I used a Storage box to temporarily store the snapshot which I extracted to the root disk. Anyway, here you go: /bsc/geth_linux --cache 75000 --nat=extip:<external-ip> --port 30413 --datadir /bsc/node --ws --ws.origins '*' --ws.api eth,net,web3 --ws.addr "<external-ip>" --ws.port 8545 --http --http.api net,personal,eth,web3 --http.addr 0.0.0.0 --allow-insecure-unlock --txlookuplimit=0 --config /bsc/config.toml --snapshot=true --maxpeers 100 --rpc.allow-unprotected-txs --diffsync

I use tmux to run it in the background.

thanks! have you changed something in config.toml or genesis.json?

mj-dcb commented 2 years ago

Finally, my full node is in-sync. I tried a lot of cloud providers (AWS/DO) to set this up. I managed to get my node in-sync with a dedicated server from Hetzner. It took 6 hours with a snapshot. Ping me if you need the IP address to bootstrap your node, I need to whitelist your IP address though because it is only being used internally now.

Hi, I am on Hetzner as well but on a managed server which unfortunately has a lot of restrictions. Can you, please, post the command lines used to sync the snapshot after the download? Thanks!

Hey Marian, do not forget to download the latest geth version and snapshot. I used a Storage box to temporarily store the snapshot which I extracted to the root disk. Anyway, here you go: /bsc/geth_linux --cache 75000 --nat=extip:<external-ip> --port 30413 --datadir /bsc/node --ws --ws.origins '*' --ws.api eth,net,web3 --ws.addr "<external-ip>" --ws.port 8545 --http --http.api net,personal,eth,web3 --http.addr 0.0.0.0 --allow-insecure-unlock --txlookuplimit=0 --config /bsc/config.toml --snapshot=true --maxpeers 100 --rpc.allow-unprotected-txs --diffsync I use tmux to run it in the background.

thanks! have you changed something in config.toml or genesis.json?

I just updated my message. I only changed the config file.

marianfurdui commented 2 years ago

Finally, my full node is in-sync. I tried a lot of cloud providers (AWS/DO) to set this up. I managed to get my node in-sync with a dedicated server from Hetzner. It took 6 hours with a snapshot. Ping me if you need the IP address to bootstrap your node, I need to whitelist your IP address though because it is only being used internally now.

Hi, I am on Hetzner as well but on a managed server which unfortunately has a lot of restrictions. Can you, please, post the command lines used to sync the snapshot after the download? Thanks!

Hey Marian, do not forget to download the latest geth version and snapshot. I used a Storage box to temporarily store the snapshot which I extracted to the root disk. Anyway, here you go: /bsc/geth_linux --cache 75000 --nat=extip:<external-ip> --port 30413 --datadir /bsc/node --ws --ws.origins '*' --ws.api eth,net,web3 --ws.addr "<external-ip>" --ws.port 8545 --http --http.api net,personal,eth,web3 --http.addr 0.0.0.0 --allow-insecure-unlock --txlookuplimit=0 --config /bsc/config.toml --snapshot=true --maxpeers 100 --rpc.allow-unprotected-txs --diffsync I use tmux to run it in the background.

thanks! have you changed something in config.toml or genesis.json?

I just updated my message. I only changed the config file.

last question (I hope): how do you check externally that the full node is up and running fine? Thanks!

mj-dcb commented 2 years ago

Finally, my full node is in-sync. I tried a lot of cloud providers (AWS/DO) to set this up. I managed to get my node in-sync with a dedicated server from Hetzner. It took 6 hours with a snapshot. Ping me if you need the IP address to bootstrap your node, I need to whitelist your IP address though because it is only being used internally now.

Hi, I am on Hetzner as well but on a managed server which unfortunately has a lot of restrictions. Can you, please, post the command lines used to sync the snapshot after the download? Thanks!

Hey Marian, do not forget to download the latest geth version and snapshot. I used a Storage box to temporarily store the snapshot which I extracted to the root disk. Anyway, here you go: /bsc/geth_linux --cache 75000 --nat=extip:<external-ip> --port 30413 --datadir /bsc/node --ws --ws.origins '*' --ws.api eth,net,web3 --ws.addr "<external-ip>" --ws.port 8545 --http --http.api net,personal,eth,web3 --http.addr 0.0.0.0 --allow-insecure-unlock --txlookuplimit=0 --config /bsc/config.toml --snapshot=true --maxpeers 100 --rpc.allow-unprotected-txs --diffsync I use tmux to run it in the background.

thanks! have you changed something in config.toml or genesis.json?

I just updated my message. I only changed the config file.

last question (I hope): how do you check externally that the full node is up and running fine? Thanks!

You can use RPC/Web3 to check the syncing status. I constantly monitor the txpool; you can run txpool.status to find out whether it is processing the mempool.
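
For an external check over HTTP, the standard eth_syncing JSON-RPC call works against the node's RPC port (the endpoint below assumes the HTTP API from the start command above is reachable):

  curl -s -X POST -H "Content-Type: application/json" \
    --data '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' \
    http://<node-ip>:8545
  # returns false when fully synced, or an object with currentBlock/highestBlock while still syncing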

99hops commented 2 years ago

Is it OK to restart with --diffsync if syncing was started from genesis and still not finished?

daron4ever commented 2 years ago

I have the following error:

ERROR[12-06|00:57:22.262] Failed to open snapshot tree err="Failed to load head block"

marianfurdui commented 2 years ago

Finally, my full node is in-sync. I tried a lot of cloud providers (AWS/DO) to set this up. I managed to get my node in-sync with a dedicated server from Hetzner. It took 6 hours with a snapshot. Ping me if you need the IP address to bootstrap your node, I need to whitelist your IP address though because it is only being used internally now.

Hi, I am on Hetzner as well but on a managed server which unfortunately has a lot of restrictions. Can you, please, post the command lines used to sync the snapshot after the download? Thanks!

Hey Marian, do not forget to download the latest geth version and snapshot. I used a Storage box to temporarily store the snapshot which I extracted to the root disk. Anyway, here you go: /bsc/geth_linux --cache 75000 --nat=extip:<external-ip> --port 30413 --datadir /bsc/node --ws --ws.origins '*' --ws.api eth,net,web3 --ws.addr "<external-ip>" --ws.port 8545 --http --http.api net,personal,eth,web3 --http.addr 0.0.0.0 --allow-insecure-unlock --txlookuplimit=0 --config /bsc/config.toml --snapshot=true --maxpeers 100 --rpc.allow-unprotected-txs --diffsync I use tmux to run it in the background.

thanks! have you changed something in config.toml or genesis.json?

I just updated my message. I only changed the config file.

last question (I hope): how do you check externally that the full node is up and running fine? Thanks!

You can use RPC/Web3 to check the syncing status. I constantly monitor the txpool, you can run txpool.status to find out if it is processing the mempool.

Hi again,

  1. How exactly does txpool.status work? ./geth_linux --txpool.status complains with: flag provided but not defined: -txpool.status

  2. What should a successful ./geth_linux --txpool.status run look like?

My bsc.log has contained the same entries for a few days in a row:

t=2021-12-08T20:55:18+0100 lvl=info msg="Looking for peers"                      peercount=0 tried=0  static=39
t=2021-12-08T20:55:53+0100 lvl=info msg="Looking for peers"                      peercount=0 tried=39 static=39
t=2021-12-08T20:56:04+0100 lvl=info msg="Looking for peers"                      peercount=0 tried=33 static=39

Thanks!

nilampatel-engineer commented 2 years ago

Hi, trying to keep a BSC Node sync'ed but got issues every time after pruning, which is now a requirement like every 2 weeks to avoid a 2TB SSD to fill up completely... Isn't there any kind of auto-pruning option available ? Looks crazy that we need to stop the node every time. At least it should be an option for people running fast enough hardware, even if it could slow down like ever day for an hour or so ? It could really be helpful for people maintaining their nodes... Thanks for any other tip.

By pruning, will you lose the complete history of the chain or the state history, if you are running a full BSC archive node?

masayil commented 2 years ago

I'm getting the error "no snapshot paired state" (screenshot attached).