webmaster128 opened this issue 1 year ago
Our state-sync scripts run every Sunday, so this issue is rather timely. Nois testnet state-sync took a bit over an hour today; in contrast, Stargaze mainnet state-sync took about 10 minutes. I am not knowledgeable enough to articulate the reasons behind it. As Simon mentioned, Evmos and Sei state-sync is also slow and at times unreliable.
Would it make sense to take an inventory of the number of application.db KV entries for all of these networks, and for some other very busy networks, to see whether the rule holds?
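A quick way to take that inventory is to iterate application.db directly. Below is a minimal sketch, assuming the default goleveldb backend and a stopped node; the file name and invocation are illustrative and not something used in this thread.

```go
// kvcount.go: count the number of KV entries in application.db.
// Assumes the node uses the default goleveldb backend and is stopped
// while this runs.
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/syndtr/goleveldb/leveldb"
	"github.com/syndtr/goleveldb/leveldb/opt"
)

func main() {
	path := os.Args[1] // e.g. <node-home>/data/application.db
	db, err := leveldb.OpenFile(path, &opt.Options{ReadOnly: true})
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	count := 0
	iter := db.NewIterator(nil, nil)
	for iter.Next() {
		count++
	}
	iter.Release()
	if err := iter.Error(); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s: %d entries\n", path, count)
}
```

Running this against each network's data directory would give directly comparable counts.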
Adding profiles: cpu.pb.gz mem.pb.gz
The binary can be built from: https://github.com/noislabs/full-node/tree/main
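For anyone who wants to dig into the attached profiles, the standard Go tooling reads them directly; these are just the usual pprof invocations, nothing specific to this setup.

```sh
# top consumers of CPU time
go tool pprof -top cpu.pb.gz

# top contributors to in-use heap memory
go tool pprof -top -sample_index=inuse_space mem.pb.gz

# or explore interactively in a browser
go tool pprof -http=:8080 cpu.pb.gz
```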
One thing to mention is that Stargaze still has IAVL fast node disabled by default (this will be changed to enabled in our next upgrade).
It seems that fast node contributes to more CPU cycles and garbage collection.
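For context on where that toggle lives: it is the `iavl-disable-fastnode` setting in app.toml, wired into BaseApp in app.go. The following is only a sketch of that wiring, assuming a recent Cosmos SDK (v0.46+); the exact option and flag names may differ between SDK versions.

```go
package app

import (
	"github.com/cosmos/cosmos-sdk/baseapp"
	"github.com/cosmos/cosmos-sdk/server"
	servertypes "github.com/cosmos/cosmos-sdk/server/types"
	"github.com/spf13/cast"
)

// fastNodeOption turns the iavl-disable-fastnode app.toml setting into a
// BaseApp option, the way simapp-style app.go files typically do.
func fastNodeOption(appOpts servertypes.AppOptions) func(*baseapp.BaseApp) {
	disable := cast.ToBool(appOpts.Get(server.FlagDisableIAVLFastNode))
	return baseapp.SetIAVLDisableFastNode(disable)
}
```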
There are a couple of issues in iavl related to this. Here is one PR that aims to increase the speed: https://github.com/cosmos/iavl/pull/664.
This could be due to how large your current state is. I'll transfer this issue to iavl so it can be focused on where the actual problem may lie. Lots of the compacting is being cleaned up as we speak.
With iavl v1 this should be significantly faster. Would you want to test with this version?
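If that test goes ahead, one low-effort way to try it is a replace directive in the node's go.mod before rebuilding (followed by `go mod tidy`). The tag below is only an example; the iavl v1.x version actually compatible with the node's SDK fork would need to be checked.

```
replace github.com/cosmos/iavl => github.com/cosmos/iavl v1.0.0
```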
I recently started experimenting with state-sync to create a fresh node in the Nois testnet. There we observed that applying the last chunk of the snapshot takes approximately 1 hour and requires between 13 and 14 GB of memory. As many community members consider this unexpectedly long, I thought it's worth debugging what's going on here.
Network
Snapshot
System and utilization
Logs
The following logs show that the snapshot downloads quickly, the first 17 chunks are each applied within 1 minute, and the last chunk takes 62 minutes to apply.
Data/wasm directory
Element count:
Profiling
Some initial profiling data provided by @jhernandezb shows a lot of compacting activity. Maybe he can elaborate more on the profiling details.
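To check whether leveldb compaction is really where the time goes, goleveldb can report its internal compaction statistics. A small sketch, assuming application.db uses the default goleveldb backend; the file name and argument handling are illustrative.

```go
package main

import (
	"fmt"
	"log"
	"os"

	"github.com/syndtr/goleveldb/leveldb"
)

func main() {
	// Open the database (node must be stopped), e.g. pass
	// <node-home>/data/application.db as the first argument.
	db, err := leveldb.OpenFile(os.Args[1], nil)
	if err != nil {
		log.Fatal(err)
	}
	defer db.Close()

	// "leveldb.stats" reports per-level table counts, sizes, and the
	// time spent reading and writing during compactions.
	stats, err := db.GetProperty("leveldb.stats")
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(stats)
}
```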
Other notes
My gut feeling here is that at a certain number of database elements, things become slow. But I am far from a Go profiling expert or database expert and don't know how to debug this further. For us it is not a big deal right now and we can try to use fewer KV entries in the app. But maybe this is a good chance to prevent other mainnets from running into problems.