Open EclesioMeloJunior opened 1 month ago
After a careful investigation using Pyroscope (a Grafana tool for analysing profiling data), I can see that the bottleneck is related to node.encodeChildrenOpportunisticParallel, which looks like the most time-consuming task and is what leads to the sync slowdown. I initially thought this could be related to ext_crypto_sr25519_verify_version_2, but looking at the graphs over different time ranges, and at the Self vs. Total values, it looks like the problem is in encoding the trie.
Our trie is currently held entirely in memory. While executing a block and changing the current trie, we need to encode it to generate the state root hash, which is used to validate the state transition function. Since we don't optimize the trie to re-hash only the modified parts (I believe this is already done in the lazy-load trie), we end up needing to encode the whole trie for every imported block, which gets slower as the state trie grows.
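To illustrate what re-hashing only the modified parts could look like, here is a minimal sketch. It is not Gossamer's trie code: the node type, the dirty flag, the encodeCount counter, and the use of SHA-256 from the standard library (instead of the hash the real trie uses) are all assumptions made to keep the example self-contained. The idea is that each node caches its hash, a write marks only the path from the changed node up to the root as dirty, and computing the root hash skips every clean subtree.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// node is a hypothetical, simplified trie node (not Gossamer's node type).
type node struct {
	value    []byte
	children []*node
	parent   *node
	dirty    bool     // true when cached is stale
	cached   [32]byte // hash from the last call to hash()
}

// encodeCount counts how many nodes were actually re-hashed.
// It exists only to make the saving visible in this sketch.
var encodeCount int

// newNode creates a dirty node and links the given children to it.
func newNode(value []byte, children ...*node) *node {
	n := &node{value: value, children: children, dirty: true}
	for _, c := range children {
		c.parent = n
	}
	return n
}

// setValue changes a node and marks it and all of its ancestors dirty,
// because their hashes depend on this node's hash.
func (n *node) setValue(v []byte) {
	n.value = v
	for cur := n; cur != nil; cur = cur.parent {
		cur.dirty = true
	}
}

// hash returns the node hash, recomputing it only when the node is dirty.
// The encoding (value followed by child hashes) is a toy format.
func (n *node) hash() [32]byte {
	if !n.dirty {
		return n.cached
	}
	encodeCount++
	h := sha256.New()
	h.Write(n.value)
	for _, c := range n.children {
		ch := c.hash()
		h.Write(ch[:])
	}
	copy(n.cached[:], h.Sum(nil))
	n.dirty = false
	return n.cached
}

// buildDemo builds a 7-node tree and returns the root and one leaf.
func buildDemo() (root, leaf *node) {
	leaf = newNode([]byte("a"))
	left := newNode(nil, leaf, newNode([]byte("b")))
	right := newNode(nil, newNode([]byte("c")), newNode([]byte("d")))
	root = newNode(nil, left, right)
	return root, leaf
}

func main() {
	root, leaf := buildDemo()
	root.hash()
	fmt.Println("nodes hashed on first pass:", encodeCount)

	encodeCount = 0
	leaf.setValue([]byte("changed"))
	root.hash()
	fmt.Println("nodes hashed after one leaf change:", encodeCount)
}
```

In this toy tree the first pass hashes every node, while a single leaf change only re-hashes the nodes on the path from that leaf to the root. The cost of computing the state root then scales with the number of changes in a block rather than with the size of the whole trie.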
Here is a nice article explaining the self vs. total metric values: https://grafana.com/docs/pyroscope/latest/view-and-analyze-profile-data/self-vs-total/
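As a rough illustration of the two metrics, the sketch below models a flame graph as a tree. The frame type, the function names, and the sample counts are all invented for the example and are not taken from the actual profile or from any Pyroscope API. Self is the time spent in a function's own code, and total is self plus the total of everything it calls.

```go
package main

import "fmt"

// frame is a hypothetical flame-graph node (not a Pyroscope type).
type frame struct {
	name     string
	self     int // samples spent in this function's own code
	children []*frame
}

// total is self plus the total of every callee.
func (f *frame) total() int {
	sum := f.self
	for _, c := range f.children {
		sum += c.total()
	}
	return sum
}

// hottestBySelf returns the frame with the largest self value in the tree.
func hottestBySelf(f *frame) *frame {
	best := f
	for _, c := range f.children {
		if cand := hottestBySelf(c); cand.self > best.self {
			best = cand
		}
	}
	return best
}

// demoProfile returns a made-up call tree with invented sample counts.
func demoProfile() *frame {
	return &frame{name: "importBlock", self: 5, children: []*frame{
		{name: "verifySignature", self: 15},
		{name: "encodeTrie", self: 10, children: []*frame{
			{name: "encodeChildren", self: 70},
		}},
	}}
}

func main() {
	root := demoProfile()
	fmt.Printf("%s: self=%d total=%d\n", root.name, root.self, root.total())
	hot := hottestBySelf(root)
	fmt.Printf("hottest by self: %s (%d samples)\n", hot.name, hot.self)
}
```

A function near the top of the call tree can have a large total while doing almost no work itself, which is why comparing both values helps locate where the time is really spent.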
Task summary
Gossamer is reaching block #18748928. However, for some reason the sync speed has decreased from 30 bps to 1 bps. The sync itself is not impacted; it is just the block execution/import that is taking too long.
Other information and links
Here are the stdout logs of the running node. As you can see, the node is taking too long to execute the batch of 7k blocks.