EspressoSystems / hotshot-query-service

Generic query service for HotShot applications
https://espressosystems.github.io/hotshot-query-service/
GNU General Public License v3.0

Sync from newest to oldest #620

Open jbearer opened 1 month ago

jbearer commented 1 month ago

It would be better to run proactive scans, especially major scans, backwards, because the newest data is both the most likely to be missing (for example, when a node has just been offline for a short time during an update) and the most likely to be queried.

Adding a reverse block stream might help simplify some of the transaction streaming stuff for the explorer as well.
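To illustrate the ordering, here is a minimal sketch of a backwards scan. It is not code from this repo: `missing_newest_first` is a hypothetical helper, and a `BTreeSet` of heights stands in for whatever the database actually reports as present.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper (not part of the query service API): list the block
/// heights in `0..height` that are absent from `stored`, newest first, so a
/// fetcher working through the result requests the most recent gaps first.
fn missing_newest_first(stored: &BTreeSet<u64>, height: u64) -> Vec<u64> {
    (0..height)
        .rev() // walk from the chain head back towards genesis
        .filter(|h| !stored.contains(h))
        .collect()
}
```

With blocks 0, 1 and 3 stored and a chain height of 6, this yields 5, 4, 2, so the gap nearest the head is requested first.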

jbearer commented 1 month ago

I'm thinking of an idea that might be better. Instead of having major and minor scans that run repeatedly, we have just two scans in total, both of which run perpetually, and we dovetail them in chunks. I will call these scans "heads" and "tails".

The heads scan always follows the chain head. Its job is to fetch recent blocks that we may have missed, such as when we briefly lose network connection and miss a decide. The tails scan starts at the tail of the chain and follows behind the heads scan. Its job is to fetch old data that we're missing, such as after a long restart.

Both of these scans follow the same exact pattern, except for where they start and what they do when they reach the current block height:

The big advantage of this scheme is a bounded amount of work done each time a scan runs. We don't have occasional long pauses where a major scan runs, CPU usage spikes, and we stop fetching more recent missing blocks (because minor scans have to wait for the major scan to finish).
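A rough sketch of how the dovetailing could be scheduled, to make the bounded-work claim concrete. Everything here is hypothetical (`Dovetail`, `Scan`, `next_chunk`, the chunk size), and the behavior at the current block height is an assumption: the heads scan idles until the chain grows, and the tails scan stops once it reaches the height where the heads scan started.

```rust
use std::ops::Range;

/// Which scan a chunk of work belongs to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Scan {
    Heads,
    Tails,
}

/// Hypothetical scheduler state; none of these names come from the repo.
struct Dovetail {
    chunk: u64,       // upper bound on blocks handled per turn
    heads_next: u64,  // next height the heads scan will look at
    tails_next: u64,  // next height the tails scan will look at
    tails_limit: u64, // assumption: tails stops where heads started
    heads_turn: bool, // which scan gets the next turn
}

impl Dovetail {
    fn new(start_height: u64, chunk: u64) -> Self {
        assert!(chunk > 0, "chunk size must be positive");
        Self {
            chunk,
            heads_next: start_height,
            tails_next: 0,
            tails_limit: start_height,
            heads_turn: true,
        }
    }

    /// Hand out the next chunk, alternating between the two scans. Each
    /// call covers at most `chunk` heights, so the work per turn is bounded.
    fn next_chunk(&mut self, chain_height: u64) -> Option<(Scan, Range<u64>)> {
        // Try the scan whose turn it is, then fall back to the other one.
        for _ in 0..2 {
            let heads = self.heads_turn;
            self.heads_turn = !self.heads_turn;
            let (scan, start, limit) = if heads {
                (Scan::Heads, self.heads_next, chain_height)
            } else {
                (Scan::Tails, self.tails_next, self.tails_limit)
            };
            if start < limit {
                let end = (start + self.chunk).min(limit);
                match scan {
                    Scan::Heads => self.heads_next = end,
                    Scan::Tails => self.tails_next = end,
                }
                return Some((scan, start..end));
            }
        }
        None // both scans are caught up for now
    }
}
```

Starting at height 10 with a chunk size of 4, successive calls hand out heads 10..13, then tails 0..4, and keep alternating, falling through to the tails scan whenever the heads scan has caught up with the chain.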

Another big advantage is that, on startup, we don't have to wait for a major scan to read through the database all the way from block 0 before we start fetching more recent blocks, which are more likely to be missing.