decred / dcrd

Decred daemon in Go (golang).
https://decred.org
ISC License

netsync: Split header sync from block sync. #3324

Closed by davecgh 3 months ago

davecgh commented 3 months ago

This refactors the code that deals with starting synchronization with the sync peer to split out the pieces specific to the initial header sync from the chain data sync. Along the way it removes some duplicate checks that are no longer necessary.

The primary motivation is to help pave the way for eventually performing parallel block downloads, but it also has the added benefit of making the logic a bit easier to follow, so it is useful on its own.

davecgh commented 3 months ago

> Let's say multiple peers have the blocks we need. How difficult would it be to define "best" as:
>
> * They have the blocks we need, and
> * Peer A is faster than all the other peers that also have the blocks we need.
>
> So even if Peers B and C may ultimately have more blocks, we can start with Peer A because it's fast, and then move to Peers B and/or C once Peer A has exhausted their usefulness.

Indeed. Although it isn't very obvious given the current structure, there are two different cases at play:

1) The initial header sync, which doesn't take long and can't really be parallelized (at least not without significant protocol changes, that is) because all headers have to be proven to connect and every header requires the previous one it links to in order to do that.
2) The block sync based on the headers. The blocks are allowed to arrive out of order.

So, the ultimate goal is to make it so that the sync peer is only used for the first case, and get rid of the entire notion of it for the second case and instead download the blocks in parallel from whatever peers have what we need.

That is what I was referring to by "parallel block download" in the PR description.
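
To make the distinction between the two cases concrete, here is a minimal Go sketch. It is not dcrd code: the `header` type, `connectHeaders`, `fetchBlocks`, and the `fetch` callback are all hypothetical names invented for illustration. The header check has to walk the chain in order because each header is validated against the previous one, while the block fetch can fan out across workers because results are stored by index.

```go
package main

import (
	"fmt"
	"sync"
)

// header is a simplified, hypothetical stand-in for a block header.
type header struct {
	hash     string
	prevHash string
}

// connectHeaders checks that every header links to the one before it,
// starting from tip. Each step depends on the previous hash, so this
// loop is inherently sequential.
func connectHeaders(tip string, headers []header) error {
	for i, h := range headers {
		if h.prevHash != tip {
			return fmt.Errorf("header %d (%s) does not connect to %s", i, h.hash, tip)
		}
		tip = h.hash
	}
	return nil
}

// fetchBlocks requests the block for each header using a pool of workers.
// fetch is a hypothetical callback standing in for a network request.
// Results are written by index, so the order in which blocks arrive
// does not matter.
func fetchBlocks(headers []header, workers int, fetch func(hash string) string) []string {
	if workers < 1 {
		workers = 1
	}
	blocks := make([]string, len(headers))
	jobs := make(chan int)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				// Each index is handled by exactly one worker.
				blocks[i] = fetch(headers[i].hash)
			}
		}()
	}
	for i := range headers {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return blocks
}

func main() {
	headers := []header{
		{hash: "h1", prevHash: "h0"},
		{hash: "h2", prevHash: "h1"},
		{hash: "h3", prevHash: "h2"},
	}
	if err := connectHeaders("h0", headers); err != nil {
		fmt.Println("header sync failed:", err)
		return
	}
	blocks := fetchBlocks(headers, 2, func(hash string) string {
		return "block for " + hash
	})
	fmt.Println(len(blocks), "blocks downloaded")
}
```

In a real node the workers would be distinct peers rather than goroutines sharing one callback, but the shape is the same: one ordered pass over the headers, then an unordered fan-out for the block data.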

matthawkins90 commented 3 months ago

Whoops, I see now that I was using the wrong terminology when I asked about switching to faster peers during the headers sync. Glad to see that there's still a possibility for a fitness function when choosing a peer to sync headers from. I understand that the headers download is typically ultrafast and needs to happen consecutively. I just did a test IBD on mainnet and it took me 47 seconds to download all 873,000 headers, so it's not like that part is the bottleneck. I guess I was just thinking about protecting for the edge-case where the peer with the most headers has a latency of 300ms but the peer with just a few fewer headers has a latency of 15ms. But yeah, probably not a bottleneck that's worth optimizing.

davecgh commented 3 months ago

> ... I understand that the headers download is typically ultrafast and needs to happen consecutively. I just did a test IBD on mainnet and it took me 47 seconds to download all 873,000 headers, so it's not like that part is the bottleneck. I guess I was just thinking about protecting for the edge-case where the peer with the most headers has a latency of 300ms but the peer with just a few fewer headers has a latency of 15ms. But yeah, probably not a bottleneck that's worth optimizing.

It's perhaps also worth noting that there is already a failsafe against that type of behavior. Namely, there is a header sync stall timeout that will cause it to choose a different peer to sync the headers from if the process stalls out too long. Once the initial headers sync is done, it syncs headers from all peers.

In practice though, as you mentioned, it's not really a bottleneck. Also, since it only happens once at initial startup, by the time you figure out which peer might be the fastest, it's probably already done.