filecoin-project / venus

Filecoin Full Node Implementation in Go
https://venus.filecoin.io

Design: Integrate GraphsyncFetcher w/ PeerTracker #3177

Closed hannahhoward closed 5 years ago

hannahhoward commented 5 years ago

Description

When I use a GraphsyncFetcher to fetch tipsets, if the peer I am initially given fails to return the tipsets, I want to use the PeerTracker to fail over to a different peer, so that I can still sync the chain.

Acceptance criteria

Risks + pitfalls

Where to begin

There are a couple key components to this task:

  1. Finding and identifying appropriate peers. Currently, the PeerTracker just returns a list of chains it knows about (in ChainInfo structs). The simplest approach would be to just take the first of these, then move on to the second, etc. Alternatively, we could attempt to use chain height in ChainInfos to try to identify a peer who at least has the current tipset we're interested in. Another question to consider is whether this process would live in the PeerManager (producing a method with more intelligence than List) or in the GraphsyncFetcher. The downside of putting things in the PeerManager is you can't easily track which peers you've already tried. The downside of putting things in the GraphsyncFetcher is as soon as you query the PeerTracker, the list of peers becomes out of date, and you might need to re-query.

Proposal: For the first version, keep it extra simple. Every time a request fails in the GraphsyncFetcher, query the PeerManager's existing List method, try the peers one by one in order. Iterate and optimize from there.

  2. How many retries to make -- could be a fixed amount, or a fixed time.

Proposal: For now, every time a request fails, just try all the peers, fail if none succeed.

ZenGround0 commented 5 years ago

Every time a request fails in the GraphsyncFetcher, query the PeerManager's existing List method, try the peers one by one in order. Iterate and optimize from there.

Perfect, this was just what I had in mind

Alternatively, we could attempt to use chain height in ChainInfos to try to identify a peer who at least has the current tipset we're interested in.

This is reasonable on the surface, but it means that nodes will ignore other nodes based on the state those nodes were in during the initial hello. This is not ideal because we expect these nodes to catch up and be useful eventually. One option would be to retry peers in sorted order of height but not ignore peers even with heights lower than the current. This would still seem to lead to an uneven load on the longest running nodes. In the longer term we could have peers sync head state more frequently within the peer tracker / extended hello protocol.
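
As a rough illustration of the "sorted by height, but ignore nobody" option, the sketch below orders a peer list by advertised height, highest first, without dropping any entry. `ChainInfo` is a simplified stand-in and `orderByHeight` is a hypothetical helper, not something that exists in the codebase.

```go
package main

import (
	"fmt"
	"sort"
)

// ChainInfo is a simplified stand-in for the tracker's entry type (assumption).
type ChainInfo struct {
	Peer   string
	Height uint64
}

// orderByHeight returns a copy of infos sorted by advertised height, highest
// first. Peers below the current height are kept, since the height may be
// stale from the initial hello and the peer may have caught up since.
func orderByHeight(infos []ChainInfo) []ChainInfo {
	out := make([]ChainInfo, len(infos))
	copy(out, infos)
	// Stable sort keeps the tracker's original order among equal heights.
	sort.SliceStable(out, func(i, j int) bool { return out[i].Height > out[j].Height })
	return out
}

func main() {
	infos := []ChainInfo{
		{Peer: "A", Height: 10},
		{Peer: "B", Height: 12},
		{Peer: "C", Height: 9},
	}
	for _, ci := range orderByHeight(infos) {
		fmt.Println(ci.Peer, ci.Height)
	}
}
```

This ordering always hits the highest peers first, so it does nothing about the uneven load concern raised above.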

The downside of putting things in the PeerManager is you can't easily track which peers you've already tried.

Word

The downside of putting things in the GraphsyncFetcher is as soon as you query the PeerTracker, the list of peers becomes out of date, and you might need to re-query.

Are you thinking this is a downside because of performance or something else? I was actually imagining List would be so cheap that every failed graphsync call would call List again and find the first untried peer from that to mostly avoid querying stale peers. I might be overestimating speed though.

hannahhoward commented 5 years ago

Are you thinking this is a downside because of performance or something else? I was actually imagining List would be so cheap that every failed graphsync call would call List again and find the first untried peer from that to mostly avoid querying stale peers. I might be overestimating speed though.

Makes sense. Yea I'll just call List again, and then I will keep track of who's already been tried.

hannahhoward commented 5 years ago

Another thing to consider -- at least in theory, a Graphsync query could end with a partial response (or the peer could simply go offline after sending a partial response), i.e. I ask for 12 tipsets and only get 6 before it fails (or even 6.5, i.e. it fails halfway through a tipset).

How important is it to pick up again in the right place? I think it would not be super hard to pick up at the beginning of the last complete downloaded tipset. I think it'd be super tricky to try to pick up mid-tipset and it'd be diminishing returns.

ZenGround0 commented 5 years ago

I think it would not be super hard to pick up at the beginning of the last complete downloaded tipset. I think it'd be super tricky to try to pick up mid-tipset and it'd be diminishing returns.

It would be great to pick up part way. Totally in agreement on diminishing returns on picking up partial tipsets.
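
To pin down what "pick up at the last complete tipset" could mean, here is a small sketch under stated assumptions. `Tipset`, `Request` and `resumeRequest` are hypothetical names, a tipset is reduced to a key plus a completeness flag, and the request is modelled as "start key plus count". Partially received tipsets are discarded, and the follow-up request starts again at the last fully received tipset.

```go
package main

import "fmt"

// Tipset is a simplified record of what arrived for one tipset (assumption).
type Tipset struct {
	Key      string
	Complete bool // true if every block of the tipset was received
}

// Request is a hypothetical "fetch Count tipsets starting at Start" request.
type Request struct {
	Start string
	Count int
}

// resumeRequest computes the follow-up request after a partial response.
// received holds tipsets in the order they were requested. The second return
// value is false when the original request was already fully satisfied.
func resumeRequest(orig Request, received []Tipset) (Request, bool) {
	// Count the leading run of complete tipsets; anything after the first
	// incomplete one is ignored rather than resumed mid-tipset.
	k := 0
	for k < len(received) && received[k].Complete {
		k++
	}
	if k >= orig.Count {
		return Request{}, false
	}
	if k == 0 {
		return orig, true // nothing usable arrived, retry as-is
	}
	// Restart at the last complete tipset: it plus the remaining ones.
	return Request{Start: received[k-1].Key, Count: orig.Count - k + 1}, true
}

func main() {
	// 12 tipsets requested; 6 arrive complete, the 7th only partially.
	received := []Tipset{
		{"t1", true}, {"t2", true}, {"t3", true},
		{"t4", true}, {"t5", true}, {"t6", true},
		{"t7", false},
	}
	next, ok := resumeRequest(Request{Start: "t1", Count: 12}, received)
	fmt.Println(next.Start, next.Count, ok)
}
```

Restarting at an already downloaded tipset costs one redundant tipset per retry, which seems cheap compared to tracking progress inside a tipset.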