filecoin-project / venus

Filecoin Full Node Implementation in Go
https://venus.filecoin.io

Bootstrap reports actionable errors #1729

Closed ZenGround0 closed 4 years ago

ZenGround0 commented 5 years ago

Description

We've reached the point where bootstrapping can end in errors that we should act on, for example by shutting down the node when we see them. Right now we only log these errors. As we begin relying on bootstrapping for security, this will become more important.

To make bootstrapping robust, it currently runs as a persistent process: every minute it checks whether enough peers are connected, and connects to bootstrappers if the number of connections is below a threshold.

There should be some way to listen on an error channel from this routine (and probably others) and use error information to do things like shut down the node.

Acceptance criteria

Risks + pitfalls

This may or may not be a tricky node refactor (it might be as simple as using a goroutine to connect an err channel from bootstrap to the node shutdown method). We need to understand good node operation UX to identify whether we should just log this error or do something else.

Where to begin

filnet/bootstrap.go and node/node.go

anacrolix commented 5 years ago

https://github.com/libp2p/go-libp2p-kad-dht/pull/235/files#diff-b094d209fcc4392d666d9452bbee3a0dR72

@jhiesey and I intend to expose a lot more control to the consumers of the DHT for this kind of purpose. The linked PR contains a correction to the API that enables your use case, in addition to removing some complexity that belongs in ipfs-cluster.

mishmosh commented 5 years ago

To clarify, this story is to improve error outputs (not to fix errors).

Updated acceptance criteria:

anacrolix commented 5 years ago

@mishmosh This is actionable now. If you switch to BootstrapOnce you can detect failures on individual bootstrap attempts.

anorth commented 5 years ago

What does bootstrapping refer to here? Is it just libp2p DHT bootstrapping, or the more general secure peer/chain bootstrapping that we need to implement (https://github.com/filecoin-project/specs/pull/196)?