agoric-labs / PlaygroundVat

OBSOLETE prototype Vat host: use SwingSet instead
Apache License 2.0

Hi from the libp2p team! #8

Open daviddias opened 5 years ago

daviddias commented 5 years ago

Hi Agoric Team!

Congratulations on your PoC release. We were very excited when we saw the announcement, and even more so to see that you decided to give libp2p a try and wrote down some very useful notes on the issues you are hitting.

Following up on your notes:

However js-libp2p is missing a lot of features that the flagship Go implementation provides. One feature we would like is the DHT that disseminates host addresses.

This has been a long time coming, and we are close to the finish line -- https://github.com/libp2p/js-libp2p-kad-dht/.

@vasco-santos and @jacobheun have been solving some last mile interoperability issues with the go-ipfs implementation and testing it on larger networks to make sure the implementation is sound. This is one of the big priorities for Q4 2018.

You can see how to activate it at -- https://github.com/libp2p/js-libp2p/tree/master/examples/peer-and-content-routing -- and experiment with it. Feedback and bug reports are welcome!

We might address this with embedded address hints, "redirectories", and/or by running a central server that can distribute address information.

If you use the WebRTC transport we have available, you will be using this central point (aka rendezvous) that provides an exchange for SDP offers and automatic Peer Discovery.

You can check the example at -- https://github.com/libp2p/js-libp2p/tree/master/examples/libp2p-in-the-browser -- to try it out.

Note: If you need WebRTC in Node.js, there is also a module that libp2p knows how to use -- https://www.npmjs.com/package/wrtc -- but it isn't the most stable thing yet.

Closely related to this is the NAT-bypassing relay behavior that allows IPFS servers to work behind firewalls. The consequence of this being absent from the JS port is that Vat nodes behind a firewall will not be able to accept connections from other Vats outside that firewall. Once a connection is made, it is used for messages in both directions, so certain topologies will work anyways.

We need to prioritize TCP hole punching (hint @jacobheun, @vasco-santos). Until then, there are two other ways to pierce through NATs: the WebRTC transport (which does NAT hole punching natively) and/or libp2p Circuit Relay.

We have a full tutorial on how to use it inside the js-ipfs examples folder, but it is an actual js-libp2p feature.

js-libp2p defaults to using (2048-bit) RSA keys for the node identities, which is adequate, but I'd prefer Ed25519 elliptic-curve keys, which are smaller and much faster. We may rewrite VatTP to use an entirely different wire protocol, in which messages are individually encrypted and then signed (so the signatures could be checked by third parties). In that case, the transport-layer encryption would be redundant, and we wouldn't care so much about the details.

We've been wanting to support Ed25519 keys for a while (in fact, js-libp2p started with Ed25519 keys and then later changed to RSA to match go-libp2p/go-ipfs). We haven't prioritized this because it would be a significant change for the IPFS network that we would have to plan for. However, we can totally add support for these in js-libp2p and let other disjoint networks use them.

@vasco-santos, @jacobheun, let's chat about this and see if we can make it happen this quarter.

The networking code currently brings up connections on demand: the TCP connection for each target host is initiated as soon as the first outbound message is generated for that host. An additional one-second loop is used to retry any failed connections. This is a bit too aggressive, and should be changed to use an exponential backoff algorithm, with random jitter to avoid the "thundering herd" problem.

Agreed, this should be configurable. Our dialer has been going through a refactor recently. @jacobheun, can you look into this?

In addition, until we have ACKs, we will try to make a connection even after all the messages have been delivered. Status messages are displayed to stdout each time the loop runs, making the console somewhat noisy (but we should display at least one message when the connection fails, to help diagnose problems).

I'm not 100% sure I understand this point. Is this specific to the libp2p internals or to your own wire protocol over libp2p?

Thaaaank you so much for this report ❤️ It is super useful for us to learn what our users' pain points are. It helps us prioritize better. If you don't mind, we can continue using this issue (or other issues) to keep the conversation going and let you know when these situations have been solved.

Also, feel welcome to join our libp2p channels (GH, IRC) to report issues or chat with the js-libp2p team. @vasco-santos, @jacobheun and I have been planning our goals for Q4, and we are still in time to incorporate some of the needs here.

warner commented 5 years ago

In addition, until we have ACKs, we will try to make a connection even after all the messages have been delivered. Status messages are displayed to stdout each time the loop runs, making the console somewhat noisy (but we should display at least one message when the connection fails, to help diagnose problems).

I'm not 100% sure I understand this point. Is this specific to the libp2p internals or to your own wire protocol over libp2p?

It's specific to our code, but that feature probably wants to move into libp2p at some point. Basically our node is in one of two states (for any given remote target): either it desperately wants a connection because it has a message to deliver, or it has nothing to deliver and so it doesn't care whether it has a connection or not.

If we're in the desperately-want-a-connection mode, and we already have a connection, great, we just send the message. If we're currently trying to make a connection, ok, we just wait for that to succeed or fail. If we've never tried to make a connection, then we should definitely start trying to make one right away. If we've tried to make a connection and failed, now that's where things are more interesting, because if we try right away, we're likely to run into whatever the problem was that caused the previous attempt to break. So we should wait a little while, but not too long, and definitely not wait a fixed period of time because if a thousand of us were all connected to the same peer and they disconnected, and then all of us try to reconnect exactly 2.0 seconds later, then we're likely to overwhelm something, so it'd be better if we introduce some jitter. And we should probably increase the delay after each failed reconnection attempt because with each failure it's less and less likely that our reconnect will work. But put a cap on the delay so we don't end up waiting forever.
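The retry timing described above (grow the delay after each failure, cap it, and add jitter) can be sketched roughly as follows. This is only an illustration: `backoffDelayMs` and its options are hypothetical names with assumed default values, not part of libp2p or of the vat's comms library. It uses "equal jitter", where half of the delay is fixed and half is random, so a retry never fires immediately.

```javascript
// Hypothetical helper; the name and defaults are assumptions for illustration.
// Returns how long to wait (in ms) before retry number `attempt` (0-based).
function backoffDelayMs(attempt, opts = {}) {
  const {
    baseMs = 1000,        // ceiling for the first retry (assumed value)
    factor = 2,           // growth per failed attempt (assumed value)
    maxMs = 60000,        // cap, so we never end up waiting forever (assumed value)
    random = Math.random, // injectable so the result can be made deterministic
  } = opts;
  // Grow exponentially with the attempt number, then clamp to the cap.
  const ceiling = Math.min(maxMs, baseMs * Math.pow(factor, attempt));
  // Equal jitter: a fixed half plus a random half, i.e. [ceiling/2, ceiling).
  const half = ceiling / 2;
  return Math.floor(half + random() * half);
}
```

With the assumed defaults, a thousand peers that lose the same connection at once would spread their first retries across 500 to 1000 ms instead of all reconnecting at the same instant.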

Also, if all of our connections have dropped at the same time, then our own host's network is probably offline (i.e. the laptop we're living on was closed and moved and reopened, and we just woke up and noticed all the sockets went away). When the first one gets reconnected, it might be a good idea to trigger the rest to try again soon, instead of waiting for their own timers. For Tahoe-LAFS, this saved us from sitting around in a partially-connected state for a long time, which would have caused us to store shares in too few places and miss our reliability goals.
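One way to picture the "first reconnect nudges the rest" idea is a small scheduler that tracks a retry time per peer and pulls every pending retry forward when any connection succeeds. The class below is a hypothetical sketch: `ReconnectScheduler`, its method names, and the `nudgeMs` value are assumptions, and jitter is left out to keep it deterministic.

```javascript
// Hypothetical sketch; names and default values are illustrative only.
class ReconnectScheduler {
  constructor({ baseMs = 1000, maxMs = 60000, nudgeMs = 250 } = {}) {
    this.baseMs = baseMs;
    this.maxMs = maxMs;
    this.nudgeMs = nudgeMs;
    this.waiting = new Map(); // peerId -> { attempts, retryAt }
  }

  // Record a failed attempt and schedule the next one with capped doubling.
  onFailure(peerId, now) {
    const prev = this.waiting.get(peerId);
    const attempts = prev ? prev.attempts + 1 : 1;
    const delay = Math.min(this.maxMs, this.baseMs * 2 ** (attempts - 1));
    this.waiting.set(peerId, { attempts, retryAt: now + delay });
  }

  // A success suggests our own network is back: drop this peer's backoff
  // state and pull every other pending retry forward to "soon".
  onConnected(peerId, now) {
    this.waiting.delete(peerId);
    for (const entry of this.waiting.values()) {
      entry.retryAt = Math.min(entry.retryAt, now + this.nudgeMs);
    }
  }

  // Peers whose retry time has arrived.
  due(now) {
    return [...this.waiting.entries()]
      .filter(([, entry]) => entry.retryAt <= now)
      .map(([peerId]) => peerId);
  }
}
```

A caller would poll `due(Date.now())` from its retry loop and report each outcome back through `onFailure` or `onConnected`.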

The state machine which manages this sort of backoff-reconnector currently lives in our vat's comms library, and it's currently really simplistic. The only serious consequence for us right now is that the stdout logs get kind of noisy. In the long run, it'd be great if libp2p had some sort of "please try to maintain a connection" feature, in which a callback was fired each time a new connection is established, and libp2p managed the timeouts and state machines for us.
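To make the requested "please try to maintain a connection" feature concrete, here is a rough sketch of what such a wrapper could look like from the caller's side. Everything in it is hypothetical: `maintainConnection`, the caller-supplied `dial` function, and the `conn.closed` promise are names invented for this example and do not correspond to an existing libp2p API.

```javascript
// Hypothetical sketch: `maintainConnection`, `dial`, and `conn.closed` are
// illustrative names chosen for this example, not libp2p APIs.
const defaultSleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

function maintainConnection(dial, onConnection, opts = {}) {
  const {
    baseMs = 1000,        // assumed first-retry ceiling
    maxMs = 60000,        // assumed cap on the retry delay
    sleep = defaultSleep, // injectable for testing
    random = Math.random, // injectable for testing
  } = opts;
  let stopped = false;

  const run = async () => {
    let failures = 0;
    while (!stopped) {
      let conn;
      try {
        conn = await dial(); // caller-supplied; resolves to a connection
      } catch (err) {
        // Dial failed: back off (capped, with jitter) before trying again.
        failures += 1;
        const ceiling = Math.min(maxMs, baseMs * 2 ** (failures - 1));
        await sleep(ceiling / 2 + random() * (ceiling / 2));
        continue;
      }
      failures = 0;
      onConnection(conn); // fired on every new connection
      // Assumed: `conn.closed` is a promise that settles when the link drops.
      await Promise.resolve(conn.closed).catch(() => {});
    }
  };
  run();

  return { stop: () => { stopped = true; } };
}
```

In this sketch a connection that drops is redialed right away, and only failed dials back off. A real implementation would likely also want to rate-limit reconnects to a peer that accepts and then immediately closes.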

thanks!

jacobheun commented 5 years ago

We recently released 0.24 of libp2p, which includes state machine additions for libp2p and the switch. It also exposes a new dial method, which calls back with a state machine of the connection. Our goal here is to give users more support for hooking into individual connections. We are also going to work on leveraging these changes with the Connection Manager to handle the backoffs internally in libp2p.

In the long run, it'd be great if libp2p had some sort of "please try to maintain a connection" feature, in which a callback was fired each time a new connection is established

Both go and js libp2p implementations are looking at how we can improve this and prioritize certain connections, so that for particular nodes we can work to maintain that connection. We are also working on reviewing the libp2p APIs in addition to supporting Async Iterators, which could help us improve support for returning new connections as they're established.

With regard to Ed25519 keys, we are close to having support for them released; we're just working through a minor issue with the compatibility of the various keys. This should be released in the next few weeks. I'll follow up here once it's available.