libp2p integration

fjl commented 5 years ago

libp2p is a modular framework of peer-to-peer networking components, implemented in several languages. Specs can be found here.

We want to have a shared transport protocol with libp2p to enable interconnectivity between the IPFS and Ethereum networks. Transport protocols specified by libp2p could be used as a replacement for RLPx.

raulk commented 5 years ago

Raúl from the libp2p team here 👋

A bunch of thoughts:

- libp2p features a Kademlia DHT implementation with slightly different behaviour. It is currently used by many projects, including IPFS and Polkadot (via their rust-libp2p implementation).
  - We're working on other DHT implementations as well, such as Coral.
  - There's a DHT research group in Protocol Labs tackling several aspects such as security, routing, scoring, etc.
  - Some ideas for improvements:
    - Collection of ideas: https://github.com/libp2p/research-dht/issues/6
    - Secure disjoint path lookups: https://github.com/libp2p/libp2p/issues/44
    - Data structures: https://github.com/libp2p/go-libp2p-kad-dht/issues/194
  - By adopting libp2p, we hope the efforts we're investing to improve DHT lookups, security, connectivity, etc. will benefit Ethereum as well.
- RLPx handshake (ECIES) and SecIO share similarities. I collected my thoughts here: https://github.com/libp2p/go-libp2p-secio/issues/7#issuecomment-415031903
- ENR (Ethereum Name Records, https://eips.ethereum.org/EIPS/eip-778) could be materialised via the Interplanetary Record System. Spec here: https://github.com/libp2p/specs/blob/master/IPRS.md
- Multiplexing can be handled by Yamux. We support ls messages to list protocols (thus playing the role of the wire-level HELLO in devp2p); then, through a single connection, any number of virtual streams can be opened for each protocol/conversation.
- We currently don't send DISCONNECT messages. Some discussion is taking place here: https://github.com/libp2p/go-libp2p/issues/238
- For integration with other languages (e.g. Python, C++, Java), the libp2p daemon is currently under active development. This is a standalone process that lets local processes interact with remote peers through libp2p, using simple Unix sockets and, in the future, shared-memory transports. Implementing bindings in other languages is straightforward. In fact, there's already a Gerbil binding, and other folks have expressed their interest in contributing Java and Python bindings.
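
To make the multiplexing point concrete, here is a minimal Python sketch of how Yamux-style framing lets several virtual streams share one connection. It assumes the 12-byte frame header from the Yamux spec (version, type, flags, stream ID, length). The payload strings and the choice of streams 1 and 3 for two protocols are made up for illustration, and flow control, ACKs, and stream teardown are left out.

```python
import struct

# Yamux frame header, 12 bytes, big-endian:
# version (1), type (1), flags (2), stream ID (4), length (4)
HEADER = struct.Struct(">BBHII")

TYPE_DATA = 0x0
FLAG_SYN = 0x1  # marks the first frame of a new stream


def data_frame(stream_id: int, payload: bytes, flags: int = 0) -> bytes:
    """Build one data frame for the given virtual stream."""
    return HEADER.pack(0, TYPE_DATA, flags, stream_id, len(payload)) + payload


def split_frames(buf: bytes):
    """Yield (stream_id, flags, payload) for every data frame in buf."""
    offset = 0
    while offset < len(buf):
        _version, ftype, flags, stream_id, length = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        if ftype != TYPE_DATA:
            continue  # non-data frames have no payload; this sketch skips them
        yield stream_id, flags, buf[offset:offset + length]
        offset += length


# Two hypothetical protocols interleaved on a single connection.
wire = (
    data_frame(1, b"eth: status", FLAG_SYN)
    + data_frame(3, b"snap: hello", FLAG_SYN)
    + data_frame(1, b"eth: blocks")
)

per_stream = {}
for sid, _flags, payload in split_frames(wire):
    per_stream.setdefault(sid, []).append(payload)
```

After the loop, `per_stream` holds the payloads regrouped by stream ID, which is the demultiplexing step a real implementation would perform before handing data to each protocol handler.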

There are probably many other relevant topics that don't spring to mind right now, but I will be watching this issue. I'm excited to help the Ethereum community through this journey wherever needed ;-)

P.S. If you need to chat with me, you can find me in the #libp2p Freenode IRC channel (nick: raulk), or in the Ethereum Sharding Gitter ;-)

vyzo commented 5 years ago

cc myself

raulk commented 5 years ago

@fjl What do you have in mind for the interconnectivity between IPFS and Ethereum? Any particular features you're interested in?

fjl commented 5 years ago

Whoa, so many suggestions at once.

> RLPx handshake (ECIES) and SecIO share similarities.

RLPx is broken. I want to replace it. I've opened this issue because I want to integrate some of the libp2p transport protocols, in particular TCP+secio and UDT+secio, as replacements for RLPx.

> By adopting libp2p, we hope the efforts we're investing to improve DHT lookups, security, connectivity, etc. will benefit Ethereum as well.

I think we have a solid plan for devp2p node discovery and will continue working on the protocol that we have for that.

> ENR (Ethereum Name Records) could be materialised via the Interplanetary Record System.

Btw it's Ethereum Node Records. I don't understand 'materialised' in this context. Can you clarify what you mean?

fjl commented 5 years ago

> What do you have in mind for the interconnectivity between IPFS and Ethereum? Any particular features you're interested in?

The idea is to be able to cross-connect the systems on the network level. If I understand IPFS correctly there should be a way to address content stored in other systems. I just thought it might be possible to make Ethereum a system like that. You could fetch blocks this way, for example.

raulk commented 5 years ago

> Whoa, so many suggestions at once.

Apologies for the flurry. Since I've worked with devp2p before, I had been cooking up some thoughts for a while and I figured I'd share them.

> RLPx is broken. I want to replace it. I've opened this issue because I want to integrate some of the libp2p transport protocols, in particular TCP+secio and UDT+secio, as replacements for RLPx.

Cool, good to know.

> I think we have a solid plan for devp2p node discovery and will continue working on the protocol that we have for that.

👍

> Btw it's Ethereum Node Records. I don't understand 'materialised' in this context. Can you clarify what you mean?

Apologies for the typo! I was thinking Ethereum Node Records would be stored in the DHT somehow, but I don't think that's the intention, so the connection isn't clear to me now.

> The idea is to be able to cross-connect the systems on the network level. If I understand IPFS correctly there should be a way to address content stored in other systems. I just thought it might be possible to make Ethereum a system like that. You could fetch blocks this way, for example.

This is a neat idea. IPLD allows traversing different data types stored in IPFS (including ETH blocks). See https://github.com/ipfs/go-ipld-eth/blob/master/plugin/README.md. Currently one needs to import blocks manually into IPFS, so it would be helpful to have ETH clients with IPLD/IPFS capability either push blocks onto IPFS in real time, or expose an IPFS/IPLD protocol so that other IPFS clients can pull data from them once they advertise themselves in the IPFS DHT.

fjl commented 5 years ago

> Apologies for the typo! I was thinking Ethereum Node Records would be stored in the DHT somehow, but I don't think that's the intention, so the connection isn't clear to me now.

Node records can be stored in any DHT because they're just small binary documents. Our discovery DHT stores them, anyway.

raulk commented 5 years ago

> Our discovery DHT stores them, anyway.

Where can I learn more about this? By 'our discovery DHT', do you mean the ENR variant of the discovery DHT? From what I know, the v4/v5 discovery protocols didn't store data in the DHT in any way (in the Kademlia PUT_VALUE, GET_VALUE sense).

debris commented 5 years ago

cc @tomaka

fjl commented 5 years ago

Yes, it doesn't store any values. But node records are not values, they're just pointers to nodes with a bunch of metadata attached. The libp2p/IPFS equivalents of node records are multiaddrs, but these are not signed or versioned.

> Where can I learn more about this?

There are two efforts to bring ENR support to the discovery DHT:

- Discovery v4 ENR Extension (#44)
- Discovery v5 (#48)

Both of these are being worked on by various people.
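
For readers unfamiliar with node records, the following Python sketch shows roughly what a small, versioned binary document looks like. It assumes the EIP-778 layout: the signed content is an RLP list `[seq, k, v, ...]` with keys in sorted order, and the full record (content plus signature) may not exceed 300 bytes. Signing requires a secp256k1 library and is deliberately left out; `record_content`, `rlp_encode`, and the example values are hypothetical names and data chosen for illustration.

```python
MAX_RECORD_SIZE = 300  # size limit for a full record, per EIP-778


def _length_prefix(length: int, base: int) -> bytes:
    """RLP length prefix; base is 0x80 for byte strings and 0xC0 for lists."""
    if length < 56:
        return bytes([base + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([base + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item) -> bytes:
    """Minimal RLP encoder for non-negative ints, bytes, and nested lists."""
    if isinstance(item, int):
        # Integers are big-endian with no leading zeros; 0 becomes the empty string.
        item = item.to_bytes((item.bit_length() + 7) // 8, "big")
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item  # a single small byte encodes as itself
        return _length_prefix(len(item), 0x80) + item
    if isinstance(item, list):
        body = b"".join(rlp_encode(x) for x in item)
        return _length_prefix(len(body), 0xC0) + body
    raise TypeError(f"cannot RLP-encode {type(item).__name__}")


def record_content(seq: int, pairs: dict) -> list:
    """Build the to-be-signed content list: seq, then key/value pairs sorted by key."""
    content = [seq]
    for key in sorted(pairs):
        content += [key.encode("ascii"), pairs[key]]
    return content


# Example values only; a real record also carries a public key and a signature.
content = record_content(1, {"udp": 30303, "ip": bytes([127, 0, 0, 1]), "id": b"v4"})
encoded = rlp_encode(content)
assert len(encoded) < MAX_RECORD_SIZE  # must still leave room for the signature
```

The sequence number is what makes a record versioned: a node bumps `seq` whenever it republishes, so peers can tell which of two records is newer. The signature over this content, omitted here, is what a plain multiaddr lacks.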

FrankSzendzielarz commented 5 years ago

@raulk Please see the discv5 subfolder for the work in progress.

ghost commented 5 years ago

Hello. Any updates on this effort? Any pointer? Or new link / issue?

Hola @raulk :wave: !!!

fjl commented 5 years ago

Discovery v4 ENR extension is done. We're still working on Discovery v5. Next step for me is finishing discovery v5, then I'll start looking into libp2p and transport stuff again.

fjl commented 3 years ago

Discovery v5 is now pretty stable. We just released version 5.2, but we're still working on it. Regarding the transport, I really want to work on a UDP-based transfer protocol next. Not sure if there is any overlap with libp2p's interests right now.

We also have issue #71, where some people got really excited about building some kind of TCP transport with libp2p, but the discussion seems to be over in that issue.

p-shahi commented 1 year ago

@fjl is there any interest on your part (or in the devp2p community) in utilizing libp2p and its transports? We would be happy to re-engage on this and help from the Protocol Labs side if so.

fjl commented 1 year ago

I'm generally interested. I think we've grown pretty accustomed to the high-level protocol model that exists in devp2p right now:

- we maintain a single connection with every peer
- on that connection, we want to run multiple application protocols
- these protocols are composed of numbered messages with binary encoding

So whatever we select needs to support this model. In order to get this implemented, we should just select one transport and create a spec around that. What's the 'best' transport in libp2p right now?

p-shahi commented 1 year ago

@fjl glad to hear that. Is it possible to enable all transports to begin with? libp2p will select the best transport to use when dialing a peer. If you only select one then we lose the benefit of using WebTransport or WebRTC for connecting to the browser. Wdyt?

The "best" one would be QUIC (supported in go & rust libp2p)

fjl commented 1 year ago

I think it doesn't make sense to 'enable all' because it increases the attack surface so much. We are quite happy having a single transport with known security properties now. There is also the issue that clients have enough trouble as-is keeping up with network protocol changes. I would much prefer having very good support for one protocol that everyone implements well.

fjl commented 1 year ago

My reasoning here is similar to what is given in eth consensus networking specifications:

https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/p2p-interface.md#transport

> All implementations MUST support the TCP libp2p transport, and it MUST be enabled for both dialing and listening (i.e. outbound and inbound connections). The libp2p TCP transport supports listening on IPv4 and IPv6 addresses (and on multiple simultaneously).
>
> ...
>
> All listening endpoints must be publicly dialable, and thus not rely on libp2p circuit relay, AutoNAT, or AutoRelay facilities.
>
> ...
>
> The Libp2p-noise secure channel handshake with secp256k1 identities will be used for encryption.

This is further justified by the rationale at the end of the spec: https://github.com/ethereum/consensus-specs/blob/dev/specs/phase0/p2p-interface.md#transport-1

Honestly, we should just use exactly the same transport stack as the consensus layer, and simply define a mapping of devp2p capabilities to mplex streams.

p-shahi commented 1 year ago

@fjl Thanks, it's fair to go in the same direction as the eth consensus specification, i.e. only utilizing TCP. However, I strongly advocate for enabling at least QUIC in addition to TCP. It is widely deployed on the IPFS/Filecoin networks. The PL libp2p team will be making the same push to adopt it in the consensus specs as well (the spec already references the desire to eventually use QUIC). Regarding the attack surface increase from enabling all transports, libp2p will have a stronger security posture after some planned audits and wider production deployments of the newer transports.

At least wrt adopting TCP to start, how do you want to go about next steps? Are the specifications something you want to draft and have the community and PL review? We're ready to help author, review, and draft the spec with you to get the ball rolling.

fjl commented 1 year ago

We mostly need a spec for devp2p capabilities over libp2p (i.e. this part of RLPx). The new document can be like rlpx.md, where the first part talks about the transport protocol (TCP, libp2p noise layer, yamux) and links to the related specs in libp2p.

In the second half, we should fully define the capability system. We can use one yamux stream per capability. The message code offset hack should be removed. The ping/pong protocol could be removed as well. Some of the disconnect reasons are useful. In fact, we could introduce capability-specific disconnect reasons as well.
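
To illustrate what removing the message code offset hack would mean, here is a small Python sketch comparing the two addressing schemes. The RLPx side assumes the rule from the RLPx spec that IDs 0x00-0x0f belong to the base "p2p" capability and that shared capabilities receive consecutive ID ranges in alphabetical order. The per-stream side is purely hypothetical: the `/devp2p/<name>/<version>` protocol ID format and the message counts in `CAPS` are invented for this example and come from no spec.

```python
BASE_IDS = 0x10  # RLPx reserves message IDs 0x00-0x0f for the base "p2p" capability

# name -> (version, number of message codes); the counts are illustrative only
CAPS = {"snap": (1, 8), "eth": (68, 17)}


def rlpx_offsets(caps: dict) -> dict:
    """Assign each capability a slice of the shared message-ID space, RLPx style."""
    offsets, next_id = {}, BASE_IDS
    for name in sorted(caps):  # capabilities are ordered alphabetically
        offsets[name] = next_id
        next_id += caps[name][1]
    return offsets


def rlpx_wire_code(caps: dict, name: str, code: int) -> int:
    """Message code as it appears on a shared RLPx connection."""
    return rlpx_offsets(caps)[name] + code


def stream_protocol_id(name: str, version: int) -> str:
    """Hypothetical per-capability stream name (an assumption, not a spec)."""
    return f"/devp2p/{name}/{version}"


def stream_wire_code(caps: dict, name: str, code: int) -> tuple:
    """With one stream per capability, the code needs no offset at all."""
    return stream_protocol_id(name, caps[name][0]), code


print(rlpx_offsets(CAPS))
print(rlpx_wire_code(CAPS, "snap", 2))
print(stream_wire_code(CAPS, "snap", 2))
```

In the RLPx scheme, the wire code of a `snap` message depends on how many messages `eth` defines, so both peers must agree on the full capability set before any code can be interpreted. With one stream per capability, each protocol keeps its own codes starting from zero.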

fjl commented 1 year ago

A question: does the noise transport support sending authenticated data along with the initial handshake? Is this data available to the application? One thing I always wanted in RLPx is the ability to reject the connection immediately after the first handshake if capabilities and/or chain information do not match. At this time, we always need to perform the full crypto handshake AND the subprotocol negotiation AND additional capability-specific initialization before we can know whether the peer is a good match. If we could send a blob with capability information along in the noise handshake, the server side could immediately reject on mismatch.
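
The early-rejection idea can be sketched in Python as follows. Everything here is hypothetical: `EarlyData`, its fields, and `should_accept` are invented names, the chain check is reduced to a simple equality test on an opaque identifier, and the sketch says nothing about how the blob would be carried or authenticated inside a handshake.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EarlyData:
    """Hypothetical blob sent alongside the first handshake message."""
    capabilities: frozenset  # set of (name, version) pairs
    chain: bytes             # opaque chain identifier, e.g. a genesis hash


def should_accept(local: EarlyData, remote: EarlyData) -> bool:
    """Decide right after the handshake whether the peer is worth keeping."""
    shared = local.capabilities & remote.capabilities
    return bool(shared) and local.chain == remote.chain


us = EarlyData(frozenset({("eth", 68), ("snap", 1)}), b"mainnet")
good_peer = EarlyData(frozenset({("eth", 68)}), b"mainnet")
other_chain = EarlyData(frozenset({("eth", 68)}), b"testnet")
no_overlap = EarlyData(frozenset({("les", 4)}), b"mainnet")

for peer in (good_peer, other_chain, no_overlap):
    print(should_accept(us, peer))
```

The point of the design is that the responder can run this check before spending any effort on subprotocol negotiation or capability-specific initialization.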

mxinden commented 1 year ago

:wave: (rust-)libp2p maintainer here.

> does the noise transport support sending authenticated data along with the initial handshake?

The Noise framework does. libp2p does not (yet). See https://github.com/libp2p/specs/issues/498 for embedding application data (here identify) in the TLS 0.5 RTT data. See https://github.com/libp2p/specs/pull/453 for equipping the libp2p Noise XX handshake to carry additional data.

fjl commented 1 year ago

Thank you for the information about additional data in the handshake.

I think it highlights an important difference between libp2p and this project. libp2p handles this by introducing several optional extensions, which may or may not be used depending on the transport and configuration. In devp2p, we would rather just have a single protocol with guaranteed properties, which will be mandatory to implement for all clients. It simplifies things a lot.

p-shahi commented 1 year ago

@fjl Are you open to having a short design/architecture meeting, sometime next week (friendly to your timezone - CEST?), with Go & Rust libp2p maintainers? I will post the discussion points here after the meeting. (Some face to face time will help us get on the same page quicker.)

fjl commented 1 year ago

Yes, I'm definitely open to having a meeting!

p-shahi commented 1 year ago

sweet, I sent an invite to the email address listed on your GH profile.

p-shahi commented 1 year ago

Here are the meeting notes from 2023-01-25 https://pl-strflt.notion.site/2023-01-25-devp2p-libp2p-minutes-692a4068d60a4781877c34839100764b

ethereum / devp2p

libp2p integration #45