ethereum / portal-network-specs

Official repository for specifications for the Portal Network

Thin websocket to UDP socket proxy to allow browsers to join the club #71

Open pipermerriam opened 3 years ago

pipermerriam commented 3 years ago

An idea for how we can allow browsers to be part of the network.

An open question is whether there is a viable way for a browser based node, which ideally is running in some form of a browser extension, to execute long running background processes that aren't constantly put to sleep. For the context of this issue, we'll assume this is possible.

Why browser based clients are hard

The thing blocking browsers from easily joining the network is the lack of access to the UDP transport.

The initial idea for shimming browsers in was to use websockets or webrtc for communication between browsers, with some form of bridging to the outside world. However, this has two drawbacks that probably make it non-viable:

  1. Websockets and webrtc are both built on long-lived connections, which is at odds with a network design based around UDP datagrams. Doing discv5 over websockets/webrtc would likely incur significant overhead from continuously establishing and tearing down these connections.
  2. Bridging is fundamentally hard to make compatible with the way the network functions. The browser nodes would be somewhat partitioned off from the network: either they could only communicate with nodes that support the bridge, or we'd need some sort of relay system. Both are complex solutions.

A simple solution

The proposed solution isn't pure peer-to-peer.

It should be relatively simple to write a piece of software that does the following:

  1. Run a websocket server and listen for new incoming connections.
  2. When a new incoming connection is established, open up an external facing UDP socket.
  3. Bridge the websocket stream with the UDP socket, relaying packets between the two.

+-----------+     +------------------------------+
|  Browser  |     |       Proxy Server Thing     |
+-----------+     +-----------+     +------------+
|           | --> |           | --> |            |
| websocket |     | websocket |     | UDP socket |
| (client)  | <-- | (server)  | <-- |            |
|           |     |           |     |            |
+-----------+     +-----------+-----+------------+
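
Steps 2 and 3 above can be tried out on loopback with nothing but the standard library. This is a minimal sketch, not the proposed implementation: it uses `asyncio` rather than a real websocket server, an `asyncio.Queue` stands in for the websocket's send side, and `QueueingProtocol`, `EchoProtocol`, `open_udp_bridge` and `demo` are all hypothetical names chosen for illustration.

```python
import asyncio


class QueueingProtocol(asyncio.DatagramProtocol):
    """Pushes every received datagram, with its sender address, onto a queue."""

    def __init__(self, inbound):
        self.inbound = inbound

    def datagram_received(self, data, addr):
        self.inbound.put_nowait((data, addr))


class EchoProtocol(asyncio.DatagramProtocol):
    """Stand-in for a remote UDP peer: echoes each datagram to its sender."""

    def connection_made(self, transport):
        self.transport = transport

    def datagram_received(self, data, addr):
        self.transport.sendto(data, addr)


async def open_udp_bridge(inbound):
    """Step 2: open a UDP socket dedicated to one browser connection."""
    loop = asyncio.get_running_loop()
    transport, _ = await loop.create_datagram_endpoint(
        lambda: QueueingProtocol(inbound), local_addr=("127.0.0.1", 0)
    )
    return transport


async def demo():
    loop = asyncio.get_running_loop()
    # Hypothetical remote node, bound to an ephemeral loopback port.
    peer, _ = await loop.create_datagram_endpoint(
        EchoProtocol, local_addr=("127.0.0.1", 0)
    )
    peer_addr = peer.get_extra_info("sockname")

    inbound = asyncio.Queue()  # stands in for the websocket's send side
    bridge = await open_udp_bridge(inbound)
    try:
        # Step 3, outbound: a message from the browser becomes a datagram.
        bridge.sendto(b"ping", peer_addr)
        # Step 3, inbound: the reply is queued for delivery to the browser.
        data, addr = await asyncio.wait_for(inbound.get(), timeout=2)
        return data, addr == peer_addr
    finally:
        bridge.close()
        peer.close()
```

Running `asyncio.run(demo())` sends one datagram out through the per-connection socket and reads the echoed reply back off the queue. In a real proxy the queue would be replaced by writes to the browser's websocket.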

This proxy should be quite lightweight, likely able to support many browsers from a single server. The proxy would have no more visibility into the data you are sending than your ISP does, since all of the packet data will be encrypted. Running this service "benevolently" on the scale of supporting millions of browsers connecting to it would likely cost well under $100k USD per year.

pipermerriam commented 3 years ago

The proxy should also be quite easy to run which would allow self-hosted options for those who don't want to trust whoever is operating the service.

pipermerriam commented 3 years ago

Thinking about this in the context of metamask.

If metamask is paying for infura, then the cost of running this proxy service should almost definitely be less than whatever they would need to pay infura for how much traffic they generate. Under this model, a browser based wallet provider like metamask who "wants" to be less centralized could switch over to this model of embedded portal clients and save piles of :moneybag: money :moneybag: while also supporting the network.

pipermerriam commented 3 years ago

here's a simple pseudo-code illustration of the concept

import trio

async def listen_websocket():
    websocket_server = ...  # placeholder: any async iterable of accepted websockets

    async with trio.open_nursery() as nursery:
        async for socket in websocket_server:
            # start_soon takes the function and its arguments separately
            nursery.start_soon(manage_websocket, socket)

async def manage_websocket(websocket):
    # one dedicated UDP socket per browser connection
    async with open_new_udp_socket() as udp_socket:
        async with trio.open_nursery() as nursery:
            nursery.start_soon(feed_outbound_datagrams, websocket, udp_socket)
            nursery.start_soon(receive_inbound_datagrams, websocket, udp_socket)

async def feed_outbound_datagrams(websocket, udp_socket):
    # browser -> network: unwrap each length-prefixed payload and send it as a datagram
    while True:
        length_prefix = await websocket.read_exactly(4)
        payload_length = int.from_bytes(length_prefix, 'big')

        payload = await websocket.read_exactly(payload_length)
        datagram, (ip, port) = decode_payload(payload)

        await udp_socket.send_datagram((ip, port), datagram)

async def receive_inbound_datagrams(websocket, udp_socket):
    # network -> browser: wrap each datagram with its source address and a length prefix
    while True:
        datagram, (ip, port) = await udp_socket.recv()
        payload = encode_payload(datagram, ip, port)
        length = len(payload)
        length_prefix = length.to_bytes(4, 'big')
        await websocket.send_all(length_prefix + payload)

def encode_payload(datagram, ip_address, port):
    ...

def decode_payload(payload):
    ...
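
The two helpers are left open in the pseudo-code above. One possible way to fill them in is sketched below, purely as an assumption: the byte layout (a version byte, the packed address, a 2-byte port, then the datagram) is made up for illustration and is not taken from any spec or from the prototype linked later in this thread.

```python
import ipaddress
import struct


def encode_payload(datagram: bytes, ip_address: str, port: int) -> bytes:
    """Pack one datagram and its remote address into a single payload.

    Hypothetical layout: 1 byte IP version (4 or 6), the packed address
    (4 or 16 bytes), a 2-byte big-endian port, then the raw datagram.
    """
    ip = ipaddress.ip_address(ip_address)
    header = struct.pack("!B", ip.version) + ip.packed + struct.pack("!H", port)
    return header + datagram


def decode_payload(payload: bytes):
    """Inverse of encode_payload: returns (datagram, (ip, port))."""
    if not payload or payload[0] not in (4, 6):
        raise ValueError("missing or unknown IP version byte")
    addr_len = 4 if payload[0] == 4 else 16
    header_len = 1 + addr_len + 2
    if len(payload) < header_len:
        raise ValueError("payload too short")
    packed_ip = payload[1 : 1 + addr_len]
    # Pick the address class explicitly from the version byte.
    if payload[0] == 4:
        ip = ipaddress.IPv4Address(packed_ip)
    else:
        ip = ipaddress.IPv6Address(packed_ip)
    (port,) = struct.unpack("!H", payload[1 + addr_len : header_len])
    return payload[header_len:], (str(ip), port)
```

For example, `decode_payload(encode_payload(b"\x01\x02", "192.0.2.7", 9000))` gives back the datagram together with `("192.0.2.7", 9000)`. Since websocket messages already preserve message boundaries, a real proxy could probably send one payload per binary message and drop the 4-byte length prefix.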

acolytec3 commented 2 years ago

For future reference, here's the working prototype I built to address this question: https://github.com/acolytec3/ultralight-proxy. It can be paired with https://github.com/acolytec3/discv5-browser and then any other discv5 client. The browser client should be able to complete the discv5 handshake, which is visible in the logs.

backkem commented 2 years ago

Regarding websockets: in some cases they may require https for production use, since browsers are cracking down on http and mixed content (pages served over https are generally blocked from opening plain ws:// connections). This means you'd need valid SSL certs, and likely valid domain names, for all proxies. I've seen similar projects struggle with that, so it may be worth investigating further.

One difference between websockets and webrtc is that webrtc does allow P2P connections between browsers and also between browsers and native nodes. This has been demonstrated by WebTorrent, for example. It would make the browser nodes more equal participants in the network. In addition, it reduces the need for proxy nodes with public IPs since webrtc can do NAT traversal. That being said, webrtc is a rather heavyweight protocol and you still need some public infrastructure for the 'signaling' step.

acolytec3 commented 2 years ago

Along these lines, I have on my to-do list to take another look at webRTC and try implementing it at the transport layer inside of discv5 using peerjs, since that seems to abstract away the difficulties of implementing it, at least in the TS/JS context. My only concern here is that it feels like it is really reinventing the wheel with what libp2p is doing, so it might be more profitable to just leverage what they're already doing in this space with star servers and relays for transport rather than build it all over again here. And it still doesn't resolve the fundamental tension in what we're doing with browser based clients: discv5 isn't intended to establish long-lived connections but to use the strength of the network responses via UDP to retrieve data, for whatever sort of concern that is.

acolytec3 commented 2 years ago

Just revisiting this topic as we approach something like a living testnet. The current proxy seems to work reasonably well when run locally and is able to make outbound connections to portal network clients outside of my LAN but I haven't yet got a working version that exposes a public IP that's accessible for inbound connections (mainly due to lack of understanding of how all that stuff works and not having had the time yet to learn). I built a very simple prototype of a webRTC proxy service that uses WakuV2 for the signaling piece but it would literally just be a replacement for the existing websocket server proxy where browser clients connect to the proxy via webRTC and then packets get routed to the rest of the portal network via UDP. So, all it really does is replace a websocket proxy connection with a webRTC proxy connection and I don't know how valuable that is.

acolytec3 commented 2 years ago

We could use webRTC to allow those p2p browser connections, but then we're just opening long-lived connections between browser clients and I'm not sure if that's fruitful or not, since it sort of goes against the main use for discv5. I think we could extend the ENR encoding to encapsulate webRTC signaling data so nodes would know how to connect to each other via webRTC, but they would still need a proxy to connect to the rest of the UDP based network. Does it feel like this is a fruitful avenue for further research? The current prototype isn't hooked into the Ultralight client yet and just implements the proxy piece of it, so I would need to do further experimentation to get browser to browser discv5 connections up and running.
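
One constraint worth checking for the ENR idea is size: EIP-778 caps an RLP-encoded node record at 300 bytes. The sketch below is only a rough budget check under stated assumptions: the `webrtc` key is hypothetical, the signature and public key are zero-filled dummies, and the 400-byte blob is an arbitrary stand-in rather than a measured size of any real signaling payload.

```python
MAX_ENR_SIZE = 300  # EIP-778: maximum encoded size of a node record


def _length_prefix(length, offset):
    """RLP length header for a string (offset 0x80) or list (offset 0xC0)."""
    if length <= 55:
        return bytes([offset + length])
    length_bytes = length.to_bytes((length.bit_length() + 7) // 8, "big")
    return bytes([offset + 55 + len(length_bytes)]) + length_bytes


def rlp_encode(item):
    """Minimal RLP encoder for bytes and (nested) lists of bytes."""
    if isinstance(item, bytes):
        if len(item) == 1 and item[0] < 0x80:
            return item  # a single low byte encodes as itself
        return _length_prefix(len(item), 0x80) + item
    payload = b"".join(rlp_encode(x) for x in item)
    return _length_prefix(len(payload), 0xC0) + payload


def enr_size(pairs):
    """Encoded size of [signature, seq, k1, v1, ...] with a dummy signature."""
    content = [bytes(64), b"\x01"]  # 64-byte placeholder signature, seq = 1
    for key in sorted(pairs):  # ENR keys are kept in sorted order
        content += [key.encode(), pairs[key]]
    return len(rlp_encode(content))


base = {
    "id": b"v4",
    "secp256k1": bytes(33),  # placeholder compressed public key
    "ip": bytes([192, 0, 2, 7]),
    "udp": (9000).to_bytes(2, "big"),
}
with_signaling = {**base, "webrtc": bytes(400)}  # hypothetical key and size

fits_without = enr_size(base) <= MAX_ENR_SIZE
fits_with = enr_size(with_signaling) <= MAX_ENR_SIZE
```

With these placeholder values the basic record fits and the extended one does not, so any signaling data carried in the ENR itself would have to be compact, or the record would have to point to where the full data can be fetched.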

ComfyGummy commented 11 months ago

"which ideally is running in some form of a browser extension to execute long running background processes that aren't constantly put to sleep. For the context of this issue, we'll assume this is possible"

For reference: long-running background processes are no longer possible in Chrome extensions as of Manifest v3. One of the explicit design goals of Manifest v3 was to prevent extensions from taking up background system resources. Overall, it reduces the privileges that extensions have to something close to what plain web pages get.