Introduce QUIC as a p2p control plane

thomaseizinger commented 3 months ago

Currently, the client and gateway run two protocols on the UDP socket:

ICE
Wireguard

We already de-multiplex those on the same socket within snownet.

For all other communicate like DNS queries, we currently relay through the portal. I don't think that is particularly good. For example, we have no means of retrying a failed DNS query. See #4266.

For this and other reasons, I think we should introduce a dedicated control plane next to our data plane, i.e. multiplex another protocol next to ICE and wireguard. My suggestion for this would be to use QUIC (and use HTTP3 as an application protocol but we could also directly run DNS over QUIC for example).

QUIC has a SANS-IO implementation in the form of quinn-proto, allowing an easy integration into snownet. That will allow us to offer high-level APIs on a snownet::Node that directly communicates with the other node whilst hiding the details on how exactly that happens.
QUIC packets can also be very easily identified by their packet header, making de-multiplexing very easy and efficient.
QUIC also gives us multiplexing of several requests (meaning sending multiple requests at the same time), it is encrypted and we don't need to worry about retransmissions.

There are of course alternatives but IMO none of them are very attractive:

Send more stuff through the wireguard tunnel: Not reliable, so we need to do our own retransmission.
Use the existing control plane: Likely slower because 2 RTTs and doesn't scale well to many clients.
Multiplex something home-grown next to ICE and wireguard: Just re-inventing the wheel.

Once we have a dedicated control plane, we can:

Run DNS queries directly to the gateway
Send "goodbye" messages, allowing clients to connect to another gateway before that one shuts down
... your favourite usecase here that requires another communication channel ...

jamilbk commented 3 months ago

I think I agree with having a "p2p control plane" alongside the existing one with the portal. Then the portal is solely involved in authorizing, configuring, and setting up connections, and this new "p2p control plane" would be used for

However, I want to be really careful about adding an entire protocol stack to do this. Which protocols we use has non-engineering effects -- we have to document its architecture for things like compliance certifications (and our customers), and document which crypto it uses (if non-standard) to Apple and Google for example. If QUIC is blocked in Enterprise environments, for example, we may not want to find that out after we've invested the time to add it, because it will break network connectivity and not just be a blocked website.

I would prefer first trying to devise a scheme to do this within our own WireGuard tunnel. We have the peers already connected, we would just need to agree on a simple metadata API for peer to peer communication.

We already segment our CGNAT space into Resource and Node ranges, so a client should be able to send packets directly to its connected Gateway Peer IPs and they wouldn't conflict with Resources.

In fact, for DNS, we may even be able to get away with the following minimal solution:

when a client accesses a DNS resource, the portal finds us the gateway
WireGuard tunnel is established just like for a regular resource, client receives gateway's peer IP
However in this case the Resource IP is the gateway's peer IP
Client forwards to the gateway's peer IP, gateway then forwards the real query

thomaseizinger commented 3 months ago

I would prefer first trying to devise a scheme to do this within our own WireGuard tunnel. We have the peers already connected, we would just need to agree on a simple metadata API for peer to peer communication.

We can do this but it means we are limited to protocols that do their own retransmission / are designed to work over UDP. That is very unusual for a "control plane".

If QUIC is blocked in Enterprise environments, for example, we may not want to find that out after we've invested the time to add it, because it will break network connectivity and not just be a blocked website.

Personally, I don't want to support such behaviour so I wouldn't mind if we say, sorry, you can't use this product 🤷‍♂️

Let them use their broken H2 stacks if they want to. The Internet of tomorrow is built on QUIC :)

jamilbk commented 3 months ago

we are limited to protocols that do their own retransmission

Open a tcp socket and receive JSON

Personally, I don't want to support such behaviour so I wouldn't mind if we say, sorry, you can't use this product 🤷‍♂️

Ah, that would be nice.

Let them use their broken H2 stacks

Enterprises have legitimate reasons to inspect HTTP traffic within their organization, and it appears that blocking QUIC allows them to use their existing WAFs and DLP products to do this for HTTP/2 (using SSL intercept I assume). In many cases they're legally obligated to.

But it's not just Enterprises. It's hotel WiFis, Airports, etc. It would be good to have an idea of how often QUIC is blocked. But network admins do so because they know connections will fallback to HTTP/2.

If it turns out a non-trivial portion of our customers block QUIC or have employees on networks that block QUIC, we can't use it for any mission-critical transport. I think iCloud Private Relay uses QUIC under the hood, maybe we can see how that's performing. But that's a personal service, not sure if businesses are relying on it.

I would love to go all-in on QUIC, but we need more real-world data about how it behaves globally on networks that don't honor the same net neutrality laws we get to enjoy.

WireGuard isn't perfect either, but we do have a couple years of experience running it in production as a L3 network transport. The end-user impact of blocking WireGuard would be much higher than blocking QUIC (which usually enjoys graceful degradation in almost all cases).

thomaseizinger commented 3 months ago

we are limited to protocols that do their own retransmission

Open a tcp socket and receive JSON

That would mean doing holepunching for TCP which requires doing a simultaneous open which is a lot more flaky then UDP hole punching.

thomaseizinger commented 3 months ago

I would love to go all-in on QUIC, but we need more real-world data about how it behaves globally on networks that don't honor the same net neutrality laws we get to enjoy.

So should we start gathering data about it? I.e. just check if we can open a QUIC connection? We could use it for "nice" good-bye messages to allow clients to quickly detect when a gateway goes down!

jamilbk commented 3 months ago

That would mean doing holepunching for TCP which requires doing a simultaneous open which is a lot more flaky then UDP hole punching.

I meant inside the tunnel -- listen on the Gateway's Peer IP, after the client connects it will be able to send JSON over the WireGuard tunnel to a socket bound on the Gateway to that IP.

The gateway could then only respond to other peers on that TCP socket.

thomaseizinger commented 3 months ago

That would mean doing holepunching for TCP which requires doing a simultaneous open which is a lot more flaky then UDP hole punching.

I meant inside the tunnel -- listen on the Gateway's Peer IP, after the client connects it will be able to send JSON over the WireGuard tunnel to a socket bound on the Gateway to that IP.

The gateway could then only respond to other peers on that TCP socket.

Where does the TCP stack come from? We would have to impoort a userspace TCP implementation into connlib to generate the packets. All we send is UDP at the moment. Wireguard itself doesn't care about retransmissions.

All the retransmission for application-level TCP is done by the kernel, we only get the resulting IP packets out of the TUN device.

thomaseizinger commented 3 months ago

If we open a TCP socket listening on the IP of our own TUN device, would the kernel route the traffic back to us?

The could work but feels a bit wonky 😅

jamilbk commented 3 months ago

If we open a TCP socket listening on the IP of our own TUN device, would the kernel route the traffic back to us?

Hmm I don't think so --

The Rust process bound to the TUN ip would open a tcp listener and receive messages there. The inner src address is a client's TUN address. The outer is the client's src IP (or NAT device sec IP).

When replying, the gateway sends to the client's TUN ip.

So the connection is TUN - TUN.

There shouldn't be packet loops -- our CGNAT range is effectively one giant LAN and we're already occupying these routes.

thomaseizinger commented 3 months ago

If we open a TCP socket listening on the IP of our own TUN device, would the kernel route the traffic back to us?

Hmm I don't think so --

The Rust process bound to the TUN ip would open a tcp listener and receive messages there. The inner src address is a client's TUN address. The outer is the client's src IP (or NAT device sec IP).

When replying, the gateway sends to the client's TUN ip.

So the connection is TUN - TUN.

There shouldn't be packet loops -- our CGNAT range is effectively one giant LAN and we're already occupying these routes.

I was hoping for the answer to be Yes and I think you are saying it does. With "routing back to us", I meant that we receive those IP packets generated by the kernel's TCP implementation as IP packets on the TUN interface, despite our socket being bound in the TUN device.

We can try this. It will not fit well into the current design of the tunnel. Where we currently do IO (and thus would bind a TCP socket) is at the very upper layer of the tunnel where we don't have any connection state. But we will need one TCP socket per connection, thus introducing new mappings / state that we need to communicate with events (happy to walk you through the code if interested).

I also don't want to invent yet another framing protocol for multiplexing over TCP, i.e. https://github.com/libp2p/rust-yamux.

If we bind a separate socket already, we may as well just bind a UDP socket and run QUIC over it which does multiplexing for us. The double encryption shouldn't hurt because it is not the data path anyway. QUIC can multiplex over a single socket so at least, we don't need to jiggle multiple sockets around and quinn-udp's SANS-IO design may be possible to integrate such we can just emit two events:

TransmitNetwork: Sends the UDP packet on the actual network
TransmitTunnel: Sends the UDP packet on a socket bound to the tunnel

We may not even need a UDP socket for that? We could just make those IP frames ourselves. For UDP, that is trivial. That might make it possible to still integrate it into snownet, which would be much nicer from an API perspective. All we need is the src and target IP for a particular connection so we can grab those IP packets before forwarding them to the application.

thomaseizinger commented 3 months ago

We could even try to run QUIC normally and if it fails, fall back to run jt through the tunnel.

Report a diagnostic event to the portal and we get numbers for how often it is blocked :)

jamilbk commented 3 months ago

Hmm, I see -- yeah, I do see the appeal to slot in QUIC from a technical perspective. Let's keep this open to tackle at a later iteration. First we probably need much more CI coverage (esp on the clients) to be confident in making protocol changes.

thomaseizinger commented 3 months ago

Let's keep this open to tackle at a later iteration.

For sure! It was interesting to hash out anyway :)

First we probably need much more CI coverage (esp on the clients) to be confident in making protocol changes.

Making it part of snownet by hand-crafting IP packets would mean we can actually cover this through unit tests that don't do any IO super easily. We could use IPv4 or IPv6 link-local addresses for these packets as those are guaranteed to never be routed so they should never appear as actual user traffic. Because a tunnel is always directly between a client and a gateway, we'd also only need two of them! I don't think we can use the actual interface IPs for this because it could be traffic for an app running directly on the gateway.

jamilbk commented 2 months ago

Another acecdote https://news.ycombinator.com/item?id=39972836

I suspect it's not uncommon

thomaseizinger commented 2 months ago

First we probably need much more CI coverage (esp on the clients) to be confident in making protocol changes.

Making it part of snownet by hand-crafting IP packets would mean we can actually cover this through unit tests that don't do any IO super easily. We could use IPv4 or IPv6 link-local addresses for these packets as those are guaranteed to never be routed so they should never appear as actual user traffic. Because a tunnel is always directly between a client and a gateway, we'd also only need two of them! I don't think we can use the actual interface IPs for this because it could be traffic for an app running directly on the gateway.

If we do this, QUIC can't be detected because it is encrypted by WG. So we get all the benefits of having a proper multiplexing protocol and re-transmissions etc dealt with for us without people detecting and blocking it. It means we need to hand-craft the UDP packets so it can't be used for anything performance critical but I think that is okay? A p2p control plane doesn't need to offer great performance.

Having it all part of snownet means we can easily experiment in the future with running it along-side wireguard if we want or maybe even replace it.

jamilbk commented 1 week ago

Spoke to another user recently whose firewalls block QUIC by default. The reason is because they have a secure web gateway feature that doesn't support QUIC (Fortinet), so it blocks it, and expects browsers to fallback to HTTP/2 instead where they can do SSL interception.

firezone / firezone

Introduce QUIC as a p2p control plane #4267