brave / nitriding

Tool kit for building networked services on top of AWS Nitro Enclaves.
https://arxiv.org/abs/2206.04123
Mozilla Public License 2.0

Consider providing tun device for packet forwarding #26

Closed: NullHypothesis closed this issue 1 year ago

NullHypothesis commented 2 years ago

The only way that we currently provide for an enclave to talk to the outside world is a SOCKS proxy. If an enclave application doesn't already support SOCKS, it can be a pain to add. Instead, we could expose a tun device that automatically forwards all IP packets to the EC2 host. That's more flexible but also more complex and error-prone.

Let's investigate how much work that would be, and play with a PoC. Having a tun device would allow us to run more complex services like a Tor relay inside an enclave. Tor users could then do remote attestation and convince themselves that we are running an unmodified version of the Tor protocol. This matters because Tor relays can actively tag network flows for end-to-end correlation attacks.
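To sketch what such forwarding could look like, here's a minimal Go example that shuttles raw IP packets between an in-enclave TUN device and the host. It assumes the songgao/water package for the TUN device and mdlayher/vsock for the connection to a host-side proxy; the CID, port, and framing details are placeholders rather than a working design.

```go
// Minimal sketch: shuttle raw IP packets between an in-enclave TUN device
// and a vsock connection to a proxy on the EC2 host.
package main

import (
	"io"
	"log"

	"github.com/mdlayher/vsock"
	"github.com/songgao/water"
)

const (
	hostCID   = 3    // vsock context ID of the parent EC2 instance
	proxyPort = 1024 // placeholder port of the host-side proxy
)

func main() {
	// Create the TUN device that the enclave application will use.
	tun, err := water.New(water.Config{DeviceType: water.TUN})
	if err != nil {
		log.Fatalf("creating TUN device: %v", err)
	}
	log.Printf("created TUN device %s", tun.Name())

	// Connect to the proxy running on the EC2 host.
	conn, err := vsock.Dial(hostCID, proxyPort, nil)
	if err != nil {
		log.Fatalf("dialing host proxy: %v", err)
	}

	// Copy bytes in both directions until one side fails.  A real
	// implementation needs per-packet framing because vsock is a stream.
	errCh := make(chan error, 2)
	go func() { _, err := io.Copy(conn, tun); errCh <- err }()
	go func() { _, err := io.Copy(tun, conn); errCh <- err }()
	log.Printf("forwarding stopped: %v", <-errCh)
}
```

Note that vsock is a byte stream, so a real implementation would need to frame individual packets (e.g., with a length prefix); io.Copy is only there to show the shape of the forwarding loop.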

NullHypothesis commented 2 years ago

After giving this some more thought, I believe that this is what we need:

Inside the enclave:

On the host:

rillian commented 2 years ago

So the motivation here is to simplify the enclave application by allowing it to use the TUN network device directly like on a normal host?

What does the iptables config control? Wouldn't the application running inside the enclave use the TUN device for all traffic by default if it's the only interface available? Something about making auditable firewall rules regardless of how the host proxies the traffic?

It would be nice to dedicate an additional IP address directly to the enclave tunnel device for applications that don't want to deal with NAT traversal. The host could either route the traffic and use local DHCP/RA to pass the configured address into the enclave, or we might be able to use TAP to bridge to a dedicated (virtual) network device on the host and let the enclave receive its address directly from AWS.

NullHypothesis commented 2 years ago

> So the motivation here is to simplify the enclave application by allowing it to use the TUN network device directly like on a normal host?

Right. Simplicity is a welcome side effect, but the main motivation is flexibility. For now, enclave applications are constrained by our cumbersome SOCKS interface, which makes it difficult to support real-time applications like, say, Tor relays or DNS proxies (or resolvers) that cannot easily be patched to support SOCKS. On the research side, I intend to build a proof of concept of an enclave-enabled Tor relay, which would allow Tor clients to verify that their relay is behaving according to the protocol.

Also, if we provide a TUN interface, it will be easier to develop enclave applications in languages other than Go. We could decouple nitriding from the enclave application and have two processes running inside the Docker container: nitriding (which provides the attestation endpoint and takes care of the TUN forwarding) and the enclave application (which is self-contained and doesn't even need to know about nitriding).

Does that make sense?

> What does the iptables config control? Wouldn't the application running inside the enclave use the TUN device for all traffic by default if it's the only interface available?

I'm not sure but that may be the case, yes. And yes, we can also use iptables to discard undesired traffic while giving users the ability to verify those rules via remote attestation.
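To make that a bit more concrete, here's a minimal sketch of how nitriding could install such rules from inside the enclave. It assumes the coreos/go-iptables package, and the specific policy (allow outbound HTTPS and return traffic, drop everything else) is only an example.

```go
// Minimal sketch: install a small, auditable egress policy inside the
// enclave.  Because the rules live in the enclave image, users could verify
// them via remote attestation.
package main

import (
	"log"

	"github.com/coreos/go-iptables/iptables"
)

func main() {
	ipt, err := iptables.New()
	if err != nil {
		log.Fatalf("initializing iptables: %v", err)
	}

	rules := [][]string{
		// Allow outbound HTTPS, e.g., for talking to clients or upstream APIs.
		{"-p", "tcp", "--dport", "443", "-j", "ACCEPT"},
		// Allow return traffic for established connections.
		{"-m", "state", "--state", "ESTABLISHED,RELATED", "-j", "ACCEPT"},
		// Drop everything else.
		{"-j", "DROP"},
	}
	for _, r := range rules {
		if err := ipt.Append("filter", "OUTPUT", r...); err != nil {
			log.Fatalf("appending rule %v: %v", r, err)
		}
	}
	log.Println("egress policy installed")
}
```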

> It would be nice to dedicate an additional IP address directly to the enclave tunnel device for applications that don't want to deal with NAT traversal. The host could either route the traffic and use local DHCP/RA to pass the configured address into the enclave, or we might be able to use TAP to bridge to a dedicated (virtual) network device on the host and let the enclave receive its address directly from AWS.

Sounds like a useful improvement, yes.

NullHypothesis commented 2 years ago

Here's a summary of what my experiments have taught me thus far. The package gvisor-tap-vsock solves the problem raised in this issue: it creates a TAP device inside the enclave and forwards traffic between the TAP device and a proxy application running on the EC2 host. Here's a more detailed explanation. gvisor-tap-vsock is fairly easy to integrate into nitriding and, if we end up using it, it would allow us to re-architect the STAR randomness server as follows:

1. Re-implement the star-randsrv wrapper in Rust.
2. Expose an enclave-internal API in nitriding that allows the randomness server to register and/or synchronize its key material.
3. Create a Dockerfile that starts two processes:
   1. The nitriding standalone process, which takes care of networking and exposes HTTP handlers for remote attestation, key synchronization, etc. (see the sketch after this list).
   2. The pure-Rust randomness server, which exposes its own HTTP handlers for clients to talk to.
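As a rough illustration of what the nitriding standalone process from step 3.1 could look like, here's a minimal sketch using only Go's standard library. The ports, paths, and payload format are assumptions rather than a final API.

```go
// Minimal sketch: the standalone nitriding process exposing an external
// attestation handler and an enclave-internal key-synchronization handler.
// Ports, paths, and payload format are illustrative; TLS setup is omitted.
package main

import (
	"io"
	"log"
	"net/http"
	"sync"
)

var (
	mu      sync.Mutex
	keyData []byte // key material registered by the enclave application
)

// attestationHandler would ask the Nitro Secure Module for an attestation
// document and return it to the client; here it only returns a placeholder.
func attestationHandler(w http.ResponseWriter, r *http.Request) {
	io.WriteString(w, "attestation document placeholder\n")
}

// keyHandler lets the enclave application register its key material.
func keyHandler(w http.ResponseWriter, r *http.Request) {
	body, err := io.ReadAll(r.Body)
	if err != nil {
		http.Error(w, "failed to read request body", http.StatusBadRequest)
		return
	}
	mu.Lock()
	keyData = body
	mu.Unlock()
	w.WriteHeader(http.StatusOK)
}

func main() {
	// Internal API: only reachable from within the enclave.
	internal := http.NewServeMux()
	internal.HandleFunc("/enclave/key", keyHandler)
	go func() {
		log.Fatal(http.ListenAndServe("127.0.0.1:8080", internal))
	}()

	// External API: reachable by clients via the host's traffic forwarding.
	external := http.NewServeMux()
	external.HandleFunc("/enclave/attestation", attestationHandler)
	log.Fatal(http.ListenAndServe(":8443", external))
}
```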

I can think of two downsides:

1. gvisor-tap-vsock introduces additional complexity. It's not prohibitive (the proxy running on the host is significantly more complex) but it's not trivial either.
2. The performance implications are unclear. I don't expect this approach to be significantly slower than our current HTTP proxying but I have yet to do throughput and latency tests.

In my opinion, the trade-off is worth it and I intend to move forward with a PoC PR. Another question worth considering is how we should proceed with nitriding's current single-process architecture. Maintaining two programming models in parallel is too time-consuming, which is why I prefer to move forward with an API-breaking version 2.0.0. This would abandon the package-based programming model and turn nitriding into a tool kit. (cc @rillian)

NullHypothesis commented 1 year ago

(Also copying @mwittie and @dlm: Let me know if you have any thoughts on the above!)

NullHypothesis commented 1 year ago

Below is a summary of the changes that I intend to make. The PR contains an architecture diagram that illustrates how nitriding is going to work.

What all of this means for nitriding users:

@rillian: Does the above sound sane to you? If so, I'm going to make my draft PR ready for review.

rillian commented 1 year ago

Your architecture diagram shows the nitriding and application servers responding separately to their respective requests, but I'm confused about how those are routed. Are they on different ports? Does the nitriding server proxy for the main application? Where is TLS terminated? Do they share the certificate?

NullHypothesis commented 1 year ago

Below is a list of enclave-internal IPC endpoints that we need.

@rillian: When we discussed this, you weren't a fan of using HTTP for IPC. I don't consider this a problem (I'd expect most enclave applications to be implemented in Go or Rust, which provide built-in HTTP clients) but I'm open to alternatives. For example, we could do what Tor does and provide a custom, text-based protocol on top of TCP, or rely on POSIX signals and/or files on the file system.
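To illustrate, here's a minimal sketch of what this HTTP-over-localhost IPC could look like from the enclave application's side, using only Go's standard library; the nitriding address and the /enclave/key path are assumptions.

```go
// Minimal sketch: the enclave application registering its key material with
// nitriding over a local HTTP IPC endpoint.  Address and path are assumed.
package main

import (
	"bytes"
	"fmt"
	"log"
	"net/http"
)

func registerKey(pubKey []byte) error {
	resp, err := http.Post(
		"http://127.0.0.1:8080/enclave/key", // nitriding's internal API (assumed)
		"application/octet-stream",
		bytes.NewReader(pubKey),
	)
	if err != nil {
		return fmt.Errorf("posting key to nitriding: %w", err)
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("nitriding returned %s", resp.Status)
	}
	return nil
}

func main() {
	// In a real application this would be the server's actual public key.
	if err := registerKey([]byte("example public key")); err != nil {
		log.Fatal(err)
	}
	log.Println("key material registered with nitriding")
}
```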

rillian commented 1 year ago

An HTTP API for internal RPC is fine. Custom text protocols are notoriously difficult to get right. I like to complain about the cost, but for this design it's a reasonable choice.

mwittie commented 1 year ago

@NullHypothesis: it seems that POST /enclave/key does cover our use case. How will users be able to get an attestation over the hash of the registered public key? Will there still be the external GET /attestation endpoint, which will now include the hash of the key in addition to the fingerprint of the TLS certificate?

rillian commented 1 year ago

@mwittie Yes, the idea is that queries to the public GET /enclave/attestation endpoint will return a document signed by AWS containing the key hash submitted to the private POST /enclave/hash endpoint.
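To make that flow concrete, here's a minimal sketch of the client-side check, assuming the attestation document carries a SHA-256 digest of the registered key; parsing and verifying the AWS-signed document itself is out of scope here.

```go
// Minimal sketch: compare an expected public key against the key hash that
// the enclave claims in its attestation document.  Parsing and signature
// verification of the AWS-signed document are not shown.
package main

import (
	"bytes"
	"crypto/sha256"
	"fmt"
)

// verifyKeyHash returns nil if the SHA-256 digest of pubKey matches the hash
// extracted from the (already verified) attestation document.
func verifyKeyHash(pubKey, attestedHash []byte) error {
	digest := sha256.Sum256(pubKey)
	if !bytes.Equal(digest[:], attestedHash) {
		return fmt.Errorf("key hash mismatch: got %x, want %x", digest, attestedHash)
	}
	return nil
}

func main() {
	pubKey := []byte("example public key")
	digest := sha256.Sum256(pubKey) // stands in for the hash from the document
	if err := verifyKeyHash(pubKey, digest[:]); err != nil {
		fmt.Println("verification failed:", err)
		return
	}
	fmt.Println("key hash matches attestation document")
}
```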