brave / nitriding-daemon

Tool kit for building secure, scalable, and networked services on top of AWS Nitro Enclaves.
License: Mozilla Public License 2.0

Improve performance of proxy application and TAP handler #2

NullHypothesis opened this issue 1 year ago

NullHypothesis commented 1 year ago

I've been working on some tooling that can help us measure nitriding's networking performance. So far, I have a minimal Go Web server that implements a simple "hello world" handler (see the sketch after the list below). I tested the Web server in three scenarios:

  1. Docker: In a Docker container (with no nitriding or enclaves involved), which serves as our baseline.
  2. Nitriding-nrp: In an enclave, with the Web service receiving connections directly from clients.
  3. Nitriding: In an enclave, with nitriding acting as a reverse proxy in front of the Web service.
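
For context, the hello-world server is presumably along these lines. This is a minimal sketch; the listening port is an arbitrary placeholder, since the thread doesn't spell out the server's code:

```go
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		// Minimal handler: no real work per request, so measured throughput
		// is dominated by the networking path, not the application.
		fmt.Fprintln(w, "hello world")
	})

	// Plain HTTP on a placeholder port; the original test's address isn't
	// given in this thread.
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```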

All three scenarios use HTTP only, to eliminate the computational overhead of TLS. I then used baton to measure the requests per second that the Web service can sustain. The results are:

[Figure: requests per second measured by baton for the Docker, nitriding-nrp, and nitriding setups]

The numbers aren't great. Let's use this issue to do some debugging, identify bottlenecks, and improve the networking code.
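
For reference, a typical baton invocation looks something like `baton -u http://127.0.0.1:8080/ -c 100 -r 100000`, where `-c` controls the number of concurrent senders and `-r` the total number of requests. The URL, concurrency, and request count here are illustrative assumptions, not the values used for the measurements above.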

NullHypothesis commented 1 year ago

See also https://github.com/brave/star-randsrv/issues/58.

rillian commented 1 year ago

What was the baton command line? Concurrency level made a significant difference to throughput in my tests. We should make sure we're measuring the tunnel and proxy's capacity and not just the latency.

NullHypothesis commented 1 year ago

Yesterday, I measured requests per second (for a simple "hello world" Web server) for an increasing number of baton threads.

All setups can sustain more reqs/sec as the number of sender threads increases, except when we use nitriding's reverse proxy, which sees a reduction in reqs/sec. Sometime this week, I'll take a closer look at Go's reverse proxy implementation to see what easy improvements we can make.

NullHypothesis commented 1 year ago

Elaborating on the above: the "Enclave" setup constitutes the approximate upper limit of what we can achieve with nitriding. This setup involves no nitriding at all: it consists of a Web server that binds directly to the VSOCK interface (sketched below), and a custom baton that sends its requests over that same interface.
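
A sketch of that VSOCK-bound server, assuming the github.com/mdlayher/vsock package (the thread doesn't say which VSOCK library the test used) and a placeholder port:

```go
package main

import (
	"fmt"
	"log"
	"net/http"

	"github.com/mdlayher/vsock"
)

func main() {
	// Listen directly on the enclave's VSOCK interface. Port 8080 is an
	// arbitrary placeholder, not the port from the original test.
	ln, err := vsock.Listen(8080, nil)
	if err != nil {
		log.Fatal(err)
	}

	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "hello world")
	})

	// Serve plain HTTP over the VSOCK listener; no TCP/IP stack involved.
	log.Fatal(http.Serve(ln, http.DefaultServeMux))
}
```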

At this point, there are two significant bottlenecks:

  1. Nitriding's (or rather: Go's) HTTP reverse proxy. In this thread, someone argues that the reverse proxy does poorly when faced with synthetically generated requests.
  2. Nitriding's tap interface (and the user space TCP stack that comes with it) and the gvproxy that runs on the EC2 host. It's not clear which one is the worse offender. We should measure these two components in isolation, and then focus on the slower one.

NullHypothesis commented 1 year ago

I stumbled upon an issue that describes the problem we're seeing: https://github.com/golang/go/issues/6785. Increasing MaxIdleConnsPerHost makes a significant difference. In a preliminary test, I set it to 1000, which makes the reverse proxy perform almost identically to the "no reverse proxy" setup.
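
A sketch of the change, applied to a generic net/http/httputil reverse proxy; the backend URL and listening port are placeholders, and the value 1000 mirrors the preliminary test rather than nitriding's final configuration:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

func main() {
	// Hypothetical backend address; nitriding would point this at the
	// enclave application's Web server.
	backend, err := url.Parse("http://127.0.0.1:8081")
	if err != nil {
		log.Fatal(err)
	}

	proxy := httputil.NewSingleHostReverseProxy(backend)

	// The default http.Transport keeps only 2 idle connections per host
	// (golang/go#6785), so under load the proxy keeps opening fresh
	// backend connections. Raising the limits allows connection reuse.
	transport := http.DefaultTransport.(*http.Transport).Clone()
	transport.MaxIdleConns = 1000
	transport.MaxIdleConnsPerHost = 1000
	proxy.Transport = transport

	log.Fatal(http.ListenAndServe(":8080", proxy))
}
```

Cloning http.DefaultTransport keeps its usual dial and TLS timeouts, so only the idle-connection limits change.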

For posterity, a few other things I've tried:

NullHypothesis commented 1 year ago

For the record, we just merged PR https://github.com/brave/nitriding/pull/61, which improves the status quo.