Proper socket reading and writing

campbellC commented 5 years ago

The initial implementation of an http proxy uses a loop to keep pulling down bytes and writing them to the client, it then returns out. This isn't good enough - in higher versions of http 1 the connection will be reused so the communication needs to be bidirectional. We also need to be using the 'content-length' header to determine how many bytes need to be read.

Questions:

Do we need to split out the implementation based on the specific version of HTTP 1 or can we get away with code that handles 1.1 and higher?
Can we make the code generic enough to be used in the TLS case too?
How should we handle http 2 connections - fail, fallback to transparent?

campbellC commented 5 years ago

I've been hitting my head against this a little using tokio TcpStream and httparse to tell if a request is complete. Mainly I'm just battling the borrow checker. I'll start reading the tokio docs since they discuss something called framed readers and writers that sounds very similar to what we need.

campbellC commented 5 years ago

Reading the good intro to framed readers from the tokio docs - I definitely think this is what we want. Basically you implement encoders and decoders that convert from bytes to 'frames' - in our case thats http requests and responses - and vice versa. Then whenever you read and write you do so in terms of the protocol you actually care about and the encoder/decoder takes care of waiting for the full frame etc.

hyper implements something that looks like this. Can we copy that code? I don't think hyper exposes much of this stuff to client code which means using it as a library might be a no go. I'll have a play around writing a simple encoder/decoder using httparse.

campbellC commented 5 years ago

I've made an initial prototype where I've basically lifted an example from the tokio module called tinyhttp. The comments make it clear this isn't the most robust thing to use but to get off the ground it seems good to me.

Left to do is to make an equivalent for the client side. Basically one side ('server' side) has to pull off requests and write responses, the other side ('client' side) has to pull off responses and write requests.

It's also going to be much easier to test this stuff now that it's properly pulled out so I'll try and add UTs.

campbellC commented 5 years ago

I've added the client side encoder/decoder logic to pull responses and write requests. However, the example code given (and my copy+modify) needs work because it assumes no 'body' to the responses and String bodies to the requests. This needs to be made more generic and obviously we need to make it able to handle responses with bodies.

campbellC commented 5 years ago

I updated it to work with byte arrays. However, I'm pretty sure that httparse doesn't actually check that the whole thing has downloaded, it just checks that the headers aren't partially downloaded. This basically means we can't use for this purpose, so we'll need another one. I think ripping out hyper's parser might be the way to go.

campbellC commented 5 years ago

I wrote code to handle transfer encoding chunked and content-length headers. I'm sure there's edge cases that this doesn't cover. The code right now is pretty gross and it for some reason fails to produce the entire response from google every couple of runs. It's much better code than before though and I think is a good refactor away from being good enough.

campbellC commented 4 years ago

Annoyingly the latest version of tokio-tls has dropped the ability to get access to the native_tls::TlsStream underlying a tokio_tls::TlsStream. From the PR it seems this was to avoid making a type public rather than some underlying issue with exposing the functionality. I've left a comment on the issue about this and hopefully the certificate at least will be exposed.

Until then, Framed requires something that implements AsyncRead and AsyncWrite which native_tls does not so either we drop the Framed usage (which would be a right pain) or we fork tokio_tls in the short term and hopefully that gets fixed upstream later. I think that's the easiest approach for now unfortunately.

campbellC commented 4 years ago

Forking seems to work. I'm going to push out a decision on the TlsStream choice until I've heard back from the tokio devs, or they've published the stable version for async/await (apparently this is expected end of November). Anyway, basics are in place for this so resolving, definitely this area of code is buggy and needs fixing.

campbellC / third-wheel

Proper socket reading and writing #2