cloudflare / pingora

A library for building fast, reliable and evolvable network services.
Apache License 2.0
20.25k stars 1.1k forks source link

pinogra_core::protocols::http get back the underlying stream #230

Open telbizov opened 1 month ago

telbizov commented 1 month ago

What is the problem your feature solves, or the need it fulfills?

Let me start with this: I am very new to pingora and to Rust in general, so forgive me if I am asking obvious questions and making silly mistakes.

Given an existing HTTP session (server/client, h1/h2), I am looking for a universal way to extract the underlying Stream so that I can perform other operations with it. The way I see it, we would consume self and return the underlying stream.

The closest thing that I can see today is the reuse() method which works only for h1 and only if Keepalive is enabled https://github.com/cloudflare/pingora/blob/main/pingora-core/src/protocols/http/v1/server.rs#L806

However, for h2 and non-keepalive h1 there is no way to get ownership back of the underlying stream and do other things with it (in the same app).

Sample use case is implementing a MITM forward proxy server. We start by accepting an HTTP/1.1 CONNECT request over plain TCP, return a 200 and then start a TLS handshake over the same TCP socket. Currently the http::server::Session::new_http1(stream) takes ownership of the passed stream. In order to continue with a subsequent protocols::ssl::server::handshake() we need ownership of the stream back. Currently the only way to do this is to call the .finish() method, which calls reuse() of the prior http server, but that returns the Stream only if it's h1 and keepalive. What if we want to support H2 for example to the client ?

Describe the solution you'd like

Would it be reasonable, and would you accept a PR, that implements a new method that looks something like this:

pub fn into_stream(self) -> Stream {
    self.underlying_stream
}

Alternatively, perhaps we could have a similar method like protocols::http::server::Session::new_http1() that takes a reference to the the underlying stream rather than ownership. Would that work?

Describe alternatives you've considered

A workaround that I put together is to duplicate/clone the original stream by going all the way down to the raw socket fd, try_clone()ing it and back up to the Stream type but it feels unnecessary and inefficient to go through this so that we have descriptors/streams down to the same kernel socket, only to solve the ownership problem.