cloudflare / pingora

A library for building fast, reliable and evolvable network services.
Apache License 2.0
21.7k stars 1.2k forks source link

Inspecting or modifying bodies requires at least one deep copy. Make changes so that deep copy is not required #408

Open pszabop opened 2 weeks ago

pszabop commented 2 weeks ago

What is the problem your feature solves, or the need it fulfills?

I am attempting to implement something like the NginX HTTP substition filter, that allows modification of the request or response body. Documented here https://nginx.org/en/docs/http/ngx_http_sub_module.html

Sometimes I just want to inspect the body, other times I will want to inspect and transform the body.

There is an example of doing something similar in your examples directory: Transforming JSON to YAML. However it performs an extra deep copy (beyond the transformation) as it accumulates the Bytes into one Vec.

I believe this is due to the fact that HTTP chunk accumulation has to be eventually be assigned to body in the response_body_filter function signature, in order to send the accumulated chunks out on the wire, therefor it is beyond my ability to reduce the deep copies: It requires an API change.

    fn response_body_filter(
        &self,
        _session: &mut Session,
        body: &mut Option<Bytes>,
        end_of_stream: bool,
        ctx: &mut Self::CTX,
    ) -> Result<Option<std::time::Duration>> {
       if let Some(b) = body {
            // in the example it was accumulated into a Vec<u8>, I just accumulate Bytes
            // this is a deep copy
            ctx.chunks.extend_from_slice(&b); 
            // drop the body
            b.clear();
        }

        if end_of_stream {
            // inspect and transform here

            // this forces a deep copy when putting the data into the context
            // might have to do a std::mem::take, I didn't verify for this example
            *body = ctx.accumulated_bytes; 
        }
   }

Describe the solution you'd like

TL;DR - I would like to be able to inspect and sometimes transform request and response bodies with minimal or no deep copies if possible.

One way to do this is to accumulate into Vec<bytes> which involves no copying. Yes, implementing a near zero deep copy inspect / transform would be painful code, but I've seen it done in C before, and I have a Rust prototype in a sandbox that does this.

However, the function signature for response_body_filter would have to change to allow transmission of a Vec instead of just a single <Bytes>. For example:

    fn response_body_filter(
        &self,
        _session: &mut Session,
        body: &mut Option<Vec<Bytes>>,
        end_of_stream: bool,
        ctx: &mut Self::CTX,
    ) -> Result<Option<std::time::Duration>> {
       if let Some(b) = body {
            // not a deep copy
            ctx.chunks.push(b);     
        }

        if end_of_stream {
            // inspect and transform here on a Vec<Bytes> with no copies  - painful but doable

            // not a deep copy
            // might have to do a std::mem::take, I didn't verify for this example
            *body = ctx.chunks; 
        }
   }

I didn't run this by rustc so probably there's something obviously wrong to the trained eye, but that's the general idea.

Describe alternatives you've considered

I may be able to limit my accumulation to the low 100s of kbytes or so, so the extra deep copy currently required may not be too painful. The general case of the nginx sub_module, however, doesn't have this limitation. It's also amazing how fast requirements change.

Additional context

https://github.com/cloudflare/pingora/blob/main/pingora-proxy/examples/modify_response.rs https://nginx.org/en/docs/http/ngx_http_sub_module.html

SpinoPi commented 1 week ago

Yes, this is a highly useful feature! I'm currently developing a customized gateway but encountering a challenge. The upstream servers require different request paths and body formats, and I'm looking for a solution to handle this efficiently.

pszabop commented 17 hours ago

I have looked and have yet to find a "vector of Bytes" (or vector of capable version of the memchr (memmem) crate. I've seen code that implements this in C, but if you choose to implement this enhancement someone is going to have to write such a crate to make the enhancement useful. The state machine involved has to do fun stuff when transitioning between Bytes elements.

Either that, or my crate search skills are terrible.