apache / arrow-rs

Official Rust implementation of Apache Arrow
https://arrow.apache.org/
Apache License 2.0
2.6k stars 794 forks source link

S3 + HTTP/2: "duplicate" header #6352

Open crepererum opened 2 months ago

crepererum commented 2 months ago

Describe the bug The bug occurs when S3 signatures are used in combination with HTTP/2. This (IIRC) is currently not possible with AWS S3, but some other vendors or custom implementation may run into this.

The combination produces the special :authority: HTTP/2 pseudo-header as well as the Host header. Some server or middleware implementation don't like that, e.g. Nginx will return a "bad request" and emit the log (here the host was localhost:9999:

client sent duplicate host header: "host: localhost:9999", previous value: "host: localhost:9999" while reading client request headers, client: 127.0.0.1, server: , host: "localhost:9999"

Note that the header is not really duplicate. It's likely that Nginx internally renames :authority to Host and then trips over it. Also see https://trac.nginx.org/nginx/ticket/2268 .

To Reproduce

use object_store::{
    aws::AmazonS3Builder,
    ClientOptions,
};

let store = AmazonS3Builder::new()
    .with_client_options(
        ClientOptions::new()
        .with_http2_only()
     )
    .with_region(...)
    .with_access_key_id(...)
    .with_secret_access_key(...)
    .build()
    .unwrap();

store.get(...).await.unwrap();

Expected behavior

Additional context I think the culprit is this bit here:

https://github.com/apache/arrow-rs/blob/97ae9d778b5f1d1fcc0c7beb91b2b1a6ed741194/object_store/src/aws/credential.rs#L157-L159

Of course one could argue that reqwest or h2 should de-duplicate the headers properly.

Xuanwo commented 2 months ago

Maybe related to https://github.com/seanmonstar/reqwest/issues/1809

micolous commented 3 weeks ago

Of course one could argue that reqwest or h2 should de-duplicate the headers properly.

There was a PR to automatically remove the host header on a library level, but it was rejected because it is legal to send both the Host and :authority headers, and could because of a proxy forwarding a HTTP/1.1 request.

The spec just says they may not differ.

However, it is unnecessary for a client to send both headers.

It looks the request signature needs to include the Host and/or :authority header(s) present in the request.

So this function needs to know ahead of time whether the connection is HTTP/1.1 or HTTP/2, because what it's signing will be different. It can't be taken care of by the HTTP library.

crepererum commented 3 weeks ago

So I would summarize this like this:

object_store operates within the spec, but the behavior is somewhat uncommon. Nginx rejects that behavior, even though it is explicitly allowed by the spec. Nginx also provides some evidence that object_store is somewhat weird, because many people use nginx w/ HTTP/2 and the bug on their side is still open, so this doesn't seem to be a super pressing issue for them.

We COULD make object_store more friendly (move it from uncommon to common behavior within the spec), but ultimately it's nginx that has to fix their stuff.


There was a https://github.com/hyperium/h2/pull/679, but it was rejected because it is legal to send both the Host and :authority headers, and could because of a proxy forwarding a HTTP/1.1 request.

So if a HTTP/2 proxy has a HTTP/1.1 upstream does it not convert the pseudo header (:authority:) into its HTTP/1.1 counterpart (Host)? That seems like an oversight in the spec.

Xuanwo commented 3 weeks ago

Hi, I submitted this issue to reqwest and finally chose to implement a workaround in https://github.com/apache/opendal/pull/2956. Perhaps this will be helpful for object_store to adopt as well.