encode / httpx

A next generation HTTP client for Python. 🦋
https://www.python-httpx.org/
BSD 3-Clause "New" or "Revised" License
12.7k stars 811 forks source link

RemoteProtocolError: multiple Transfer-Encoding headers #1862

Open cancan101 opened 2 years ago

cancan101 commented 2 years ago

Running:

httpx.get('https://adobe.wd5.myworkdayjobs.com/en-US/external_experienced/job/San-Jose/Director--Corporate-Strategy_R109552-2')

produces RemoteProtocolError: multiple Transfer-Encoding headers.

Using requests, works fine:

requests.get('https://adobe.wd5.myworkdayjobs.com/en-US/external_experienced/job/San-Jose/Director--Corporate-Strategy_R109552-2').headers
{'transfer-encoding': 'chunked, chunked', 'Date': 'Tue, 14 Sep 2021 17:10:47 GMT', 'Vary': 'origin,access-control-request-method,access-control-request-headers,accept-encoding', 'Server': 'Workday User Interface Service', 'Set-Cookie': 'wd-browser-id=042c7ebd-81b7-429a-965d-05552e9dd259; Path=/; Secure; HTTPOnly, PLAY_SESSION=4920b9e5c981a4b74626cbea94ca58a36187f668-adobe_pSessionId=b44gglc1l9ehb4jtu4lr4q7blu&instance=wd5prvps0004d; SameSite=Lax; Path=/; Secure; HTTPOnly, wday_vps_cookie=2938515978.61490.0000; path=/; Httponly; Secure; SameSite=none', 'Content-Type': 'text/html;charset=UTF-8', 'X-Frame-Options': 'DENY', 'X-WD-REQUEST-ID': 'VPS|aecdf787-0e93-4a4f-ba57-6e1020d1eb00', 'Content-Encoding': 'gzip', 'Content-Language': 'en-US', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains'}
tomchristie commented 2 years ago

Thanks @cancan101 - looks like a case of h11 being (reasonably enough) stricter about erroring on a non-compliant response. However, really we would like to be able to be robust to this kind of thing.

nuno-andre commented 2 years ago

Multiple Transfer-Encoding headers are explicitly forbidden in specs. But browsers, and many libraries, accept them; so this is a case that we may have to deal with at some point.

To ease this restriction you can monkey-patch h11._headers.normalize_and_validate.

def _patch(headers, _parsed=False):
    ...

import httpx
# monkey-patch AsyncClient
httpx._transports.default.httpcore._async.http11.h11._headers.normalize_and_validate = _patch
# monkey-patch Client? (not tested)
httpx._transports.default.httpcore._sync.http11.h11._headers.normalize_and_validate = _patch
Here's mine. It only accepts multiple {'transfer-encoding': 'chunked'} headers. ```python import re from h11._abnf import field_name, field_value from h11._util import bytesify, LocalProtocolError, validate from h11._headers import Headers _content_length_re = re.compile(br"[0-9]+") _field_name_re = re.compile(field_name.encode("ascii")) _field_value_re = re.compile(field_value.encode("ascii")) def _patch(headers, _parsed: bool=False): new_headers = [] seen_content_length = None saw_transfer_encoding = False for name, value in headers: # For headers coming out of the parser, we can safely skip some steps, # because it always returns bytes and has already run these regexes # over the data: if not _parsed: name = bytesify(name) value = bytesify(value) validate(_field_name_re, name, "Illegal header name {!r}", name) validate(_field_value_re, value, "Illegal header value {!r}", value) assert isinstance(name, bytes) assert isinstance(value, bytes) raw_name = name name = name.lower() if name == b"content-length": lengths = {length.strip() for length in value.split(b",")} if len(lengths) != 1: raise LocalProtocolError("conflicting Content-Length headers") value = lengths.pop() validate(_content_length_re, value, "bad Content-Length") if seen_content_length is None: seen_content_length = value new_headers.append((raw_name, name, value)) elif seen_content_length != value: raise LocalProtocolError("conflicting Content-Length headers") elif name == b"transfer-encoding": # "A server that receives a request message with a transfer coding # it does not understand SHOULD respond with 501 (Not # Implemented)." # https://tools.ietf.org/html/rfc7230#section-3.3.1 if saw_transfer_encoding: if saw_transfer_encoding == value: continue raise LocalProtocolError( "multiple Transfer-Encoding headers", error_status_hint=501 ) # "All transfer-coding names are case-insensitive" # -- https://tools.ietf.org/html/rfc7230#section-4 value = value.lower() if value != b"chunked": raise LocalProtocolError( "Only Transfer-Encoding: chunked is supported", error_status_hint=501, ) saw_transfer_encoding = value new_headers.append((raw_name, name, value)) else: new_headers.append((raw_name, name, value)) return Headers(new_headers) ```
stale[bot] commented 2 years ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

tomchristie commented 2 years ago

Boop

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

logan-vitelity commented 1 year ago

any update on a fix?

nuno-andre commented 1 year ago

@logan-vitelity This is actually an h11 issue. It seems that they are open to loose the restriction, but for the time being, there are no changes.

At the moment the only solution is the monkeypatching.

Fnck commented 1 year ago

Just a suggestion for someone who face the same problem:

  1. install h11
  2. manually change the code of h11, there are 2 snippet to change:
    ## _headers.py line 89
            if saw_transfer_encoding:
                pass
                # raise LocalProtocolError("multiple Transfer-Encoding headers",
                #                         error_status_hint=501)
    ## _connection.py line 88
    if transfer_encodings:
        assert transfer_encodings == [b"chunked"] or b"chunked" in transfer_encodings
        return ("chunked", ())
danstoner commented 1 month ago

Multiple Transfer-Encoding headers are explicitly forbidden in specs. But browsers, and many libraries, accept them; so this is a case that we may have to deal with at some point.

RFC 7230 is superceded by RFC 9112.

I read RFC 9112 looking for the "explicitly forbidden" spec... I cannot find it. But I know that RFCs can often be vague to the point of having multiple mutually exclusive yet reasonable interpretations.

Can someone clarify where the "explicitly forbidden" language is located in the RFC?

tomchristie commented 1 month ago

@danstoner Okay, I've taken another look over the h11 code...

Given that h11 will gracefully ignore multiple Content-Length headers (so long as they have the same value)...

https://github.com/python-hyper/h11/blob/cc87dfcc5a4693eb49b0453e6677ca004ffb035b/h11/_headers.py#L176-L178

It would be a reasonable change for it to also gracefully ignore multiple Transfer-Encoding headers (so long as they have the value "chunked")

Would you consider a pull request on h11 for that change?

I'll also link to https://github.com/python-hyper/h11/issues/175 so that if anyone is minded to start on a release we can hopefully get this resolved.