encode / httpx

A next generation HTTP client for Python. 🦋
https://www.python-httpx.org/
BSD 3-Clause "New" or "Revised" License

Mention header name in error message for invalid header encodings #3400

Open RobertCraigie opened 1 week ago

RobertCraigie commented 1 week ago

Discussed in https://github.com/encode/httpx/discussions/3399

Originally posted by **RobertCraigie** November 12, 2024

This [openai-python user](https://github.com/openai/openai-python/issues/1793) ran into a confusing error when passing a non-ASCII header value. Would it be possible to mention the header name in the error message?

Minimal repro

```py
import httpx

httpx.Headers({"auth": "здравейздравейздравейздравей"})
```

```
Traceback (most recent call last):
  File "script.py", line 3, in <module>
    httpx.Headers({"auth": "здравейздравейздравейздравей"})
  File ".venv/lib/python3.9/site-packages/httpx/_models.py", line 74, in __init__
    self._list = [
  File ".venv/lib/python3.9/site-packages/httpx/_models.py", line 78, in <listcomp>
    normalize_header_value(v, encoding),
  File ".venv/lib/python3.9/site-packages/httpx/_utils.py", line 53, in normalize_header_value
    return value.encode(encoding or "ascii")
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-27: ordinal not in range(128)
```

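
To make the request concrete, here is a minimal sketch of what "mention the header name" could look like. It is an assumption for illustration only, not httpx's actual code: `encode_header_value` is a hypothetical stand-in that takes the header name alongside the value and re-raises the same `UnicodeEncodeError` with the name appended to the reason.

```python
from typing import Optional, Union


def encode_header_value(
    name: str, value: Union[str, bytes], encoding: Optional[str] = None
) -> bytes:
    """Hypothetical helper (not part of httpx): encode a header value and
    name the offending header if encoding fails."""
    if isinstance(value, bytes):
        # Bytes are passed through unchanged.
        return value
    try:
        return value.encode(encoding or "ascii")
    except UnicodeEncodeError as exc:
        # Re-raise the same exception type, keeping the codec and position
        # details, with the header name appended to the reason.
        raise UnicodeEncodeError(
            exc.encoding,
            exc.object,
            exc.start,
            exc.end,
            f"{exc.reason} (in value of header {name!r})",
        ) from exc
```

Keeping the exception type unchanged means existing `except UnicodeEncodeError` handlers would still work; only the message gains the header name.
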
tomchristie commented 1 week ago

Would you be able to review what range of characters the h11 package uses for valid HTTP headers? (Because it's correct, and because it's what we use for the default underlying transport so we may as well be consistent at the higher level abstraction.)

jasonkaedingrhino commented 17 hours ago

> Would you be able to review what range of characters the h11 package uses for valid HTTP headers? (Because it's correct, and because it's what we use for the default underlying transport so we may as well be consistent at the higher level abstraction.)

From https://github.com/python-hyper/h11/blob/master/h11/_headers.py

```py
# Facts
# -----
#
# Headers are:
#   keys: case-insensitive ascii
#   values: mixture of ascii and raw bytes
#
# "Historically, HTTP has allowed field content with text in the ISO-8859-1
# charset [ISO-8859-1], supporting other charsets only through use of
# [RFC2047] encoding.  In practice, most HTTP header field values use only a
# subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD
# limit their field values to US-ASCII octets.  A recipient SHOULD treat other
# octets in field content (obs-text) as opaque data."
# And it deprecates all non-ascii values
```

So it's essentially quoting the HTTP/1.1 spec directly, which means the choice of ASCII as the default encoding makes sense.
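
As a rough illustration of the distinction drawn in that comment, the sketch below sorts a string header value into three buckets: plain ASCII, text that only fits the historical ISO-8859-1 allowance, and text that fits neither. `classify_header_value` is a hypothetical name used here for illustration; it is not part of h11 or httpx.

```python
def classify_header_value(value: str) -> str:
    """Hypothetical classifier based on the h11 comment quoted above."""
    if value.isascii():
        # What newly defined header fields SHOULD limit themselves to.
        return "ascii"
    try:
        value.encode("latin-1")
    except UnicodeEncodeError:
        # Not representable even under the historical ISO-8859-1 allowance.
        return "unencodable"
    # Encodable as ISO-8859-1, but outside the US-ASCII subset.
    return "latin-1"
```

Under this sketch, the Cyrillic value from the repro lands in the third bucket, so it would fail under either encoding.
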

jasonkaedingrhino commented 17 hours ago

In the main, these sorts of situations are going to happen when using authentication headers, which are often obtained via some sort of "secret management" process that includes encryption/decryption and/or base64 encoding/decoding along the way before such values get injected into actual code. This leaves the door open for upstream human errors to propagate down into this level while not being "obvious" due to the opaque nature of it all.

While the example above, which uses the Cyrillic alphabet, is very contrived, the real source of the error was more like a bad copy/paste of the correct value.
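
For values like these, a caller-side check could surface the problem before the request is built, without echoing the secret itself. The following is a sketch under assumptions: `describe_non_ascii` is a hypothetical helper, and the stray non-breaking space in the usage example is an invented stand-in for a copy/paste slip, not the character from the original report.

```python
import unicodedata
from typing import List, Mapping


def describe_non_ascii(headers: Mapping[str, str]) -> List[str]:
    """Hypothetical diagnostic: list non-ASCII characters per header,
    reporting only name, code point and position, never the full value."""
    problems = []
    for name, value in headers.items():
        for index, char in enumerate(value):
            if ord(char) > 127:
                # Fall back to a generic label for unnamed code points.
                char_name = unicodedata.name(char, "unnamed character")
                problems.append(
                    f"header {name!r}: U+{ord(char):04X} ({char_name}) "
                    f"at position {index}"
                )
    return problems


# Usage: an invented token with a trailing non-breaking space.
for line in describe_non_ascii({"authorization": "Bearer abc\u00a0"}):
    print(line)
```

Reporting the code point name rather than the value makes invisible characters easy to spot while keeping the credential out of logs.
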