RobertCraigie opened 1 week ago
Would you be able to review what range of characters the h11 package uses for valid HTTP headers? (Because it's correct, and because it's what we use for the default underlying transport, so we may as well be consistent at the higher-level abstraction.)
From https://github.com/python-hyper/h11/blob/master/h11/_headers.py
```python
# Facts
# -----
#
# Headers are:
# keys: case-insensitive ascii
# values: mixture of ascii and raw bytes
#
# "Historically, HTTP has allowed field content with text in the ISO-8859-1
# charset [ISO-8859-1], supporting other charsets only through use of
# [RFC2047] encoding. In practice, most HTTP header field values use only a
# subset of the US-ASCII charset [USASCII]. Newly defined header fields SHOULD
# limit their field values to US-ASCII octets. A recipient SHOULD treat other
# octets in field content (obs-text) as opaque data."
# And it deprecates all non-ascii values
```
So it's essentially quoting the HTTP/1.1 spec (RFC 7230) directly, and thus the choice of ASCII encoding makes sense.
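To make the character ranges concrete, here is a minimal sketch in Python, assuming a simplified reading of the RFC 7230 grammar: field names must be `token`s, and field values are runs of `field-vchar` (VCHAR 0x21-0x7E or obs-text 0x80-0xFF) separated by spaces or tabs. Python `str` values are converted with the ASCII codec, so any non-ASCII text is rejected up front with the offending character and its index. The names `encode_header` and `_to_ascii` are hypothetical; this is not h11's or httpx's actual implementation.

```python
from __future__ import annotations

import re

# Field names are RFC 7230 "token"s.
TOKEN_RE = re.compile(rb"[-!#$%&'*+.^_`|~0-9A-Za-z]+")

# field-vchar = VCHAR (0x21-0x7E) / obs-text (0x80-0xFF).
FIELD_VCHAR = rb"[\x21-\x7e\x80-\xff]"

# Simplified field value: runs of field-vchar separated by SP/HTAB, or empty.
# Leading/trailing whitespace is rejected here rather than stripped.
FIELD_VALUE_RE = re.compile(
    rb"(?:" + FIELD_VCHAR + rb"+(?:[ \t]+" + FIELD_VCHAR + rb"+)*)?"
)


def _to_ascii(text: str, what: str) -> bytes:
    """Encode text as ASCII, naming the first offending character on failure."""
    try:
        return text.encode("ascii")
    except UnicodeEncodeError as exc:
        bad = exc.object[exc.start]
        raise ValueError(
            f"{what} contains non-ASCII character {bad!r} at index {exc.start}"
        ) from exc


def encode_header(name: str | bytes, value: str | bytes) -> tuple[bytes, bytes]:
    """Hypothetical helper: str input must be ASCII; bytes may carry obs-text."""
    raw_name = name if isinstance(name, bytes) else _to_ascii(name, "header name")
    raw_value = value if isinstance(value, bytes) else _to_ascii(value, "header value")
    if not TOKEN_RE.fullmatch(raw_name):
        raise ValueError(f"invalid header name: {raw_name!r}")
    if not FIELD_VALUE_RE.fullmatch(raw_value):
        raise ValueError(f"invalid header value: {raw_value!r}")
    return raw_name, raw_value
```

With this split, `str` values are held to ASCII (so `encode_header("Authorization", "Bearer \u0430bc")` fails and reports the Cyrillic `а` at index 7), while raw `bytes` values may still carry obs-text octets, which the receiver treats as opaque data per the RFC text quoted above.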
In practice, non-ASCII header values mostly show up in authentication headers, which are often obtained through some kind of "secret management" process that involves encryption/decryption and/or base64 encoding/decoding before the values are injected into code. That leaves the door open for upstream human errors to propagate down to this level without being obvious, because the values are opaque.
The example above, using the Cyrillic alphabet, is admittedly contrived; the real source of the error was closer to a bad copy/paste of the correct value.
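When a bad copy/paste slips a lookalike character into a secret, it helps to report exactly which character it was. A minimal sketch using the standard library's `unicodedata` module; the helper name `describe_non_ascii` is hypothetical and not part of httpx or h11:

```python
from __future__ import annotations

import unicodedata


def describe_non_ascii(value: str) -> list[str]:
    """List each non-ASCII character in value with its index and Unicode name."""
    report = []
    for index, char in enumerate(value):
        if not char.isascii():
            name = unicodedata.name(char, "<unnamed>")
            report.append(f"index {index}: U+{ord(char):04X} {name}")
    return report
```

For example, `describe_non_ascii("Bearer t\u043eken")` returns `["index 8: U+043E CYRILLIC SMALL LETTER O"]`, and a trailing no-break space picked up from a web page shows up as `U+00A0 NO-BREAK SPACE`. Neither is visually distinguishable from its ASCII counterpart, which is why naming the character is more useful than a bare encoding error.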
Discussed in https://github.com/encode/httpx/discussions/3399