Description:
While working on the unified header validation component (#20261), we found that Envoy does not decode percent-encoded UTF-8 characters in the Host and :authority headers, as the RFCs describe.
Although the fix could be targeted for UHV, I wanted to register this issue with the community to get consensus on how percent-encoded characters should be handled within the H1 Host and H2 :authority headers. For now, we are only looking at the Host and :authority headers and not talking about URI or path normalization.
Some initial options after reading the RFCs, which could be implemented as new configuration settings:
1. Keep the current behavior and verify that Envoy users can register services that match on a percent-encoded host/authority.
2. Decode all percent-encoded characters from Host and :authority, verify that they form valid UTF-8 sequences, and re-encode them in the upstream request (where appropriate).
   - The URI RFC says that clients producing URIs should only encode non-ASCII characters in this way. Envoy could enforce this by also verifying that each decoded UTF-8 code point is outside the ASCII range.
   - This could also be done on a per-service configuration basis (e.g., decode_authority = [true|false]).
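To make option 2 concrete, here is a minimal Python sketch of the decode, validate, and re-encode steps. It is only an illustration of the idea, not Envoy or UHV code: the function name `normalize_authority_host` and the `require_non_ascii` flag are hypothetical, and the flag stands in for the optional ASCII-range check described above.

```python
import re

# One or more consecutive, well-formed percent-encoded octets.
_PCT_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")
# A "%" that is not followed by two hex digits.
_BAD_PCT = re.compile(r"%(?![0-9A-Fa-f]{2})")


def normalize_authority_host(host: str, require_non_ascii: bool = True) -> str:
    """Decode, validate, and re-encode percent-encoded octets in a host.

    Hypothetical helper for illustration; raises ValueError on rejection.
    """
    if _BAD_PCT.search(host):
        raise ValueError("malformed percent-encoding")

    def _recode(match: "re.Match[str]") -> str:
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            # Strict decoding rejects invalid or overlong UTF-8 sequences.
            text = raw.decode("utf-8")
        except UnicodeDecodeError as exc:
            raise ValueError("percent-encoded octets are not valid UTF-8") from exc
        if require_non_ascii and any(ord(ch) < 0x80 for ch in text):
            # Optional stricter check: only non-ASCII may be percent-encoded.
            raise ValueError("percent-encoded ASCII character in host")
        # Re-encode for the upstream request, using uppercase hex digits.
        return "".join(f"%{b:02X}" for b in text.encode("utf-8"))

    return _PCT_RUN.sub(_recode, host)
```

With these assumptions, `normalize_authority_host("b%c3%bccher.example")` yields `b%C3%BCcher.example`, while `ex%41mple.com` is rejected unless `require_non_ascii=False` is passed.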
From RFC 3986, Section 3.2.2:
The reg-name syntax allows percent-encoded octets in order to represent non-ASCII registered names in a uniform way that is independent of the underlying name resolution technology. Non-ASCII characters must first be encoded according to UTF-8 (STD 63), and then each octet of the corresponding UTF-8 sequence must be percent-encoded to be represented as URI characters.
URI producing applications must not use percent-encoding in host unless it is used to represent a UTF-8 character sequence.
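As a small illustration of the encoding rule quoted above, the following Python sketch UTF-8 encodes a registered name and percent-encodes each resulting octet. The helper name `encode_reg_name` and the sample host are made up for this example; note that `quote(..., safe="")` also escapes sub-delimiters that reg-name would otherwise permit, which is stricter than the RFC grammar.

```python
from urllib.parse import quote, unquote


def encode_reg_name(name: str) -> str:
    """Percent-encode a registered name (hypothetical helper).

    quote() encodes the string as UTF-8 by default and percent-encodes each
    octet; unreserved characters (letters, digits, "-", ".", "_", "~") are
    left as-is.
    """
    return quote(name, safe="")


encoded = encode_reg_name("bücher.example")
print(encoded)           # b%C3%BCcher.example
print(unquote(encoded))  # bücher.example
```

The single character "ü" becomes the two octets 0xC3 0xBC, so a proxy that matches on the raw header sees `%C3%BC` rather than the decoded character.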
The authority component within the URI is used by both the H1 Host header and the H2 :authority header:
From RFC 7230, Section 5.4: A client MUST send a Host header field in all HTTP/1.1 request messages. If the target URI includes an authority component, then a client MUST send a field-value for Host that is identical to that authority component, excluding any userinfo subcomponent and its @ delimiter.
From RFC 7540, Section 8.1.2.3: The :authority pseudo-header field includes the authority portion of the target URI (RFC 3986, Section 3.2). The authority MUST NOT include the deprecated userinfo subcomponent for http or https schemed URIs.
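The relationship between the target URI and the header value can be sketched in Python as follows. This is an assumption-laden illustration of the client-side rule only: `host_header_from_uri` is a hypothetical helper, and the URIs are invented examples.

```python
from urllib.parse import urlsplit


def host_header_from_uri(target_uri: str) -> str:
    """Return the authority of a URI without userinfo (hypothetical helper).

    The same value would be sent as the H1 Host header or the H2 :authority
    pseudo-header.
    """
    netloc = urlsplit(target_uri).netloc
    # Drop any userinfo subcomponent and its "@" delimiter.
    return netloc.rpartition("@")[2]


print(host_header_from_uri("http://user:pw@example.com:8080/path"))  # example.com:8080
print(host_header_from_uri("https://b%C3%BCcher.example/"))          # b%C3%BCcher.example
```

Because the header is a copy of the authority component, any percent-encoded octets in the URI host arrive in Host or :authority unchanged, which is why the decoding question applies to both headers.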
Title: Host and Authority Headers RFC Compliance: Decode Percent-encoded UTF8 Characters
Relevant Links: