Open ameily opened 2 years ago
Using gen-delims and sub-delims characters in this manner is compliant with RFC 3986 (https://www.rfc-editor.org/rfc/rfc3986) and is an intended use, specifically in the path and query components.
Downstream RFCs should not restrict this use in favor of less semantic formatting. Specifically, the square brackets in the example are not part of the data being sent but rather the structure of the data, and are thus appropriate for delimiting if the URI dereferencing is able to accept them unencoded. These characters are reserved for URI dereferencing algorithms. If Envoy, frameworks, APIs and URI schemes did not utilize them, there would be no purpose in reserving them as gen-delims and sub-delims.
RFC 3986 was created largely with the goal of clarifying this, as the previous RFC it deprecated caused confusion by classifying such characters as "unsafe" rather than reserving them for dereferencing algorithms. This misconception is still common, even though there is no longer an "unsafe" character class, precisely because the RFC now specifies the role of these characters as delimiters for URI dereferencing algorithms.
Especially in the case of routing, rejecting unencoded delimiters is a serious anti-feature. Any RFC that has missed the purpose of delimiter classification should be pressed to change.
@SuitespaceDev Thank you for the info and context, that is very helpful!
If I'm understanding your comment and the RFC correctly: characters in the reserved set are allowed unencoded if they do not have any semantic meaning for the URI component that they are appearing in. For example, the : character has semantic meaning when it is present in the host component of the URI but no semantic meaning when present in the path component. So, : must be encoded in the host component but can be unencoded in the path component.
If that is accurate then this issue will be to update the UHV URI validation to conditionally allow reserved characters based on which URI component is being validated.
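A minimal sketch of what component-aware validation could look like, assuming the character sets from the RFC 3986 ABNF (reg-name for the host, pchar plus "/" for the path, and pchar plus "/" and "?" for the query). The names `ALLOWED`, `is_valid_component` and the `extra` parameter are hypothetical and are not UHV's actual API.

```python
import re
import string

# Character classes from the RFC 3986 ABNF.
UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")

# Characters allowed unencoded in each component (hypothetical table).
ALLOWED = {
    "host": UNRESERVED | SUB_DELIMS,                 # reg-name
    "path": UNRESERVED | SUB_DELIMS | set(":@/"),    # pchar plus "/"
    "query": UNRESERVED | SUB_DELIMS | set(":@/?"),  # pchar / "/" / "?"
}

PCT_ENCODED = re.compile(r"%[0-9A-Fa-f]{2}")


def is_valid_component(value, component, extra=""):
    """Return True if every character of `value` is allowed in `component`.

    `extra` is an assumption of this sketch: a way to opt in to additional
    unencoded characters (for example "[]") for a permissive mode.
    """
    allowed = ALLOWED[component] | set(extra)
    # Drop well-formed percent-encoded triplets; a stray "%" is then rejected.
    remainder = PCT_ENCODED.sub("", value)
    return all(ch in allowed for ch in remainder)


colon_in_host = is_valid_component("a:b", "host")    # False
colon_in_path = is_valid_component("/a:b", "path")   # True
brackets_strict = is_valid_component("shelf.search[decoded]=Google", "query")
brackets_relaxed = is_valid_component("shelf.search[decoded]=Google", "query", extra="[]")
```

The bracket characters are not in any of the sets above, so this sketch only accepts them when they are passed through the hypothetical `extra` argument, which loosely mirrors a permissive default versus a strict mode.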
Yes, that is accurate. The path component reserves only the ; (semicolon), for separating general metadata into sub-paths. For example, in mailto: URIs, the @ character is actually in the path component, somewhat unintuitively. Where RFC 3986 does not specifically reserve them for a use in a given URI component, gen-delims and sub-delims are reserved for the developers interpreting the URIs.
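As a quick illustration of the mailto: point, here is a small sketch using Python's standard `urllib.parse.urlsplit`. The addresses are made up for the example, and Python's parser is only a stand-in for a generic RFC 3986 split.

```python
from urllib.parse import urlsplit

# In a mailto: URI there is no authority, so the "@" lands in the path.
mail = urlsplit("mailto:user@example.com")

# In an http: URI the same "@" delimits userinfo inside the authority.
web = urlsplit("http://user@example.com/")

mail_path = mail.path      # "user@example.com"
web_netloc = web.netloc    # "user@example.com"
```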
Here's a chunk from section 2.2. Notice especially: "Thus, characters in the reserved set are protected from normalization and are therefore safe to be used by scheme-specific and producer-specific algorithms for delimiting data subcomponents within a URI."
The purpose of reserved characters is to provide a set of delimiting characters that are distinguishable from other data within a URI. URIs that differ in the replacement of a reserved character with its corresponding percent-encoded octet are not equivalent. Percent-encoding a reserved character, or decoding a percent-encoded octet that corresponds to a reserved character, will change how the URI is interpreted by most applications. Thus, characters in the reserved set are protected from normalization and are therefore safe to be used by scheme-specific and producer-specific algorithms for delimiting data subcomponents within a URI.
A subset of the reserved characters (gen-delims) is used as delimiters of the generic URI components described in Section 3 https://www.rfc-editor.org/rfc/rfc3986#section-3. A component's ABNF syntax rule will not use the reserved or gen-delims rule names directly; instead, each syntax rule lists the characters allowed within that component (i.e., not delimiting it), and any of those characters that are also in the reserved set are "reserved" for use as subcomponent delimiters within the component. Only the most common subcomponents are defined by this specification; other subcomponents may be defined by a URI scheme's specification, or by the implementation-specific syntax of a URI's dereferencing algorithm, provided that such subcomponents are delimited by characters in the reserved set allowed within that component.
That second paragraph is basically saying that each component of the URI will claim its own characters, and not forbid them globally.
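The "protected from normalization" part of the quote can be sketched as follows. This is a rough illustration of percent-encoding normalization as described in RFC 3986 section 6.2.2 (decode only unreserved octets, uppercase the remaining hex digits); the function name `normalize_pct` is hypothetical.

```python
import re
import string

UNRESERVED = string.ascii_letters + string.digits + "-._~"


def normalize_pct(uri):
    """Decode percent-encoded unreserved characters; leave all others encoded."""
    def repl(match):
        ch = chr(int(match.group(1), 16))
        if ch in UNRESERVED:
            return ch                           # e.g. %7E -> ~
        return "%" + match.group(1).upper()     # reserved octets stay encoded
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, uri)


tilde = normalize_pct("/a%7Eb")
brackets = normalize_pct("/shelf?shelf.search%5bdecoded%5d=Google")
```

Under this sketch the encoded brackets are never turned into literal brackets, so the encoded and unencoded forms remain distinct URIs after normalization.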
This issue has been automatically marked as stale because it has not had activity in the last 30 days. It will be closed in the next 7 days unless it is tagged "help wanted" or "no stalebot" or other activity occurs. Thank you for your contributions.
This issue has been automatically closed because it has not had activity in the last 37 days. If this issue is still valid, please ping a maintainer and ask them to label it as "help wanted" or "no stalebot". Thank you for your contributions.
By default, UHV now allows [] in URLs, which resolves the immediate issue. However, in the non-default strict mode, [] are prohibited, which will cause some gRPC transcoder tests to fail. It is not yet clear whether strict mode will be enabled as it is now, since RFC 3986 does not say anything about [] being illegal or legal.
Title: uhv: support JSON-encoded URLs for gRPC requests
Description: gRPC may carry JSON-encoded data within the URL query parameters. UHV follows RFC guidance for URL and path validation, and would therefore reject valid JSON-encoded query parameters. For example:
/shelf?shelf.search%5Bdecoded%5D=Google
/shelf?shelf.search[decoded]=Google
See the test GrpcJsonTranscoderIntegrationTest / QueryParamsDecodedName for this example. Ideally UHV supports and correctly validates these gRPC style requests.
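To make the two example requests concrete, here is a small sketch that parses both forms with Python's standard `urllib.parse`. Python's parser is only a stand-in for illustration; it is an assumption of this sketch and says nothing about how Envoy or the gRPC JSON transcoder actually parse query strings. The helper name `query_params` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def query_params(target):
    """Split a request target and return its decoded query parameters."""
    return parse_qs(urlsplit(target).query)


encoded = query_params("/shelf?shelf.search%5Bdecoded%5D=Google")
literal = query_params("/shelf?shelf.search[decoded]=Google")
```

With this parser both requests yield the same parameter name, `shelf.search[decoded]`, which is why a validator that rejects the unencoded form blocks a request the application layer would otherwise handle.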