To implement automatic decoding in the parser, we first need to detect how the body is encoded. This is complicated by the existence of two headers that determine the encoding: Content-Encoding and Transfer-Encoding. While both influence the decoding process, they serve different purposes. Transfer-Encoding is a hop-by-hop header. It applies to the message sent between two adjacent nodes rather than to the resource itself, and intermediaries such as proxies may remove or add transfer codings. Consequently, each hop along a multi-node path may use a different Transfer-Encoding value.
Here is what RFC 7230 says about Transfer-Encoding:
Transfer-Encoding is primarily intended to accurately
delimit a dynamically generated payload and to distinguish payload
encodings that are only applied for transport efficiency or security
from those that are characteristics of the selected resource.
A recipient MUST be able to parse the chunked transfer coding
(Section 4.1) because it plays a crucial role in framing messages
when the payload body size is not known in advance. A sender MUST
NOT apply chunked more than once to a message body (i.e., chunking an
already chunked message is not allowed). If any transfer coding
other than chunked is applied to a request payload body, the sender
MUST apply chunked as the final transfer coding to ensure that the
message is properly framed. If any transfer coding other than
chunked is applied to a response payload body, the sender MUST either
apply chunked as the final transfer coding or terminate the message
by closing the connection.
For example,
Transfer-Encoding: gzip, chunked
indicates that the payload body has been compressed using the gzip
coding and then chunked using the chunked coding while forming the
message body.
Unlike Content-Encoding (Section 3.1.2.1 of [RFC7231]),
Transfer-Encoding is a property of the message, not of the
representation, and any recipient along the request/response chain
MAY decode the received transfer coding(s) or apply additional
transfer coding(s) to the message body, assuming that corresponding
changes are made to the Transfer-Encoding field-value. Additional
information about the encoding parameters can be provided by other
header fields not defined by this specification.
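The framing rules in the quoted passage are simple enough to check mechanically. The following is a minimal Python sketch, not the parser's actual code: `parse_codings` and `check_transfer_encoding` are hypothetical names, the header value is assumed to be already extracted as a string, and transfer-coding parameters are not parsed.

```python
def parse_codings(value: str) -> list[str]:
    """Split a list such as "gzip, chunked" into lowercase coding names.

    Empty list elements are dropped; parameters are NOT parsed (assumption)."""
    return [token.strip().lower() for token in value.split(",") if token.strip()]


def check_transfer_encoding(value: str, is_request: bool) -> bool:
    """Validate a Transfer-Encoding value against the RFC 7230 rules quoted above.

    Returns True if the body is framed by chunked, or False if a response
    body must be read until the connection closes. Raises ValueError otherwise."""
    codings = parse_codings(value)
    if not codings:
        raise ValueError("empty Transfer-Encoding")
    if codings.count("chunked") > 1:
        # A sender MUST NOT apply chunked more than once.
        raise ValueError("chunked applied more than once")
    if codings[-1] == "chunked":
        return True
    if "chunked" in codings:
        raise ValueError("chunked must be the final transfer coding")
    if is_request:
        # Requests MUST end with chunked if any other coding is applied.
        raise ValueError("request body is not framed by chunked")
    # Responses may instead be terminated by closing the connection.
    return False
```

For example, `check_transfer_encoding("gzip, chunked", is_request=True)` returns True, while a response carrying only `gzip` returns False, meaning its body ends when the connection closes.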
However, from searching online, it appears that in practice only the chunked Transfer-Encoding is commonly implemented by servers and client tools.
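Because chunked is the one transfer coding a recipient MUST be able to parse, here is a minimal sketch of a chunked decoder. It assumes the whole body is already buffered in memory (a real parser would decode incrementally), ignores chunk extensions, and skips trailer fields; `decode_chunked` is a hypothetical name.

```python
HEX_DIGITS = set(b"0123456789abcdefABCDEF")


def decode_chunked(data: bytes) -> bytes:
    """Decode a complete chunked message body held in memory.

    Raises ValueError on malformed input (bytes.index also raises
    ValueError when an expected CRLF is missing)."""
    body = bytearray()
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # Drop any chunk extension: b"1a;name=value" -> b"1a".
        size_field = data[pos:line_end].split(b";", 1)[0].strip()
        if not size_field or not set(size_field) <= HEX_DIGITS:
            raise ValueError(f"invalid chunk size: {size_field!r}")
        size = int(size_field, 16)
        pos = line_end + 2
        if size == 0:
            break  # last-chunk reached
        chunk = data[pos:pos + size]
        if len(chunk) != size or data[pos + size:pos + size + 2] != b"\r\n":
            raise ValueError("truncated or malformed chunk")
        body += chunk
        pos += size + 2
    # Skip trailer fields until the empty line that ends the message.
    while True:
        line_end = data.index(b"\r\n", pos)
        if line_end == pos:
            break
        pos = line_end + 2
    return bytes(body)
```

For example, `decode_chunked(b"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n")` returns `b"Wikipedia"`.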
Another complicating factor is that Content-Encoding may list multiple codings. They are listed in the order in which they were applied, so they must be decoded in reverse order, but our current design only supports a single decoder (filter). For example, the following header means the body was first compressed with deflate and then with gzip:
Content-Encoding: deflate, gzip
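To make the ordering concrete: with the header above, a decoder must undo gzip first and deflate second. The sketch below chains Python's standard `gzip` and `zlib` modules over an in-memory body. `decode_content` is a hypothetical helper, and it assumes "deflate" means the zlib-wrapped format specified by RFC 7230; some servers send raw deflate data, which this sketch does not handle.

```python
import gzip
import zlib

# Content codings this sketch can undo, keyed by lowercase name.
DECODERS = {
    "gzip": gzip.decompress,
    "x-gzip": gzip.decompress,   # treated as equivalent to gzip
    "deflate": zlib.decompress,  # zlib-wrapped deflate (RFC 1950 format)
}


def decode_content(body: bytes, content_encoding: str) -> bytes:
    """Undo every coding listed in a Content-Encoding value.

    Codings are listed in the order they were applied, so they are
    removed in reverse order."""
    codings = [c.strip().lower() for c in content_encoding.split(",") if c.strip()]
    for coding in reversed(codings):
        decoder = DECODERS.get(coding)
        if decoder is None:
            raise ValueError(f"unsupported content coding: {coding}")
        body = decoder(body)
    return body


original = b"hello, world"
# "deflate, gzip": deflate was applied first, then gzip.
encoded = gzip.compress(zlib.compress(original))
assert decode_content(encoded, "deflate, gzip") == original
```

Each step consumes the previous step's complete output, so supporting this in a streaming design would mean chaining several decoder filters rather than the single filter the current design allows.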
I couldn't find sufficient evidence to determine whether multiple codings are commonly used in practice. The closest related discussion I found is "how to disable Nginx double gzip encoding".
Assuming that multiple encodings in Content-Encoding are rarely encountered, the following approach could be considered for implementation:
1. Make automatic decoding optional and configurable by the user. This helps when users prefer to receive the encoded data as-is, for example in a proxy application.
2. Ignore the possibility of any Transfer-Encoding other than chunked, but make sure chunked is parsed correctly.
3. To automatically select the appropriate decoder filter, check only the Content-Encoding header.
4. Provide an interface that lets users apply a decoder (or filter) selectively. This would be useful in niche scenarios, such as when interacting with a server that uses Transfer-Encoding for compression.
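The approach above could be sketched as a small selection function. Everything here is hypothetical: `ParserOptions`, its `auto_decode` and `decoder_override` fields, and `select_decoder` are illustrative names; headers are assumed to arrive as a dict with lowercase keys; and decoders are modeled as whole-buffer functions rather than streaming filters.

```python
import gzip
import zlib
from dataclasses import dataclass
from typing import Callable, Optional

Decoder = Callable[[bytes], bytes]

# Hypothetical table of single-step decoders the parser knows about.
KNOWN_DECODERS: dict[str, Decoder] = {
    "gzip": gzip.decompress,
    "x-gzip": gzip.decompress,
    "deflate": zlib.decompress,
}


@dataclass
class ParserOptions:
    # Automatic decoding is opt-in, so a proxy can leave it off.
    auto_decode: bool = False
    # A decoder chosen explicitly by the user overrides detection.
    decoder_override: Optional[Decoder] = None


def select_decoder(headers: dict[str, str], options: ParserOptions) -> Optional[Decoder]:
    """Return the decoder to apply to the body, or None to pass it through."""
    if options.decoder_override is not None:
        return options.decoder_override
    if not options.auto_decode:
        return None
    # Only Content-Encoding drives selection; Transfer-Encoding is assumed
    # to be plain chunked and handled by the framing layer.
    value = headers.get("content-encoding", "")
    codings = [c.strip().lower() for c in value.split(",") if c.strip()]
    if not codings:
        return None
    if len(codings) > 1:
        raise ValueError("multiple content codings need more than one filter")
    decoder = KNOWN_DECODERS.get(codings[0])
    if decoder is None:
        raise ValueError(f"unsupported content coding: {codings[0]}")
    return decoder
```

Raising on multiple codings is one possible choice; another would be to return None and hand the still-encoded body to the user.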
Currently, the decision to encode or decode is left to the user. For now I think this is fine, as it lets us focus on the rest of the code, which is more complicated.