aio-libs / aiohttp

Asynchronous HTTP client/server framework for asyncio and Python
https://docs.aiohttp.org

Iterate over response data (streaming) without decoding content #8477

Closed nagylzs closed 1 month ago

nagylzs commented 2 months ago

Is your feature request related to a problem?

I need to forward a GET request to another server asynchronously, and I'm using aiohttp for this. (The server-side handler is tornadoweb, but that is not an important detail.) From the request handler, I can do something like this:

DEFAULT_COPY_REQUEST_HEADERS = [
    "Accept", "Accept-Encoding", "Accept-Language",
    "If-None-Match", "If-Modified-Since",
    "Range", "If-Range",
]

DEFAULT_COPY_HEADERS = [
    "Accept-Ranges",
    "Content-Type",
    "Content-Length",
    "Content-Encoding",
    "Date",
    "Etag",
    "Last-Modified"
]

# This is inside a tornadoweb request handler method, so `self` is the
# RequestHandler; `remote_url` and `chunk_size` are defined elsewhere.
async with aiohttp.ClientSession() as session:
    # Copy a whitelist of request headers to the backend request.
    request_headers = {}
    for name in DEFAULT_COPY_REQUEST_HEADERS:
        if name in self.request.headers:
            request_headers[name] = self.request.headers[name]
    async with session.get(remote_url, headers=request_headers) as response:
        if response.status < 200:
            self.set_status(500)
            self.set_header("Content-Type", "text/plain")
            self.write(f"Invalid backend status {response.status}")
            return

        if response.status >= 400:
            self.set_status(response.status)
            self.set_header("Content-Type", "text/plain")
            self.write(f"Error, status={response.status}")
            return

        if response.status not in [200, 204, 206, 304]:
            self.set_status(500)
            self.set_header("Content-Type", "text/plain")
            self.write(f"Invalid backend status {response.status}")
            return

        self.set_status(response.status)

        # Copy a whitelist of response headers back to the client.
        for name in DEFAULT_COPY_HEADERS:
            if name in response.headers:
                self.set_header(name, response.headers[name])

        # Stream the response body back to the client.
        async for chunk in response.content.iter_chunked(chunk_size):
            self.write(chunk)

It works almost perfectly, except for one case. If the original request had Accept-Encoding: gzip and the remote server supports it, then the remote server returns gzipped data. However, iter_chunked yields chunks that have already been decompressed. As a result, uncompressed data is sent back with the original (now wrong) Content-Length and Content-Encoding headers, and the request fails with an exception saying that the request handler tried to write more data than Content-Length allows.

Of course, re-encoding the data could work, but then the Content-Length header would have to be dropped, and it would be a waste of resources.
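For illustration only, a minimal sketch of that alternative (not what I want), assuming the same handler context as above and that the backend's Content-Length and Content-Encoding headers are simply not copied:

import zlib

# Sketch: re-compress the already-decompressed chunks on the fly.
# Do NOT copy Content-Length / Content-Encoding from the backend,
# because the recompressed byte stream will not match them.
compressor = zlib.compressobj(wbits=zlib.MAX_WBITS | 16)  # gzip container

self.set_header("Content-Encoding", "gzip")
async for chunk in response.content.iter_chunked(chunk_size):
    data = compressor.compress(chunk)
    if data:
        self.write(data)
self.write(compressor.flush())

This decompresses and then recompresses every byte just to pass it through, which is exactly the wasted work I'd like to avoid.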

Describe the solution you'd like

I believe it would be very useful to introduce a new streaming method, something like iter_raw_chunked, that yields the raw (still encoded) response bytes. I believe this is a real-world scenario, because it would allow aiohttp to be used for forwarding requests easily. A rough usage sketch follows.
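To make the proposal concrete, this is roughly how the handler above would use it (iter_raw_chunked is the hypothetical new method; it does not exist in aiohttp today):

async with session.get(remote_url, headers=request_headers) as response:
    # Hypothetical: yield the body bytes exactly as received from the wire,
    # without gzip/deflate decoding, so the copied Content-Length and
    # Content-Encoding headers stay valid.
    async for chunk in response.content.iter_raw_chunked(chunk_size):
        self.write(chunk)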

Describe alternatives you've considered

Use a different library that supports this? But I have not found one yet.

Related component

Client

Additional context

No response


steverep commented 2 months ago

I think all you need to do is add auto_decompress=False to the session options?
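Something along these lines, reusing the handler code from above (untested sketch):

# With auto_decompress=False the client does not decode gzip/deflate bodies,
# so iter_chunked() yields the raw compressed bytes and the copied
# Content-Length / Content-Encoding headers stay consistent.
async with aiohttp.ClientSession(auto_decompress=False) as session:
    async with session.get(remote_url, headers=request_headers) as response:
        ...
        async for chunk in response.content.iter_chunked(chunk_size):
            self.write(chunk)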

steverep commented 1 month ago

Per my last comment, I believe aiohttp already has the feature you need, so I'm closing this. If auto_decompress=False doesn't do the trick, please reply with more information and we can reopen. Thanks.