aio-libs / aiohttp

Asynchronous HTTP client/server framework for asyncio and Python
https://docs.aiohttp.org

Client support for compressed transfer encoding #4435

Open JustAnotherArchivist opened 4 years ago

JustAnotherArchivist commented 4 years ago

Long story short

aiohttp does not support any transfer encoding other than chunked. Responses from servers using e.g. Transfer-Encoding: gzip are returned with the payload still compressed. Worse yet, responses that apply both compression and chunking are returned as the raw payload with the chunking still intact (but only when using the C parser!).

Expected behaviour

Decompressed payload.

Actual behaviour

Compressed or compressed + chunked payload.

Steps to reproduce

A simple server and client illustrating this issue can be found in this gist. I wrote a single test client and server for both this issue and #4436; the first line is irrelevant to this issue and not included in the output below.

Expected output:

> python3 client.py | tail -n+2
b'Test'
b'Test'

Actual output:

> python3 client.py | tail -n+2
b'\x1f\x8b\x08\x00\x82Y\xf0]\x02\xff\x0bI-.\x01\x002\xd1Mx\x04\x00\x00\x00'
b'5\r\n\x1f\x8b\x08\x00\x82\r\n13\r\nY\xf0]\x02\xff\x0bI-.\x01\x002\xd1Mx\x04\x00\x00\x00\r\n0\r\n\r\n'

However, when running it with the pure-Python parser, the chunked TE gets handled in the second case, and the output is:

> AIOHTTP_NO_EXTENSIONS=1 python3 client.py | tail -n+2
b'\x1f\x8b\x08\x00|\\\xf0]\x02\xff\x0bI-.\x01\x002\xd1Mx\x04\x00\x00\x00'
b'\x1f\x8b\x08\x00|\\\xf0]\x02\xff\x0bI-.\x01\x002\xd1Mx\x04\x00\x00\x00'
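For reference, the expected result can be reproduced by hand from the second line of the C-parser output above: remove the chunked framing first, then decompress. This is only a minimal sketch using the standard library, not aiohttp's code; `dechunk` and `decode_gzip_chunked` are hypothetical helper names, and trailers and chunk extensions are ignored.

```python
import gzip


def dechunk(raw: bytes) -> bytes:
    """Remove chunked framing; trailers and chunk extensions are ignored."""
    body = bytearray()
    pos = 0
    while True:
        eol = raw.index(b"\r\n", pos)
        # The size line is hexadecimal; anything after ';' is a chunk extension.
        size = int(raw[pos:eol].split(b";", 1)[0], 16)
        if size == 0:
            break
        start = eol + 2
        body += raw[start:start + size]
        pos = start + size + 2  # skip the CRLF that ends the chunk data
    return bytes(body)


def decode_gzip_chunked(raw: bytes) -> bytes:
    # Codings are undone in reverse order: chunked was applied last.
    return gzip.decompress(dechunk(raw))


# The bytes printed by the C parser in the report above.
wire = (
    b"5\r\n\x1f\x8b\x08\x00\x82\r\n"
    b"13\r\nY\xf0]\x02\xff\x0bI-.\x01\x002\xd1Mx\x04\x00\x00\x00\r\n"
    b"0\r\n\r\n"
)
print(decode_gzip_chunked(wire))
```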

Your environment

I tested this with aiohttp 2.3.10 and Python 3.6.9 on Debian, but based on the current aiohttp code, the behaviour should still be the same on the current versions.

JustAnotherArchivist commented 4 years ago

I should add that a number of other tools do not support transfer encodings other than chunked. If this is a conscious decision for aiohttp, that is okay, but it should be documented. For Transfer-Encoding: <other>, chunked, the chunking should still be handled by aiohttp in my opinion, and the two implementations (C/Python) should agree with each other.

Sidenote: the docs talk about "The gzip and deflate transfer-encodings are automatically decoded for you." and similar, but this is about Content-Encoding, not Transfer-Encoding!
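To make the distinction between the two headers concrete, here is a small sketch of the order in which a recipient would undo the codings: transfer codings first (in reverse of the listed order), then content codings. `decoding_plan` is a hypothetical helper written for illustration, not part of aiohttp, and it ignores transfer-coding parameters.

```python
from typing import List, Mapping, Tuple


def _split(value: str) -> List[str]:
    # Header values are comma-separated lists of case-insensitive coding names.
    return [item.strip().lower() for item in value.split(",") if item.strip()]


def decoding_plan(headers: Mapping[str, str]) -> List[Tuple[str, str]]:
    """Return (layer, coding) steps in the order a recipient would undo them."""
    lowered = {k.lower(): v for k, v in headers.items()}
    transfer = _split(lowered.get("transfer-encoding", ""))
    content = _split(lowered.get("content-encoding", ""))
    # Codings are listed in the order they were applied, so undo them in reverse.
    plan = [("transfer", c) for c in reversed(transfer)]
    plan += [("content", c) for c in reversed(content) if c != "identity"]
    return plan


print(decoding_plan({"Transfer-Encoding": "gzip, chunked"}))
print(decoding_plan({"Content-Encoding": "gzip"}))
```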

asvetlov commented 4 years ago

Would you fix this?

0xicl33n commented 4 years ago

My team and I are having a similar problem where we are unable to send bytes using aiohttp, but we can do it via requests. This issue might be the cause of our problem as well, e.g. (pseudocode), we have

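import asyncio
import aiohttp

head = {'Accept-Encoding': 'gzip'}

async def main():
    # ClientSession must be entered with "async with", inside a coroutine.
    async with aiohttp.ClientSession() as session:
        await session.post('https://example.com', headers=head, data={'some_key': b'some bytes'})

asyncio.run(main())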

and it results in an invalid request error:

{'error': 'invalid_request', 'error_description': 'The provided some_key is invalid'}

asvetlov commented 4 years ago

@0xicl33n Accept-Encoding: gzip tells the server that it may return a gzipped response. The header is not related to request body encoding. I guess your issue is not related; most likely there is a mismatch between the data you send and the data format the server expects.
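As an illustration of that difference, the sketch below compresses a request body by hand and labels it with Content-Encoding, while Accept-Encoding only advertises what the response may use. `gzip_request_body` is a hypothetical helper, and whether a given server accepts compressed request bodies at all is an assumption.

```python
import gzip
from typing import Dict, Tuple


def gzip_request_body(body: bytes) -> Tuple[Dict[str, str], bytes]:
    """Compress a request body and return the headers that describe it."""
    compressed = gzip.compress(body)
    headers = {
        "Content-Encoding": "gzip",  # describes the body of this request
        "Accept-Encoding": "gzip",   # only advertises what the response may use
    }
    return headers, compressed


headers, data = gzip_request_body(b"some bytes")
print(headers["Content-Encoding"], len(data))
```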

JustAnotherArchivist commented 4 years ago

@asvetlov Well, the question is how this should actually be fixed. I think it is desirable to support compressed transfer encoding, even though it's not used very frequently in practice and not even supported by wget, requests, and h11 as far as I can see. (curl supports it.)

If we do add support, the next question is how to do this with the C parser. Transfer encoding is handled entirely inside http-parser, and I'm not sure it'd be a good idea to move this to the Python layer for performance reasons. So the support for compressed TE would also/first have to happen upstream.

As briefly noted in my comment above, there is also another issue here involving the Content-Encoding header, and I've now filed this separately as #4462.