aio-libs / aiohttp

Asynchronous HTTP client/server framework for asyncio and Python
https://docs.aiohttp.org

Exception with aiohttp Using HTTP Proxy in some cases #8472

Open BreathBlush opened 5 months ago

BreathBlush commented 5 months ago

Describe the bug

When using aiohttp with an HTTP proxy, the following exception occurs in some cases:

[Screenshot of the exception; the full traceback is under Logs/tracebacks.]

After consulting the developer of the proxy tool Gost, it was determined that neither the server side (Gost) nor the client side (aiohttp) adheres to RFC 9110, and that this causes the issue.

According to RFC 9110:

A server MUST NOT send any Transfer-Encoding or Content-Length header fields in a 2xx (Successful) response to CONNECT. A client MUST ignore any Content-Length or Transfer-Encoding header fields received in a successful response to CONNECT.

The error occurs because the Gost HTTP server returns a Content-Length header, and aiohttp does not ignore it, contrary to the RFC. This makes it difficult to identify the root cause of the issue (especially since the Requests library works well with the same proxy server setup).
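The rule quoted above can be sketched as a small header filter. This is only an illustration of the RFC text, not aiohttp's actual code: the function name `strip_connect_framing_headers` and the plain-dict header representation are assumptions made for the example.

```python
from typing import Mapping

# Header fields that RFC 9110 says a client must ignore in a
# successful (2xx) response to CONNECT.
_IGNORED_ON_CONNECT = {"content-length", "transfer-encoding"}


def strip_connect_framing_headers(
    method: str, status: int, headers: Mapping[str, str]
) -> dict[str, str]:
    """Return a copy of headers without framing fields for 2xx CONNECT replies.

    Hypothetical helper for illustration; it is not part of aiohttp.
    """
    if method.upper() != "CONNECT" or not 200 <= status < 300:
        # Any other request method or status keeps its headers untouched.
        return dict(headers)
    return {
        name: value
        for name, value in headers.items()
        if name.lower() not in _IGNORED_ON_CONNECT
    }
```

With a filter like this applied before the response is framed, a stray Content-Length from the proxy would no longer influence how the tunnel is set up.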

Gost has since been fixed, and aiohttp now works with it. However, addressing this in aiohttp as well would spare other users similar problems with other proxies. If aiohttp ignored these headers as the RFC specifies, no error would occur even when the server sends a Content-Length header, which is how third-party libraries such as requests and httpx already behave.

To Reproduce

  1. Set up a plain HTTP proxy with a tool such as gost or goproxy.
  2. Test the proxy server to make sure it works normally.
  3. Make a request through that proxy server with aiohttp:
import asyncio
import aiohttp

env = 'http://192.168.2.100:10005'

async def main():
    async with aiohttp.ClientSession() as cs:
        async with cs.get("https://www.python.org", proxy=env) as r:
            res = await r.text()
            print(res[:100])

asyncio.run(main())
  4. The exception is raised.
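To check whether a given proxy is affected, one option is to send a bare CONNECT and look at the headers it sends back. The following is a minimal sketch using only asyncio streams; the helper names `connect_response_headers` and `forbidden_connect_headers` are made up for this example, and it assumes the proxy needs no authentication.

```python
import asyncio


async def connect_response_headers(
    proxy_host: str, proxy_port: int, target: str = "www.python.org:443"
) -> tuple[str, dict[str, str]]:
    """Send a bare CONNECT to the proxy and return (status line, headers)."""
    reader, writer = await asyncio.open_connection(proxy_host, proxy_port)
    try:
        request = f"CONNECT {target} HTTP/1.1\r\nHost: {target}\r\n\r\n"
        writer.write(request.encode("ascii"))
        await writer.drain()
        # Read up to the blank line that ends the response head.
        head = await reader.readuntil(b"\r\n\r\n")
    finally:
        writer.close()
    lines = head.decode("latin-1").split("\r\n")
    headers: dict[str, str] = {}
    for line in lines[1:]:
        if ":" in line:
            name, _, value = line.partition(":")
            headers[name.strip().lower()] = value.strip()
    return lines[0], headers


def forbidden_connect_headers(status_line: str, headers: dict[str, str]) -> list[str]:
    """List framing headers that should not appear in a 2xx CONNECT response."""
    parts = status_line.split()
    status = int(parts[1]) if len(parts) > 1 and parts[1].isdigit() else 0
    if not 200 <= status < 300:
        return []
    return [h for h in ("content-length", "transfer-encoding") if h in headers]
```

For the setup in this report, the check would be run as `asyncio.run(connect_response_headers("192.168.2.100", 10005))`, passing the result to `forbidden_connect_headers`. A non-empty list means the proxy sends a header that RFC 9110 forbids on a successful CONNECT response.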

Expected behavior

It should print the first 100 characters of the target page.

Logs/tracebacks

Traceback (most recent call last):
  File "/home/max/remote_projects/test-pro/test_aiohttp.py", line 12, in <module>
    asyncio.run(main())
  File "/usr/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
           ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.11/asyncio/base_events.py", line 653, in run_until_complete
    return future.result()
           ^^^^^^^^^^^^^^^
  File "/home/max/remote_projects/test-pro/test_aiohttp.py", line 8, in main
    async with cs.get("https://www.python.org", proxy=env) as r:
  File "/home/max/remote_projects/test-pro/.venv/lib/python3.11/site-packages/aiohttp/client.py", line 1197, in __aenter__
    self._resp = await self._coro
                 ^^^^^^^^^^^^^^^^
  File "/home/max/remote_projects/test-pro/.venv/lib/python3.11/site-packages/aiohttp/client.py", line 581, in _request
    conn = await self._connector.connect(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/max/remote_projects/test-pro/.venv/lib/python3.11/site-packages/aiohttp/connector.py", line 544, in connect
    proto = await self._create_connection(req, traces, timeout)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/max/remote_projects/test-pro/.venv/lib/python3.11/site-packages/aiohttp/connector.py", line 942, in _create_connection
    _, proto = await self._create_proxy_connection(req, traces, timeout)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/max/remote_projects/test-pro/.venv/lib/python3.11/site-packages/aiohttp/connector.py", line 1379, in _create_proxy_connection
    return await self._start_tls_connection(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/max/remote_projects/test-pro/.venv/lib/python3.11/site-packages/aiohttp/connector.py", line 1172, in _start_tls_connection
    raise client_error(req.connection_key, OSError(msg))
aiohttp.client_exceptions.ClientConnectorError: Cannot connect to host www.python.org:443 ssl:default [None]

Python Version

$ python --version
Python 3.12.4 on Windows
Python 3.11.2 on Debian Bookworm

aiohttp Version

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.9.5
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author:
Author-email:
License: Apache 2
Location: C:\Python\312\Lib\site-packages
Requires: aiosignal, attrs, frozenlist, multidict, yarl
Required-by:

Same version on both Windows and Debian.

multidict Version

$ python -m pip show multidict
Name: multidict
Version: 6.0.5
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: C:\Python\312\Lib\site-packages
Requires:
Required-by: aiohttp, yarl

Same version on both Windows and Debian.

yarl Version

$ python -m pip show yarl
Name: yarl
Version: 1.9.4
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache-2.0
Location: C:\Python\312\Lib\site-packages
Requires: idna, multidict
Required-by: aiohttp

Same version on both Windows and Debian.

OS

Windows 10 2021 LTSC
Debian Bookworm

Related component

Client

Additional context

No response

Dreamsorcerer commented 5 months ago

It would be great if you can add a test reproducing this in: https://github.com/aio-libs/aiohttp/blob/master/tests/test_proxy.py or https://github.com/aio-libs/aiohttp/blob/master/tests/test_proxy_functional.py