aio-libs / aiohttp

Asynchronous HTTP client/server framework for asyncio and Python
https://docs.aiohttp.org

Received "Response payload is not completed" when reading response #4581

Closed pablogsal closed 1 week ago

pablogsal commented 4 years ago

🐞 Describe the bug

A ClientPayloadError: Response payload is not completed exception is raised when reading the response of a GET request from the GitHub REST API (this should not be especially relevant). The response seems correct and using curl succeeds all the time.

💡 To Reproduce

Do lots of requests against an endpoint of the GitHub API (it happens from time to time).
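As a sketch of such a reproduction loop (hedged: the endpoint and request count here are arbitrary stand-ins, not from the original report):

```python
# Hedged repro sketch: hammer a single GitHub API endpoint until
# ClientPayloadError appears. URL and N_REQUESTS are illustrative.
import asyncio

URL = "https://api.github.com/repos/aio-libs/aiohttp/labels"
N_REQUESTS = 500

async def main() -> None:
    import aiohttp  # deferred import so the sketch can be read standalone

    async with aiohttp.ClientSession() as session:
        for i in range(N_REQUESTS):
            try:
                async with session.get(URL) as resp:
                    await resp.read()
            except aiohttp.ClientPayloadError as exc:
                print(f"request {i} failed: {exc!r}")  # the intermittent failure
                return
    print("no failure observed")

# To actually run it: asyncio.run(main())
```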

💡 Expected behavior

The response is correctly parsed.

📋 Logs/tracebacks

  File "/opt/lib/python3.8/site-packages/gidgethub/aiohttp.py", line 20, in _request
    return response.status, response.headers, await response.read()
  File "/opt/lib/python3.8/site-packages/aiohttp/client_reqrep.py", line 973, in read
    self._body = await self.content.read()
  File "/opt/lib/python3.8/site-packages/aiohttp/streams.py", line 358, in read
    block = await self.readany()
  File "/opt/lib/python3.8/site-packages/aiohttp/streams.py", line 380, in readany
    await self._wait('readany')
  File "/opt/lib/python3.8/site-packages/aiohttp/streams.py", line 296, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed

Some contextual information at the time of the exception, at aiohttp/streams.py, line 358, in read:

blocks =  [b'[\n  {\n    "id": 2941521,\n    "node_id": "MDU6TGFiZWwyOTQxNTIx",\n    "url": "https://REDACTED_GITHUB_URL/repos/testorg/test-debian/labels/skip-issue",\n    "name": "skip-issue",\n    "color": "000000",\n    "default": false\n  }\n]\n']

n=-1

As you can see, the blocks contain the whole payload (the full JSON), but aiohttp still complains that the payload is not completed.

📋 Your version of Python

$ python --version
Python 3.8.1

📋 Your version of the aiohttp/yarl/multidict distributions

$ python -m pip show aiohttp
Name: aiohttp
Version: 3.6.2
Summary: Async http client/server framework (asyncio)
Home-page: https://github.com/aio-libs/aiohttp
Author: Nikolay Kim
Author-email: fafhrd91@gmail.com
License: Apache 2
Location: .../versions/3.8-dev/envs/medusa/lib/python3.8/site-packages
Requires: async-timeout, attrs, chardet, multidict, yarl
Required-by: raven-aiohttp, pytest-aiohttp, aiojobs
$ python -m pip show multidict
Name: multidict
Version: 4.7.3
Summary: multidict implementation
Home-page: https://github.com/aio-libs/multidict
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: .../versions/3.8-dev/envs/medusa/lib/python3.8/site-packages
Requires:
Required-by: yarl, aiohttp
$ python -m pip show yarl
Name: yarl
Version: 1.4.2
Summary: Yet another URL library
Home-page: https://github.com/aio-libs/yarl/
Author: Andrew Svetlov
Author-email: andrew.svetlov@gmail.com
License: Apache 2
Location: /Users/pgalindo3/.pyenv/versions/3.8-dev/envs/medusa/lib/python3.8/site-packages
Requires: idna, multidict
Required-by: aiohttp

📋 Additional context

I suspect this has to do with this comment:

https://github.com/aio-libs/aiohttp/blob/1740c99159176344b02006446b4302d0f92f9ac2/aiohttp/streams.py#L352-L356

as this is where it is failing, and the situation seems to be the one described in the TODO.

pablogsal commented 4 years ago

CC: @socketpair @asvetlov as the TODO was added in https://github.com/aio-libs/aiohttp/commit/5c4cb823b35fb452c67fb8070c417df5980c2fd8

webknjaz commented 4 years ago

I'm curious if it's somehow related to #4549.

WH-2099 commented 4 years ago

I'm curious if it's somehow related to #4549.

I encountered these two problems at the same time: #4549 and #4581. Python: 3.8.4, aiohttp: 3.6.2.

I think there is some relationship. And I think the core condition is frequent requests to a single domain.

WH-2099 commented 4 years ago

CC: @socketpair @asvetlov as the TODO was added in https://github.com/aio-libs/aiohttp/commit/5c4cb823b35fb452c67fb8070c417df5980c2fd8

Changing the while to an if seems to be no help.

tildedave commented 3 years ago

Hi there, I've received this error too. My use case is a reverse proxy in front of gunicorn. The other reports of this issue in this repo appear to involve poorly behaved HTTP servers, which I think is a separate matter.

My code streams the proxy body back to upstream, using code roughly like:

response = web.StreamingResponse(headers=response_headers, status=status)
await response.prepare(request)  # must be prepared before writing (request assumed in scope)
async for chunk in gunicorn_response.content.iter_chunked(64 * 1_024):
    await response.write(chunk)
await response.write_eof()
return response

What seems to be happening is that gunicorn returns a Connection: close header and then there is a race condition between aiohttp reading the body and the connection being closed. If I get the data out of aiohttp in time, it works, but sometimes this stack trace is triggered.
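The underlying condition, a server closing the connection before its advertised Content-Length is satisfied, can be reproduced with the stdlib alone. This sketch uses http.client rather than aiohttp (where the analogous failure surfaces as IncompleteRead instead of ClientPayloadError):

```python
# Stand-in reproducer: server closes before satisfying its Content-Length header.
import http.client
import socket
import threading

def run_server(listener: socket.socket) -> None:
    conn, _ = listener.accept()
    conn.recv(1024)  # read (and ignore) the request
    # Promise 100 bytes, deliver only 10, then drop the connection.
    conn.sendall(
        b"HTTP/1.1 200 OK\r\n"
        b"Content-Length: 100\r\n"
        b"Connection: close\r\n\r\n"
        b"short body"
    )
    conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
threading.Thread(target=run_server, args=(listener,), daemon=True).start()

client = http.client.HTTPConnection("127.0.0.1", listener.getsockname()[1])
client.request("GET", "/")
resp = client.getresponse()
try:
    resp.read()
    error = None
except http.client.IncompleteRead as exc:
    error = exc  # stdlib analogue of "Response payload is not completed"

print(type(error).__name__)  # IncompleteRead
```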

Investigation:

thehesiod commented 2 years ago

We have this happening between an aiohttp 3.8.1 client and server in production; trying to reproduce.

ghost commented 2 years ago

I've just experienced the same issue via s3fs, downloading a few hundred files from S3.

Python: 3.7.12 aiohttp: 3.8.1 yarl: 1.7.2 multidict: 5.2.0

Traceback (most recent call last):
  File "dataset_io.py", line 336, in download
    filesystem.get(self._location, self._download_location, recursive=True)
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 91, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 71, in sync
    raise return_result
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 25, in _runner
    result[0] = await coro
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 529, in _get
    coros, batch_size=batch_size, callback=callback
  File "/.venv/lib/python3.7/site-packages/fsspec/asyn.py", line 249, in _run_coros_in_chunks
    await asyncio.gather(*chunk, return_exceptions=return_exceptions),
  File "/root/.pyenv/versions/3.7.12/lib/python3.7/asyncio/tasks.py", line 414, in wait_for
    return await fut
  File "/.venv/lib/python3.7/site-packages/s3fs/core.py", line 1002, in _get_file
    chunk = await body.read(2 ** 16)
  File "/.venv/lib/python3.7/site-packages/aiobotocore/response.py", line 53, in read
    chunk = await self.__wrapped__.read(amt if amt is not None else -1)
  File "/.venv/lib/python3.7/site-packages/aiohttp/streams.py", line 385, in read
    await self._wait("read")
  File "/.venv/lib/python3.7/site-packages/aiohttp/streams.py", line 304, in _wait
    await waiter
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed

Sorry I don't have much more information here, other than the fact that this is occurring on Python 3.7, whereas #4549 has been described as occurring on Python 3.8 only.

a-gn commented 2 years ago

I got this error on Python 3.9 with aiohttp 3.8.1 on Windows. I haven't been able to reproduce.

My server failed around the same time as the error. The backoff library re-ran the request, and I got the ClientPayloadError. Is this expected? Shouldn't losing the server raise ConnectionError instead?
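For what it's worth, the common workaround in this thread is to treat ClientPayloadError as retryable, just like a connection error. A minimal sketch (the exception class below is a stand-in for aiohttp.ClientPayloadError so the example stays self-contained):

```python
# Stand-in for aiohttp.ClientPayloadError, so the sketch needs no dependencies.
class ClientPayloadError(Exception):
    pass

RETRYABLE = (ClientPayloadError, ConnectionError)

def fetch_with_retry(fetch, attempts: int = 3):
    """Call `fetch` until it succeeds or `attempts` runs out."""
    for attempt in range(attempts):
        try:
            return fetch()
        except RETRYABLE:
            if attempt == attempts - 1:
                raise  # out of retries: surface the original error

# Demo with a flaky fake that fails twice, then succeeds:
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ClientPayloadError("Response payload is not completed")
    return b"payload"

result = fetch_with_retry(flaky)
print(result)  # b'payload', returned on the third call
```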

aralroca commented 2 years ago

I have the same issue with Python 3.10

joelcorporan commented 2 years ago

Same error here!

junbaibai0719 commented 1 year ago

I've just experienced the same issue via s3fs, downloading a few hundred files from S3. […]

I am also getting that error using iter_chunked when downloading files from S3

Mashuai1314 commented 1 year ago

Every time I download the last few files, this error message appears

gelodefaultbrain commented 1 year ago

Hi, is there any fix for this? Appreciate it.

TheOnlyWayUp commented 1 year ago

My services occasionally crash with this error, on Python 3.10.12 with aiohttp 3.8.5.

What are our obstacles in isolating this issue, before we can fix it?

I'll ensure I hold on to the logs the next time it arises, do we know what's causing it?

Thanks!

Dreamsorcerer commented 1 year ago

What are our obstacles in isolating this issue, before we can fix it?

Not sure. Maybe we can figure out a message that causes the problem, or maybe it needs multiple steps to trigger.. It looks like it might be related to #5238, so maybe we can get that PR updated and merged.

TheOnlyWayUp commented 1 year ago

I'll try that PR out, will report if something breaks

ShantanuNair commented 12 months ago

@Dreamsorcerer @TheOnlyWayUp Did you manage to figure this out? It may be a session-setting issue w.r.t. stale connections. I am running into this issue with langchainjs, which uses OpenAI's Python client.

Dreamsorcerer commented 12 months ago

All discussion is above, so feel free to follow up. PR #5238 could be worked on to get it merged, or if you can figure out a test (that runs in our CI) to reproduce the issue, then that would be a great help.

lmazuel commented 10 months ago

I started to have this issue with 3.8.6 on PyPy 3.9.18: consistently on CI, and 80% of the time on my local box, for a very specific streaming unittest (so I can trigger it quite easily with a simple pytest run). It was working fine on 3.8.5.

Funny thing is that, reading your changelog, I saw the change for lenient parsing in llhttp and the -X dev mode that does the parsing differently. So, out of ideas, I tried to run pytest with python -X dev -m pytest mytest, and suddenly the error was gone. If I remove -X dev again, the error is back. Strangely, this works great on any other Python version.

Hope this helps somehow.

The stacktrace is consistent with everyone else's in this thread:

../../../venvpypy/lib/pypy3.9/site-packages/aiohttp/client_reqrep.py:1043: in read
    self._body = await self.content.read()
../../../venvpypy/lib/pypy3.9/site-packages/aiohttp/streams.py:375: in read
    block = await self.readany()
../../../venvpypy/lib/pypy3.9/site-packages/aiohttp/streams.py:397: in readany
    await self._wait("readany")
../../../venvpypy/lib/pypy3.9/site-packages/aiohttp/streams.py:304: in _wait
    await waiter
/usr/lib/pypy3.9/asyncio/futures.py:284: in __await__
    yield self  # This tells Task to wait for completion.
/usr/lib/pypy3.9/asyncio/tasks.py:328: in __wakeup
    future.result()

Dreamsorcerer commented 10 months ago

I tried to run pytest with python -X dev -m pytest mytest, and suddenly the error is gone. If I remove again the -X dev, the error is back.

Hmm, that's surprising. I wonder if strict parsing is causing the connection to get closed without raising an error somehow, thus forcing it to open a new connection...

rlippmann commented 10 months ago

I can confirm this on python 3.11/aiohttp 3.8.6

If I downgrade to 3.8.5 I don't have a problem if I do:

if method == "GET":
    await session.get(...)
elif method == "POST":
    await session.post(...)

Strangely, if I just do session.request(method, ...), I get the error on 3.8.5 as well.

Dreamsorcerer commented 10 months ago

The possible fix (depending on what really is the root cause) is in 3.9.0b1, so please try to reproduce with that.

rlippmann commented 10 months ago

Nope, still failing.

Dreamsorcerer commented 10 months ago

Then someone will need to do some more debugging and/or figure out a reliable test that we can use to reproduce the behaviour.

webknjaz commented 9 months ago

Another data point: this is happening to our Octomachinery-based bots (Chronographer and Patchback), which are built on top of gidgethub+aiohttp, just like in Pablo's traceback. It's happening under CPython 3.9.13 on the quay.io/fedora/python-39:latest image with aiohttp 3.6.2. The bots are running Sentry, so it's possible to look a bit deeper into the tracebacks: https://github.com/sanitizers/octomachinery/issues/59. Upon inspecting them, I noticed that they are related to GET and PATCH requests to the GitHub APIs. Nothing really useful. Judging from the other hints in this issue, this might be reproducible in slow envs, like with high network latency or PyPy (which is often slower closer to its startup, before the JIT warms up). Back to gidgethub: it uses aiohttp.ClientSession.request() (https://github.com/gidgethub/gidgethub/blob/7474cae/gidgethub/aiohttp.py#L19), which is in line with one of the comments above suggesting that calling dedicated methods like .get() and .patch() instead of .request() might yield differing behavior.

Anyway, having Sentry in place, we could probably make some educated guesses and instrument my bots to log more things. If anybody has ideas, I'm all ears...

webknjaz commented 9 months ago

Comparing .get(), .patch() and .request(), I concluded that they are virtually equivalent and haven't changed across versions. So your observation, @rlippmann, must be caused by something else. Also note that all of these methods return async context managers and should be used through async with, not a bare await, for proper cleanup of the underlying response object.
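To illustrate the cleanup point with stand-in classes (not aiohttp's real internals): awaiting a response only obtains it, while the async with form additionally guarantees release when the block exits:

```python
# Stand-in illustrating why `async with` matters: __aexit__ is what
# would return the underlying connection to the pool.
import asyncio

class FakeResponse:
    def __init__(self) -> None:
        self.released = False

    async def read(self) -> bytes:
        return b"body"

    async def __aenter__(self) -> "FakeResponse":
        return self

    async def __aexit__(self, *exc_info) -> None:
        self.released = True  # aiohttp would release the connection here

async def demo() -> bool:
    resp = FakeResponse()
    async with resp:  # without this block, released would stay False
        await resp.read()
    return resp.released

print(asyncio.run(demo()))  # True
```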

webknjaz commented 9 months ago

So the exception is set here: https://github.com/aio-libs/aiohttp/blob/213d1b22d42ce8efc54d7858b490e920fcdb4f0a/aiohttp/client_proto.py#L99.

I wonder if one of these tests is a reproducer (when the exception processing is unsuppressed):

Dreamsorcerer commented 9 months ago

So the exception is set here:

https://github.com/aio-libs/aiohttp/blob/213d1b22d42ce8efc54d7858b490e920fcdb4f0a/aiohttp/client_proto.py#L99

That code suppresses the actual exception. I think we need to at least log the original exception, or ideally chain the original exception (like raise ... from ...) somehow.

webknjaz commented 9 months ago

That code suppresses the actual exception. I think we need to at least log the original exception, or ideally chain the original exception (like raise ... from ...) somehow.

Yes, I was also thinking that.

webknjaz commented 9 months ago

https://github.com/aio-libs/aiohttp/blob/213d1b22d42ce8efc54d7858b490e920fcdb4f0a/tests/test_web_sendfile_functional.py#L951-L989

When I injected a breakpoint here under the except block and modified client_proto.py to include the causing exception details, this is what I extracted when running the specific test with pytest -s:

(Pdb) pp traceback.print_exc()
Traceback (most recent call last):
  File "~/src/github/aio-libs/aiohttp/aiohttp/client_proto.py", line 95, in connection_lost
    uncompleted = self._parser.feed_eof()
                  ^^^^^^^^^^^^^^^^^^^^^^^
  File "aiohttp/_http_parser.pyx", line 510, in aiohttp._http_parser.HttpParser.feed_eof
    raise ContentLengthError(
aiohttp.http_exceptions.ContentLengthError: 400, message:
  Not enough data for satisfy content length header.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "~/src/github/aio-libs/aiohttp/tests/test_web_sendfile_functional.py", line 985, in test_static_file_huge_cancel
    data += await resp.content.read(1024)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "~/src/github/aio-libs/aiohttp/aiohttp/streams.py", line 347, in read
    raise self._exception
aiohttp.client_exceptions.ClientPayloadError: Response payload is not completed: <ContentLengthError: 400, message='Not enough data for satisfy content length header.'>
None

Dreamsorcerer commented 9 months ago

Hmm, should that exception be ignored, or should we not even be trying to feed EOF?

As far as I can see, that means the connection has been dropped before we've read the full response, and then we try to pretend that we received EOF in order to finish it off?

But, in the context of that test, this is clearly correct, expected behaviour currently:

Reading from the stream may raise aiohttp.ClientPayloadError if the response object is closed before response receives all data or in case if any transfer encoding related errors like malformed chunked encoding of broken compression data. https://docs.aiohttp.org/en/stable/client_reference.html#aiohttp.ClientResponse.content

webknjaz commented 9 months ago

No idea... But I think that test may be a basis for a new one. FTR this is what I did:

diff --git a/aiohttp/client_proto.py b/aiohttp/client_proto.py
index c42c9540d..cc5916a37 100644
--- a/aiohttp/client_proto.py
+++ b/aiohttp/client_proto.py
@@ -93,11 +93,13 @@ class ResponseHandler(BaseProtocol, DataQueue[Tuple[RawResponseMessage, StreamRe
         if self._parser is not None:
             try:
                 uncompleted = self._parser.feed_eof()
-            except Exception:
+            except Exception as underlying_exc:
                 if self._payload is not None:
-                    self._payload.set_exception(
-                        ClientPayloadError("Response payload is not completed")
+                    reraised_exc = ClientPayloadError(
+                        f"Response payload is not completed: {underlying_exc !r}",
                     )
+                    reraised_exc.__cause__ = underlying_exc
+                    self._payload.set_exception(reraised_exc)

         if not self.is_eof():
             if isinstance(exc, OSError):
diff --git a/tests/test_web_sendfile_functional.py b/tests/test_web_sendfile_functional.py
index 6a64f8aa2..ec0c04950 100644
--- a/tests/test_web_sendfile_functional.py
+++ b/tests/test_web_sendfile_functional.py
@@ -966,6 +966,7 @@ async def test_static_file_huge_cancel(aiohttp_client: Any, tmp_path: Any) -> No
         sock = tr.get_extra_info("socket")
         sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 1024)
         ret = web.FileResponse(file_path)
+        # task.cancel()
         return ret

     app = web.Application()
@@ -974,15 +975,21 @@ async def test_static_file_huge_cancel(aiohttp_client: Any, tmp_path: Any) -> No
     client = await aiohttp_client(app)

     resp = await client.get("/")
-    assert resp.status == 200
+    # assert resp.status == 200
     task.cancel()
-    await asyncio.sleep(0)
+    # await asyncio.sleep(0)
+    await asyncio.sleep(10)
     data = b""
     while True:
         try:
             data += await resp.content.read(1024)
-        except aiohttp.ClientPayloadError:
+        except aiohttp.ClientPayloadError as payload_err:
+            import sys, traceback
+            traceback.print_exc()
+            traceback.print_exception(payload_err.__cause__)
+            breakpoint()
             break
+    assert resp.status == 200
     assert len(data) < 1024 * 1024 * 20

     await resp.release()

Dreamsorcerer commented 9 months ago
+        # task.cancel()

This still produces an error when the task isn't cancelled? That doesn't seem right.

But, the changes to logging should probably be put into a PR regardless.

webknjaz commented 9 months ago
+        # task.cancel()

This still produces an error when the task isn't cancelled? That doesn't seem right.

No, that's a leftover. There's still the original cancel outside the handler.

But, the changes to logging should probably be put into a PR regardless.

Totally. I haven't gotten to that. Just experimenting with stuff at the moment. Feel free to copy my patch into a PR.

webknjaz commented 9 months ago

But, the changes to logging should probably be put into a PR regardless.

A more complete PR should probably update aiohttp.helpers.set_exception() with a new cause argument (having some sentinel default, because None has a special meaning). That helper should (maybe?) also check whether the exception already has a cause set (not sure).
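A sketch of what such a helper might look like (the exc_cause name and the sentinel are guesses at the proposal, not the shipped API):

```python
import asyncio

_EXC_SENTINEL = object()  # `None` is a valid __cause__, so use a distinct sentinel

def set_exception(fut: "asyncio.Future", exc: BaseException,
                  exc_cause=_EXC_SENTINEL) -> None:
    """Set `exc` on `fut`, optionally chaining `exc_cause` as its __cause__."""
    if not fut.done():
        if exc_cause is not _EXC_SENTINEL:
            exc.__cause__ = exc_cause
        fut.set_exception(exc)

# Demo: the wrapped error keeps a reference to the underlying one.
loop = asyncio.new_event_loop()
fut = loop.create_future()
underlying = ValueError("truncated body")
set_exception(fut, RuntimeError("Response payload is not completed"),
              exc_cause=underlying)
print(fut.exception().__cause__ is underlying)  # True
loop.close()
```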

And all set_exception() invocations would need to be replaced with the helper. I don't see any calls actually setting causes across the entire project:

$ rg -B2 '\bset_exception\b'
tests/test_connector.py
1646-        conn._create_connection = mock.Mock()
1647-        conn._create_connection.return_value = loop.create_future()
1648:        conn._create_connection.return_value.set_exception(err)

tests/test_streams.py
267-        assert b"li" == data
268-
269:        stream.set_exception(ValueError())
--
376-        assert b"line\n" == data
377-
378:        stream.set_exception(ValueError())
--
498-        assert b"line" + separator == data
499-
500:        stream.set_exception(ValueError("Another exception"))
--
572-        assert b"li" == data
573-
574:        stream.set_exception(ValueError())
--
644-
645-        exc = ValueError()
646:        stream.set_exception(exc)
--
652-
653-        async def set_err():
654:            stream.set_exception(ValueError())
--
672-        t.cancel()
673-        await asyncio.sleep(0)
674:        # The following line fails if set_exception() isn't careful.
675:        stream.set_exception(RuntimeError("message"))
--
706-        assert b"line\n" == data
707-
708:        stream.set_exception(ValueError())
--
734-        stream = self._make_one()
735-        stream.feed_data(b"line\n")
736:        stream.set_exception(ValueError())
--
1070-        stream = self._make_one()
1071-        exc = RuntimeError()
1072:        stream.set_exception(exc)
--
1101-    s = streams.EmptyStreamReader()
1102-    assert str(s) is not None
1103:    assert s.set_exception(ValueError()) is None
--
1206-        item = object()
1207-        buffer.feed_data(item)
1208:        buffer.set_exception(ValueError)
--
1215-
1216-    async def test_read_exception(self, buffer: Any) -> None:
1217:        buffer.set_exception(ValueError())
--
1223-        val = object()
1224-        buffer.feed_data(val, 1)
1225:        buffer.set_exception(ValueError())
--
1236-
1237-        buffer.feed_eof()
1238:        buffer.set_exception(ValueError())
--
1245-
1246-        exc = ValueError()
1247:        buffer.set_exception(exc)
--
1252-
1253-        async def set_err():
1254:            buffer.set_exception(ValueError())
--
1399-
1400-    exc = ValueError()
1401:    reader.set_exception(exc)
--
1417-
1418-    exc = ValueError()
1419:    reader.set_exception(exc)
--
1432-    reader.on_eof(on_eof)
1433-
1434:    reader.set_exception(ValueError())

tests/test_client_request.py
1017-    async def throw_exc():
1018-        await asyncio.sleep(0.01)
1019:        fut.set_exception(ValueError)
--
1024-    await req._writer
1025-    # assert conn.close.called
1026:    assert conn.protocol.set_exception.called
--
1042-    async def throw_exc():
1043-        await asyncio.sleep(0.01)
1044:        fut.set_exception(inner_exc)
--
1049-    await req._writer
1050-    # assert connection.close.called
1051:    assert conn.protocol.set_exception.called
1052:    outer_exc = conn.protocol.set_exception.call_args[0][0]
--
1149-    await req.write_bytes(writer, conn)
1150-
1151:    assert conn.protocol.set_exception.called
1152:    exc = conn.protocol.set_exception.call_args[0][0]

tests/test_web_functional.py
1847-        async with resp:
1848-            assert resp.status == 200
1849:            resp.content.set_exception(RuntimeError())

tests/test_client_functional.py
954-        url, headers = server.make_url("/"), {"Connection": "Keep-alive"}
955-        resp = await session.get(url, headers=headers)
956:        resp.content.set_exception(ValueError())
--
2694-        with pytest.raises(RuntimeError):
2695-            async with session.get(server.make_url("/")) as resp:
2696:                resp.content.set_exception(RuntimeError())
--
2731-    with pytest.raises(RuntimeError):
2732-        async with cm as resp:
2733:            resp.content.set_exception(RuntimeError())

tests/test_web_websocket.py
313-    exc = EofStream()
314-    res = loop.create_future()
315:    res.set_exception(exc)
--
331-    ws._reader = mock.Mock()
332-    res = loop.create_future()
333:    res.set_exception(asyncio.TimeoutError())

tests/test_client_ws.py
317-
318-                exc = EofStream()
319:                resp._reader.set_exception(exc)
--
349-
350-                exc = ValueError()
351:                resp._reader.set_exception(exc)
--
467-
468-                exc = ValueError()
469:                resp._reader.set_exception(exc)

tests/test_worker.py
248-        worker.alive = False
249-        assert waiter is not None
250:        waiter.set_exception(RuntimeError())

tests/test_client_response.py
55-    connection.protocol = aiohttp.DataQueue(loop)
56-    connection.protocol.set_response_params = mock.Mock()
57:    connection.protocol.set_exception(http.HttpProcessingError())
--
234-    content = response.content = mock.Mock()
235-    content.read.return_value = loop.create_future()
236:    content.read.return_value.set_exception(ValueError)

tests/test_helpers.py
697-
698-
699:# ------------- set_result / set_exception ----------------------
--
717-async def test_set_exception(loop) -> None:
718-    fut = loop.create_future()
719:    helpers.set_exception(fut, RuntimeError())
--
725-    fut = loop.create_future()
726-    fut.cancel()
727:    helpers.set_exception(fut, RuntimeError())

aiohttp/base_protocol.py
77-            waiter.set_result(None)
78-        else:
79:            waiter.set_exception(exc)

aiohttp/streams.py
15-
16-from .base_protocol import BaseProtocol
17:from .helpers import BaseTimerContext, TimerNoop, set_exception, set_result
--
147-        return self._exception
148-
149:    def set_exception(self, exc: BaseException) -> None:
--
154-        if waiter is not None:
155-            self._waiter = None
156:            set_exception(waiter, exc)
--
159-        if waiter is not None:
160-            self._eof_waiter = None
161:            set_exception(waiter, exc)
--
500-        return None
501-
502:    def set_exception(self, exc: BaseException) -> None:
--
575-        return self._exception
576-
577:    def set_exception(self, exc: BaseException) -> None:
--
582-        if waiter is not None:
583-            self._waiter = None
584:            set_exception(waiter, exc)

aiohttp/http_websocket.py
306-        except Exception as exc:
307-            self._exc = exc
308:            self.queue.set_exception(exc)

aiohttp/client_proto.py
12-from .helpers import (
13-    BaseTimerContext,
14:    set_exception,
--
82-
83-        if exc is not None:
84:            set_exception(self.closed, exc)
--
96-            except Exception:
97-                if self._payload is not None:
98:                    self._payload.set_exception(
--
107-            # assigns self._should_close to True as side effect,
108-            # we do it anyway below
109:            self.set_exception(exc)
--
129-        self._reschedule_timeout()
130-
131:    def set_exception(self, exc: BaseException) -> None:
132-        self._should_close = True
133-        self._drop_timeout()
134:        super().set_exception(exc)
--
207-    def _on_read_timeout(self) -> None:
208-        exc = ServerTimeoutError("Timeout on reading data from socket")
209:        self.set_exception(exc)
210-        if self._payload is not None:
211:            self._payload.set_exception(exc)
--
242-                        self.transport.close()
243-                    # should_close is True after the call
244:                    self.set_exception(exc)

aiohttp/client_reqrep.py
570-        except OSError as exc:
571-            if exc.errno is None and isinstance(exc, asyncio.TimeoutError):
572:                protocol.set_exception(exc)
--
577-                new_exc.__context__ = exc
578-                new_exc.__cause__ = exc
579:                protocol.set_exception(new_exc)
--
581-            await writer.write_eof()
582-        except Exception as exc:
583:            protocol.set_exception(exc)
--
1021-        # content can be None here, but the types are cheated elsewhere.
1022-        if content and content.exception() is None:  # type: ignore[truthy-bool]
1023:            content.set_exception(ClientConnectionError("Connection closed"))

aiohttp/http_parser.py
438-                except BaseException as exc:
439-                    if self.payload_exception is not None:
440:                        self._payload_parser.payload.set_exception(
--
442-                        )
443-                    else:
444:                        self._payload_parser.payload.set_exception(exc)
--
823-                                chunk[:pos].decode("ascii", "surrogateescape")
824-                            )
825:                            self.payload.set_exception(exc)
--
926-            self.decompressor = ZLibDecompressor(encoding=encoding)
927-
928:    def set_exception(self, exc: BaseException) -> None:
929:        self.out.set_exception(exc)

aiohttp/web_ws.py
536-    def _cancel(self, exc: BaseException) -> None:
537-        if self._reader is not None:
538:            self._reader.set_exception(exc)

aiohttp/_http_parser.pyx
766-    except BaseException as exc:
767-        if pyparser._payload_exception is not None:
768:            pyparser._payload.set_exception(pyparser._payload_exception(str(exc)))
769-        else:
770:            pyparser._payload.set_exception(exc)

aiohttp/web_request.py
814-
815-    def _cancel(self, exc: BaseException) -> None:
816:        self._payload.set_exception(exc)

aiohttp/web_protocol.py
578-                        self.close()
579-
580:                payload.set_exception(PayloadAccessError())

aiohttp/helpers.py
798-
799-
800:def set_exception(fut: "asyncio.Future[_T]", exc: BaseException) -> None:
801-    if not fut.done():
802:        fut.set_exception(exc)

xaralis commented 7 months ago

I was running into this rather frequently when downloading larger files, like PDFs. I'm using iter_chunked() with custom decompression logic. I was able to mitigate it, to the point where it's no longer troublesome, by increasing the chunk size from 1 kB to 1 MB. After this change, the issue disappeared almost completely.

I suggest trying this out until a real fix is released.

Also, I suggest adding a recommendation to the docs on how to select a proper chunk size, as there really is no guidance beyond the example code.

In my case, I not only got rid of the issue but made my requests something like 10 times faster for larger files. I guess the example's 1024 bytes is too small even for common HTML documents.
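To put rough numbers on that speedup: the chunk size determines how many times the event loop has to wake the iterating coroutine. Quick arithmetic, pure Python, no aiohttp needed:

```python
def chunks_needed(total_bytes: int, chunk_size: int) -> int:
    """How many iter_chunked() iterations a body of `total_bytes` takes."""
    return -(-total_bytes // chunk_size)  # ceiling division

total = 50 * 1024 * 1024  # a hypothetical 50 MiB download

print(chunks_needed(total, 1024))         # 51200 wakeups at 1 kB chunks
print(chunks_needed(total, 1024 * 1024))  # 50 wakeups at 1 MiB chunks

# Usage in aiohttp would then look like:
#   async for chunk in resp.content.iter_chunked(1024 * 1024): ...
```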

webknjaz commented 7 months ago

I made #8089 (an extension/follow-up to #8049) to try to make errors set on the futures more traceable.

jakewins commented 7 months ago

I can reliably reproduce at least one case of this. In case it's helpful for anyone debugging this, here's some details so far.

If I modify connection_lost in client_proto.py to include the full exception like so:

        if self._parser is not None:
            try:
                uncompleted = self._parser.feed_eof()
            except Exception as e:
                if self._payload is not None:
                    self._payload.set_exception(
                        e  # was: ClientPayloadError("Response payload is not completed")
                    )

Then I get the following exception in the app code:

    |   File "<redacted>/python3.11/site-packages/aiohttp/client_proto.py", line 82, in connection_lost
    |     uncompleted = self._parser.feed_eof()
    |                   ^^^^^^^^^^^^^^^^^^^^^^^
    |   File "aiohttp/_http_parser.pyx", line 510, in aiohttp._http_parser.HttpParser.feed_eof
    | aiohttp.http_exceptions.ContentLengthError: 400, message:
    |   Not enough data for satisfy content length header.

Some additional bits of info:

jakewins commented 7 months ago

Update: I think I was using the API incorrectly? I didn't realize ClientSession.request returned a context manager; I just used a regular await on it. My original code, which failed with the issue in this ticket, looked like this:

async def example(http: aiohttp.ClientSession, headers, data):
    res = await http.request("POST", "http://example.com",
                        headers=headers, data=data)
    body = await res.read()
    return body

Changing it to use the context manager seems to have resolved the issue for me:

async def example(http: aiohttp.ClientSession, headers, data):
    async with http.request("POST", "http://example.com",
                        headers=headers, data=data) as res:
        body = await res.read()
        return body
Dreamsorcerer commented 7 months ago

I am pretty sure both versions of that code should do the same thing... Would be great if you could figure out a full test in a PR to reproduce the issue. Note we've also got a PR this week which will add the underlying exception information in future versions of aiohttp, so you won't need to modify the code in future to figure that out.
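For reference, both forms work because session.request() returns an object that supports both `await` and `async with`. A simplified stdlib-only sketch of that dual-use pattern (the class name mirrors aiohttp's internal `_RequestContextManager`; the bodies here are illustrative, not aiohttp's actual implementation):

```python
import asyncio


class _RequestContextManager:
    """Wraps a coroutine so it can be awaited or used as `async with`."""

    def __init__(self, coro):
        self._coro = coro
        self._resp = None

    def __await__(self):
        # Plain `await`: just run the wrapped coroutine.
        return self._coro.__await__()

    async def __aenter__(self):
        self._resp = await self._coro
        return self._resp

    async def __aexit__(self, *exc_info):
        # Context-manager exit releases the connection automatically.
        self._resp.release()


class FakeResponse:
    def __init__(self):
        self.released = False

    def release(self):
        self.released = True


async def fetch():
    return FakeResponse()


async def demo():
    # Plain await: the caller is responsible for releasing.
    resp1 = await _RequestContextManager(fetch())
    # Context manager: release happens on exit.
    async with _RequestContextManager(fetch()) as resp2:
        pass
    return resp1.released, resp2.released


print(asyncio.run(demo()))  # -> (False, True)
```

The only difference between the two forms is when the connection is released, which is why the error rate could change without the root cause going away.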

webknjaz commented 7 months ago

I am pretty sure both versions of that code should do the same thing...

I think closing the response might perform the closing network exchange more predictably. Without it, perhaps a timeout happens before the event loop gets to it.

webknjaz commented 7 months ago

@jakewins if you want to get more details before my PR is merged and released, you could apply https://github.com/aio-libs/aiohttp/pull/8089.diff which extends what's revealed in tracebacks.

Dreamsorcerer commented 7 months ago

I am pretty sure both versions of that code should do the same thing...

I think closing the response might perform the closing network exchange more predictably. Without it, perhaps a timeout happens before the event loop gets to it.

I've been involved with that code a fair bit over the last year; in both cases the connection should be closed upon returning from the .read() call.

jakewins commented 7 months ago

@Dreamsorcerer aye, looks like you're right: changing the prod code to use the context manager did not make the error go away, though it seems to have reduced the rate a lot, maybe 80% or so. It's still happening.

The repro I have depends on running one of our services locally and letting it make this specific call, which fetches a large payload from our infrastructure via VPN, so I can't share it as-is. I will try next week to create a free-standing repro.

webknjaz commented 6 months ago

Hey everyone, I just released v3.9.4rc0. It'll hopefully help us get more complete tracebacks and that in turn, will let us know the underlying causes for this exception being raised — there may be several distinct behaviors underneath and we need to discover what they are.

If you have the ability to do so, please try out this pre-release in environments where you can reproduce your problems and come back with logs containing complete tracebacks. Let's figure this out together!

@jakewins @xaralis @rlippmann @lmazuel @ShantanuNair @TheOnlyWayUp @gelodefaultbrain @junbaibai0719 @joelcorporan @aralroca @a-gn @tharradine @tildedave @WH-2099 @pablogsal @hauntsaninja

se7enXF commented 6 months ago

I'm hitting the same problem. I use aiohttp to request a sub-URL from within a Flask async view function. About 10% of requests raise this issue during load testing; there is no error when testing requests one at a time.

webknjaz commented 6 months ago

@se7enXF what does the recent release candidate reveal in terms of the cause? Without a reproducer and logs/tracebacks, your message is not very useful.

webknjaz commented 6 months ago

Interestingly, ever since I deployed the pre-release, I stopped seeing this problem where it was happening :exploding_head:

This means we'll have to rely on people here to try to reproduce it in their environments. Just remember to try out v3.9.4rc0 to get useful information about the causal chain.

se7enXF commented 6 months ago

@se7enXF what does the recent release candidate reveal in terms of the cause? Without a reproducer and logs/tracebacks, your message is not very useful.

My project uses Python 3.7, which can't run 3.9.4rc0. Does 4.0.0a0 or 4.0.0a1 fix this issue? I will give it a try.

Dreamsorcerer commented 6 months ago

Python 3.7 itself is no longer supported and may have known security issues. We won't put any effort into supporting an end-of-life Python version, so you'll need to upgrade to a more recent version to test anything or get any support from us.