aio-libs / aiohttp

Asynchronous HTTP client/server framework for asyncio and Python
https://docs.aiohttp.org

await content.read() reads partial content under high load #3881

Open yapith-suplari opened 5 years ago

yapith-suplari commented 5 years ago

Long story short

Under heavy network load and high latency, the content returned by the aiohttp client's await read() is sometimes truncated.

Expected behaviour

await read() always returns the full response content

Actual behaviour

The returned content is sometimes truncated

Steps to reproduce

I haven't been able to reproduce the intermittent issue in a test bed yet. I'll update the ticket once I have a reliable repro. I know it's not actionable without one, but I'm creating the ticket anyway to see if you have any inkling/intuition of what might be going on.

I've laid out a few leads below that I haven't followed up on thoroughly, and I will update the post when I discover more clues.

Analysis

I wonder if, under heavy load, there's a window of time in which the StreamReader's internal buffer is empty and read() terminates prematurely as a result.

I'm dealing with ~100 concurrent downloads of about 1-3 MB each, which I push to local disk or S3. When writing to a local SSD it almost never fails; it fails more often after I switched from the SSD to S3. It looks like higher latency increases the likelihood of it happening.

When I reduce the concurrency to fewer than 10 downloads, the problem goes away. I used asyncio.gather(*futures) to set up the downloads. This points to the number of concurrent coroutines affecting the likelihood of the issue.
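For illustration, the fan-out is roughly the sketch below. The helper names are illustrative and this isn't my exact code; the semaphore is just one way to cap the concurrency when I test with fewer downloads:

import asyncio
import aiohttp

async def fetch(session: aiohttp.ClientSession, url: str) -> bytes:
    async with session.get(url) as resp:
        return await resp.read()

async def fetch_all(session: aiohttp.ClientSession, urls, limit: int = 10):
    # Cap how many downloads run at once; raising the limit towards ~100
    # matches the failing configuration described above.
    sem = asyncio.Semaphore(limit)

    async def bounded(url: str) -> bytes:
        async with sem:
            return await fetch(session, url)

    return await asyncio.gather(*[bounded(u) for u in urls])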

The issue is more likely when dealing with payloads larger than ~2.5 MB, which may suggest a flaw in the buffer resizing in StreamReader.

I'm running without a response Content-Length, so await read() is supposed to read until EOF.
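Concretely, each download boils down to something like this sketch (the function name and the local-file destination are illustrative); with no Content-Length on the response there is nothing for read() to validate the result against:

import aiohttp

async def download_to_disk(session: aiohttp.ClientSession, url: str, path: str) -> int:
    async with session.get(url) as resp:
        resp.raise_for_status()
        data = await resp.read()  # no Content-Length, so this reads until EOF
        with open(path, "wb") as f:  # blocking write, fine for a sketch
            f.write(data)
        return len(data)  # this is the size that sometimes comes back short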

I'm running with TCPConnector:

session = aiohttp.ClientSession(
    timeout=aiohttp.ClientTimeout(total=3600),
    connector=aiohttp.TCPConnector(keepalive_timeout=600),
)

I'm running with a single main event loop.

FWIW, setting a breakpoint in the buffering step reliably makes the problem go away, so it could be memory-barrier related? I'm still hypothesizing.

Your environment

OS: Ubuntu 18.04
Python: 3.7.2
aiohttp: 3.5.4

aiohttp client only.

I'm running with pipenv (both dockerized and local execution)

Plenty of RAM/CPU/network bandwidth. When I swapped aiohttp out for requests, the load took ~30 minutes longer and staged all the payloads in memory, but the downloads completed without the content corruption.

Let me know if you have any questions!

Potential Solution

I've also noticed that aiohttp's version of StreamReader doesn't have readuntil(separator=b'\n'), which asyncio's streams API has had since Python 3.5.2. It would be handy for me when pulling data from web services that don't return a reliable Content-Length. This follows up on the hypothesis that, under heavy load, read() can find its internal buffer empty and terminate prematurely.
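As a stopgap, a readuntil()-style helper can be approximated on top of the existing StreamReader API. A rough sketch (the helper name is mine, and it may return bytes past the separator because readany() hands back whole chunks):

async def read_until(stream, separator: bytes = b"\n") -> bytes:
    # Keep pulling whatever is buffered until the separator shows up
    # or EOF is hit; readany() returns b"" at EOF.
    buf = bytearray()
    while separator not in buf:
        chunk = await stream.readany()
        if not chunk:
            break
        buf.extend(chunk)
    return bytes(buf)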

The RFE ticket for readuntil was closed because nobody needed it: https://github.com/aio-libs/aiohttp/issues/1151. I wonder if this is enough justification to re-open it.

alexforster commented 5 years ago

I'm seeing this as well. I'm dealing with a ~20 Mbit/s data stream, and if I don't read quickly enough, the library appears to just silently drop chunks of the payload.
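For reference, a consumer that drains the stream as fast as possible would look roughly like this sketch (the byte counting is an illustrative stand-in for the real processing):

import aiohttp

async def consume(resp: aiohttp.ClientResponse) -> int:
    # iter_any() yields whatever is currently buffered without waiting
    # for a fixed chunk size, so the consumer keeps up with the stream.
    total = 0
    async for chunk in resp.content.iter_any():
        total += len(chunk)  # real processing would go here
    return total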

Dreamsorcerer commented 1 month ago

This needs a reproducer for us to test with. The issue might already be fixed (if it even was an issue with aiohttp).