aio-libs / aiohttp

Asynchronous HTTP client/server framework for asyncio and Python
https://docs.aiohttp.org

aiohttp - concurrent requests are getting hung #3698

Closed raees-khan closed 2 months ago

raees-khan commented 5 years ago

Concurrent requests are getting hung. Here's a sample code that I am using to test concurrent requests.

import aiohttp
import asyncio

async def fetch(session, url):
    async with session.get(url) as response:
        print(await response.text())

async def main(n):
    url = "https://httpstat.us/200"
    async with aiohttp.ClientSession() as session:
        tasks = [asyncio.create_task(fetch(session, url)) for _ in range(n)]
        await asyncio.gather(*tasks)

asyncio.run(main(10))

When I make 10 concurrent requests, the first 4-5 requests are made concurrently, then it hangs for over 10 seconds before starting the remaining tasks, which hang again after 2-3 more concurrent requests. If I make 100 concurrent requests, it makes about 25-30 concurrently, hangs, then makes 5-6 more and hangs again, and it does this until all tasks are complete.

It's taking over two minutes to make 100 requests to https://httpstat.us/200 with aiohttp.

If I don't use a persistent ClientSession and instead create a new ClientSession for every request, all 100 requests finish within 5 seconds without getting hung.

I am not sure what I am doing wrong here. Any help will be highly appreciated.

I am running Python 3.7.2 and aiohttp 3.5.4

socketpair commented 5 years ago

First, a Task is not required. Second, I have tried it and everything works like a charm, without any hangs.

import aiohttp
import asyncio

async def fetch(session):
    async with session.get("https://httpstat.us/200") as response:
        print(await response.text())

async def main(n):
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(fetch(session) for _ in range(n)))

asyncio.run(main(100))

I've checked exactly your example and everything is the same: no hangs. Please check your firewall/gateway.

socketpair commented 5 years ago

Feel free to reopen the issue. I'm closing it now since it cannot be reproduced.

raees-khan commented 5 years ago

> Feel free to reopen the issue. I'm closing it now since it cannot be reproduced.

@socketpair I've ruled out the possibility of firewall/gateway issues. Moreover, I don't see any hangs if I create a new ClientSession for every request.

With this code, I never see the requests complete concurrently. That is, after 20-30 requests, the program just hangs.

import aiohttp
import asyncio

async def fetch(session):
    async with session.get("https://httpstat.us/200") as response:
        print(await response.text())

async def main(n):
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(*(fetch(session) for _ in range(n)))

asyncio.run(main(100))

The following works like a charm. No hangs!

import aiohttp
import asyncio

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get("https://httpstat.us/200") as response:
            print(await response.text())

async def main(n):
    await asyncio.gather(*(fetch() for _ in range(n)))

asyncio.run(main(100))

I am also seeing frequent SSL errors, which I believe are a known bug with aiohttp 3.5.4 on Python 3.7.x. However, I don't think the code hangs because of the SSL errors.

SSL error in data received
handle_traceback: Handle created at (most recent call last):
  File "concurrent_test.py", line 406, in <module>
    asyncio.run(main(), debug=True)
  File "/anaconda3/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/anaconda3/lib/python3.7/asyncio/base_events.py", line 560, in run_until_complete
    self.run_forever()
  File "/anaconda3/lib/python3.7/asyncio/base_events.py", line 528, in run_forever
    self._run_once()
  File "/anaconda3/lib/python3.7/asyncio/base_events.py", line 1756, in _run_once
    handle._run()
  File "/anaconda3/lib/python3.7/asyncio/events.py", line 88, in _run
    self._context.run(self._callback, *self._args)
  File "/anaconda3/lib/python3.7/asyncio/selector_events.py", line 716, in _add_reader
    self._loop._add_reader(fd, callback, *args)
  File "/anaconda3/lib/python3.7/asyncio/selector_events.py", line 260, in _add_reader
    handle = events.Handle(callback, args, self, None)
protocol: <asyncio.sslproto.SSLProtocol object at 0x1080d1e10>
transport: <_SelectorSocketTransport fd=12 read=polling write=<idle, bufsize=0>>
Traceback (most recent call last):
  File "/anaconda3/lib/python3.7/asyncio/sslproto.py", line 526, in data_received
    ssldata, appdata = self._sslpipe.feed_ssldata(data)
  File "/anaconda3/lib/python3.7/asyncio/sslproto.py", line 207, in feed_ssldata
    self._sslobj.unwrap()
  File "/anaconda3/lib/python3.7/ssl.py", line 767, in unwrap
    return self._sslobj.shutdown()
ssl.SSLError: [SSL: KRB5_S_INIT] application data after close notify (_ssl.c:2592)

When I use a persistent session to make concurrent requests, it sometimes hangs and completes after a very long time, and sometimes it fails with the following traceback. I am not sure why asyncio.TimeoutError is being raised.

Traceback (most recent call last):
  File "concurrent_test.py", line 406, in <module>
    asyncio.run(main(), debug=True)        
  File "/anaconda3/lib/python3.7/asyncio/runners.py", line 43, in run
    return loop.run_until_complete(main)
  File "/anaconda3/lib/python3.7/asyncio/base_events.py", line 573, in run_until_complete
    return future.result()
  File "concurrent_test.py", line 386, in main
    await asyncio.gather(*tasks)
  File "concurrent_test.py", line 248, in fetch
    async with session.post(url, **post_data) as response:
  File "/anaconda3/lib/python3.7/site-packages/aiohttp/client.py", line 1005, in __aenter__
    self._resp = await self._coro
  File "/anaconda3/lib/python3.7/site-packages/aiohttp/client.py", line 575, in _request
    break
  File "/anaconda3/lib/python3.7/site-packages/aiohttp/helpers.py", line 585, in __exit__
    raise asyncio.TimeoutError from None
concurrent.futures._base.TimeoutError
renanvieira commented 5 years ago

@socketpair Looks like this SSL error is a bug that affects aiohttp/asyncio on Python 3.7. No fix yet.

More info: #3535

yunstanford commented 5 years ago

Observing same hang issue

MDziwny commented 4 years ago

I have the same problem, even when opening a new ClientSession:

ssl.SSLError: [SSL: KRB5_S_INIT] application data after close notify (_ssl.c:2592)

If needed, I can provide more information.

ashkan-leo commented 4 years ago

Observing the same issue

sadikekin commented 4 years ago

Having the same issue.

sertraline commented 4 years ago

Same issue.

Using one persistent session (ssl=False) for the entire script: aiohttp makes 10 concurrent connections, then hangs, then makes a random number of concurrent requests. Notice the timing:

![Screenshot_20200528_215117](https://user-images.githubusercontent.com/36751007/83182314-b158e680-a115-11ea-89b3-1115885e348a.png)

26 requests complete in about 5 seconds. Now I do what @raees-khan suggested and create a new session for every request:

![Screenshot_20200528_220411](https://user-images.githubusercontent.com/36751007/83182544-101e6000-a116-11ea-8190-637766b41f8d.png)

The script finishes in less than a second (4.60s with a persistent session vs 0.60s with a session per request).

dassencio commented 4 years ago

The main problem with creating a single ClientSession for each GET request (as suggested by @raees-khan) is that you lose the ability to control the maximum number of simultaneous connections created by aiohttp.
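
One way to keep that control while still creating a session per request is to gate the fetches behind a semaphore. A minimal sketch, reusing the httpstat.us test URL from earlier in the thread (the limit of 16 mirrors the TCPConnector(limit=16) below):

import asyncio
import aiohttp

async def fetch(semaphore, url):
    # the semaphore caps how many sessions/connections exist at once,
    # standing in for TCPConnector(limit=...) on a shared session
    async with semaphore:
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                return await response.text()

async def main(n):
    semaphore = asyncio.Semaphore(16)  # at most 16 requests in flight
    url = "https://httpstat.us/200"
    await asyncio.gather(*(fetch(semaphore, url) for _ in range(n)))

asyncio.run(main(100))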

In my case (Python 3.8.2), a single GET request to https://diego.assencio.com already caused aiohttp to get stuck (HTTPS requests to other URLs worked fine, though). Simply creating the ClientSession with default parameters made the requests start succeeding, but much more slowly than usual. To clarify, I changed this:

    async with ClientSession(connector=TCPConnector(limit=16)) as session:
        ...

into this:

    async with ClientSession() as session:
        ...    

and the result was "requests succeeding but slowly", compared to "not a single request succeeding".

Interestingly, the idea from @raees-khan produced the exact same outcome for me: requests started succeeding, but just as slowly as with a single instance of ClientSession created using the default parameters.

For the record, HTTP requests are working fine. All issues I observed happen with HTTPS only.

samip5 commented 3 years ago

I'm also hitting this with my Matrix bot; it will work for a long time and then crash with:

Traceback (most recent call last):
  File "main.py", line 98, in <module>
    asyncio.get_event_loop().run_until_complete(main())
  File "/usr/lib/python3.7/asyncio/base_events.py", line 584, in run_until_complete
    return future.result()
  File "main.py", line 86, in main
    await client.sync_forever(full_state=True)
  File "/home/sm/Github/dankbot/venv/lib/python3.7/site-packages/nio/client/async_client.py", line 1147, in sync_forever
    await self.run_response_callbacks([await response])
  File "/usr/lib/python3.7/asyncio/tasks.py", line 533, in _wait_for_one
    return f.result()  # May raise f.exception().
  File "/home/sm/Github/dankbot/venv/lib/python3.7/site-packages/nio/client/async_client.py", line 1003, in sync
    timeout,
  File "/home/sm/Github/dankbot/venv/lib/python3.7/site-packages/nio/client/async_client.py", line 705, in _send
    method, path, data, headers, trace_context, timeout,
  File "/home/sm/Github/dankbot/venv/lib/python3.7/site-packages/nio/client/async_client.py", line 252, in wrapper
    return await func(self, *args, **kwargs)
  File "/home/sm/Github/dankbot/venv/lib/python3.7/site-packages/nio/client/async_client.py", line 781, in send
    else timeout,
  File "/home/sm/Github/dankbot/venv/lib/python3.7/site-packages/aiohttp/client.py", line 544, in _request
    await resp.start(conn)
  File "/home/sm/Github/dankbot/venv/lib/python3.7/site-packages/aiohttp/client_reqrep.py", line 905, in start
    self._continue = None
  File "/home/sm/Github/dankbot/venv/lib/python3.7/site-packages/aiohttp/helpers.py", line 656, in __exit__
    raise asyncio.TimeoutError from None
concurrent.futures._base.TimeoutError

Separate ClientSession per request:

    async def _lab(self):
        try:
            topic = self.args[0]
            if topic == "light":
                msg = await self._lab_light()
                await send_text_to_room(self.client, self.room.room_id, msg)
            elif topic == "wifi":
                msg = await self._lab_wifi()
                await send_text_to_room(self.client, self.room.room_id, msg)
            else:
                first_msg = "You did not specify which information you want, so I'm gonna guess it as both."
                await send_text_to_room(self.client, self.room.room_id, first_msg)
                light_msg = await self._lab_light()
                await send_text_to_room(self.client, self.room.room_id, light_msg)
                wifi_msg = await self._lab_wifi()
                await send_text_to_room(self.client, self.room.room_id, wifi_msg)
        except IndexError:
            pass

    async def _lab_light(self):
        async with aiohttp.ClientSession() as session:
            async with session.get("http://localhost/pi_api/gpio/?a=readPin&pin=1") as response:
                api_response = await response.json()
                api_data = api_response['data']
                if api_data == 0:
                    msg = "Someone is probably at the lab as the light is on."
                else:
                    msg = "Nobody is at the lab, the light is currently off."
                return msg

    async def _lab_wifi(self):
        async with aiohttp.ClientSession() as session:
            async with session.get("http://localhost/visitors/api/v1/visitors.php?format=json") as response:
                api_response = await response.json()
                if not api_response:
                    msg = "Nobody is using the Wifi that I know of currently."
                else:
                    msg = "Someone or multiple someones are using the Wifi at the lab currently."
                return msg

Dreamsorcerer commented 3 years ago

I can't see any relation between the code you've posted and that traceback.

The traceback suggests that the timeout occurred within the nio library. I'd assume that if you are dealing with a federated service, timeouts could be pretty common, so I'm not sure how that library is supposed to deal with this. It appears there is a config option to change the timeout, though: https://github.com/poljar/matrix-nio/blob/master/nio/client/async_client.py#L300
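
For reference, the equivalent knob on the aiohttp side is ClientTimeout; a minimal sketch (the URL is a placeholder):

import asyncio
import aiohttp

async def main():
    # raise the total timeout above aiohttp's 5-minute default; federated
    # homeservers can legitimately take a long time to respond
    timeout = aiohttp.ClientTimeout(total=600)
    async with aiohttp.ClientSession(timeout=timeout) as session:
        async with session.get("https://example.org") as response:  # placeholder URL
            print(response.status)

asyncio.run(main())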

kevinl1210 commented 3 years ago

I have the same issue. I am making 10 concurrent requests with asyncio.gather(); the first request succeeds but the rest hang and time out. I am running the script on WSL Ubuntu 18.04.2, Python 3.8.0, aiohttp 3.7.4.

However, if I run the same script on the local Windows 10 machine, it works fine.

evgenybf commented 2 years ago

I've got the same issue on Ubuntu 18.04. It's not safe to use a ClientSession from several coroutines concurrently (gather()-ing them or from Tasks). For example, concurrent coroutines that scrape different sites may start receiving their co-workers' responses. It happens more often when the majority of the sites redirect and close the connection, or just respond with a delay. It can be resolved by creating a separate session instance for each worker, as sketched below. I guess that libraries which use the aiohttp client (e.g. the SubscriberClient class in gcloud-aio-pubsub) may suffer from the same issue.
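
A minimal sketch of that workaround, one ClientSession per worker (the URL list is a placeholder):

import asyncio
import aiohttp

URLS = ["https://httpstat.us/200"] * 10  # placeholder targets

async def worker(url):
    # a private session per worker, so workers cannot pick up
    # responses from each other's pooled connections
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()

async def main():
    results = await asyncio.gather(*(worker(u) for u in URLS))
    print(len(results), "responses")

asyncio.run(main())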

parafoxia commented 2 years ago

So I've been having these issues for ages, and at least in my exact use-case they seem to only affect WSL. I've tried to trace the cause of the issue as far as I can, but I can't work out whether it goes any further. This is for 3.8.1 -- I would try 4.0.0a, but it simply will not install no matter what I do. This is the call path from a session.request call:

https://github.com/aio-libs/aiohttp/blob/v3.8.1/aiohttp/client.py#L353
https://github.com/aio-libs/aiohttp/blob/v3.8.1/aiohttp/client.py#L535
https://github.com/aio-libs/aiohttp/blob/v3.8.1/aiohttp/connector.py#L542
https://github.com/aio-libs/aiohttp/blob/v3.8.1/aiohttp/connector.py#L907
https://github.com/aio-libs/aiohttp/blob/v3.8.1/aiohttp/connector.py#L1175
https://github.com/aio-libs/aiohttp/blob/v3.8.1/aiohttp/connector.py#L986

That's as far as I can trace before VSCode and GitHub can only find matches in .pyi or test files, respectively. If anyone can do a better job of tracing it, let me know.

I'm kinda hoping my time doing this is wasted and it'll be fixed in v4.0.0a anyways, but just leaving this here for further investigation.

ughstudios commented 2 years ago

This is also happening for me in 2022

datdinhquoc commented 1 year ago

Me too, the same on Ubuntu WSL (desktop) and CentOS 7 (AWS) when using aiohttp as a web server: even print() (the first line in the request handler) hangs for a while after the browser has made 5 requests to a handler that calls another server and that call times out. Other handlers with no network connections (no aiohttp.ClientSession to another server) are still fine.

I tried another client, httpx, but it behaves the same after 6 timeouts, so it's something not caused by aiohttp.

datdinhquoc commented 1 year ago

Found out my case: it's the browser's limit of 6 pending requests per server, not aiohttp.

Dreamsorcerer commented 2 months ago

The original reproducer doesn't seem to have a problem when I run it. If this is still an issue, can anyone provide a reproducer (or test in CI if it affects a specific environment)?

geo-martino commented 2 months ago

Experiencing this exact same issue, but strangely, it only happens when I am connected to the Wireguard VPN network I have set up. I ran some tests to confirm, and this behaviour only happens with the aiohttp package and not with any other concurrent-request method I have tested (e.g. using curl in bash).

from datetime import datetime
import asyncio
import aiohttp

url = "https://official-joke-api.appspot.com/jokes/programming/random"

async def request(session, i=0):
  print(str(i).zfill(2), "request start ", datetime.now())
  async with session.get(url) as response:
    data = await response.text()
  print(str(i).zfill(2), "request return", datetime.now())
  return data

async def main():
  async with aiohttp.ClientSession(connector=aiohttp.TCPConnector(limit=100)) as session:
    payloads = await asyncio.gather(*[request(session, i=i) for i in range(15)])
  return payloads

results = asyncio.run(main())
for result in results:
  print(result)

When I run the above script while Wireguard is up, half of the requests return responses immediately, but every other request hangs for up to 2-3 minutes. Even more strangely, it always seems to be every 2nd request that hangs. Example output showing this behaviour, where every odd-numbered request hangs, is below.

00 request start  2024-08-26 18:49:54.270264
01 request start  2024-08-26 18:49:54.278552
02 request start  2024-08-26 18:49:54.278781
03 request start  2024-08-26 18:49:54.278929
04 request start  2024-08-26 18:49:54.279057
05 request start  2024-08-26 18:49:54.279333
06 request start  2024-08-26 18:49:54.279469
07 request start  2024-08-26 18:49:54.279591
08 request start  2024-08-26 18:49:54.279757
09 request start  2024-08-26 18:49:54.279869
10 request start  2024-08-26 18:49:54.279978
11 request start  2024-08-26 18:49:54.280103
12 request start  2024-08-26 18:49:54.280223
13 request start  2024-08-26 18:49:54.280335
14 request start  2024-08-26 18:49:54.280440
00 request return 2024-08-26 18:49:54.759689
02 request return 2024-08-26 18:49:54.828701
04 request return 2024-08-26 18:49:55.139969
08 request return 2024-08-26 18:49:55.141088
06 request return 2024-08-26 18:49:55.142415
10 request return 2024-08-26 18:49:55.764633
12 request return 2024-08-26 18:49:56.149386
14 request return 2024-08-26 18:49:56.161689
09 request return 2024-08-26 18:52:08.336999
07 request return 2024-08-26 18:52:08.464771
05 request return 2024-08-26 18:52:08.719724
13 request return 2024-08-26 18:52:08.721454
11 request return 2024-08-26 18:52:08.803287
03 request return 2024-08-26 18:52:09.359214
01 request return 2024-08-26 18:52:09.420899

As with raees-khan, if I introduce a TCPConnector with limit=1 to the session, the requests go through just fine, albeit sequentially.
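
For reference, that variant only swaps the connector in the script above (request() as defined there):

async def main():
    # limit=1 serializes all requests through a single pooled connection
    connector = aiohttp.TCPConnector(limit=1)
    async with aiohttp.ClientSession(connector=connector) as session:
        return await asyncio.gather(*[request(session, i=i) for i in range(15)])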

To eliminate Wireguard as the cause, I ran the following bash script to make concurrent requests and found they executed concurrently without issue.

#!/bin/bash

send_request() {
  echo $i 'request start ' $(date)
  curl -s 'https://official-joke-api.appspot.com/jokes/programming/random' 1>/dev/null
  echo $i 'request return' $(date)
}

for i in $(seq 1 15); do 
  send_request &
done

For context if it helps, here's the Wireguard config on the machine I am experiencing the issue on and the connected upstream server's config.

[Interface]
PrivateKey = ***
Address = 10.2.0.12/32
DNS = 10.2.0.2
MTU = 1420

# ping public IP
PreUp = ip route add $(curl -s ifconfig.me) via $(ip route | grep default | awk '{print $3}') dev $(ip route show default | awk '{ print $5; exit }')
PostDown = ip route del $(curl -s ifconfig.me) via $(ip route | grep default | awk '{print $3}') dev $(ip route show default | awk '{ print $5; exit }')

# route public SSH traffic around VPN
PreUp = ip rule add table 128 from $(ip addr show $(ip route | awk '/default/ { print $5 }') | grep "inet" | head -n 1 | awk '/inet/ {print $2}' | cut -d'/' -f1)
PreUp = ip route add table 128 to $(ip -o -f inet addr show | awk '/scope global/ {print $4}' | head -n1 | sed -r 's:([0-9]\.)[0-9]{1,3}/:\10/:g') dev $(ip route show default | awk '{ print $5; exit }')
PreUp = ip route add table 128 default via $(ip route | grep default | awk '{print $3}')
PostDown = ip route del table 128 default via $(ip route | grep default | awk '{print $3}')
PostDown = ip route del table 128 to $(ip -o -f inet addr show | awk '/scope global/ {print $4}' | head -n1 | sed -r 's:([0-9]\.)[0-9]{1,3}/:\10/:g') dev $(ip route show default | awk '{ print $5; exit }')
PostDown = ip rule del table 128 from $(ip addr show $(ip route | awk '/default/ { print $5 }') | grep "inet" | head -n 1 | awk '/inet/ {print $2}' | cut -d'/' -f1)

[Peer]
PublicKey = ***
PresharedKey = ***
Endpoint = ***
AllowedIPs = 0.0.0.0/1, 128.0.0.0/3, 160.0.0.0/5, 168.0.0.0/6, 172.0.0.0/12, 172.32.0.0/11, 172.64.0.0/10, 172.128.0.0/9, 173.0.0.0/8, 174.0.0.0/7, 176.0.0.0/4, 192.0.0.0/9, 192.128.0.0/11, 192.160.0.0/13, 192.169.0.0/16, 192.170.0.0/15, 192.172.0.0/14, 192.176.0.0/12, 192.192.0.0/10, 193.0.0.0/8, 194.0.0.0/7, 196.0.0.0/6, 200.0.0.0/5, 208.0.0.0/4, 224.0.0.0/3, ::/0
PersistentKeepAlive = 25

Upstream server config:

[Interface]
PrivateKey = ***
Address = 10.2.0.2/24
DNS = 127.0.0.1
ListenPort = ***
MTU = 1400

PreUp = ip route add $(curl -s ifconfig.me) via $(ip route | grep default | awk '{print $3}') dev $(ip route show default | awk '{ print $5; exit }')
PostDown = ip route del $(curl -s ifconfig.me) via $(ip route | grep default | awk '{print $3}') dev $(ip route show default | awk '{ print $5; exit }')

PostUp = ufw allow Wireguard comment 'Managed by Wireguard'
PostUp = ufw allow from *** to any app SSH comment 'Managed by Wireguard'
PostUp = ufw allow from *** to any app DNS comment 'Managed by Wireguard'
PostUp = ufw route allow in on %i out on %i to any comment 'Managed by WireGuard'
PostDown = ufw delete allow Wireguard
PostDown = ufw delete allow from *** to any app SSH
PostDown = ufw delete allow from *** to any app DNS
PostDown = ufw route delete allow in on %i out on %i to any

PostUp = ip rule add from *** lookup main
PostUp = iptables -w -t nat -A POSTROUTING -o %i -j MASQUERADE
PostUp = ip6tables -w -t nat -A POSTROUTING -o %i -j MASQUERADE
PostDown = ip rule delete from *** lookup main
PostDown = iptables -w -t nat -D POSTROUTING -o %i -j MASQUERADE
PostDown = ip6tables -w -t nat -D POSTROUTING -o %i -j MASQUERADE

[Peer]
PublicKey = ***
AllowedIPs = 0.0.0.0/0, ::/0
Endpoint = ***

[Peer]
PublicKey = ***
PresharedKey = ***
AllowedIPs = 10.2.0.12/32

Given the results of the bash script, I believe the cause of the issue lies with aiohttp. However, I am honestly not sure whether it's a package issue or a Wireguard issue, so apologies in advance if I've got it wrong.

Dreamsorcerer commented 2 months ago

OK, first, curl is a terrible comparison, as it's not using a connection pool or anything. If you move the ClientSession into request(), so you have a new session for every request, you'll have a fairer comparison with curl (a sketch follows below). I suspect the problem disappears when you do that. So, if you can confirm that, we can ignore curl going forward.
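
A sketch of that change, applied to the script above (url as defined there):

async def request(i=0):
    # a new ClientSession per call: no shared pool, no keep-alive reuse,
    # which is roughly what each curl invocation does
    async with aiohttp.ClientSession() as session:
        async with session.get(url) as response:
            return await response.text()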

If you can also check with a single ClientSession, but with TCPConnector(force_close=True), that should tell us whether it's an issue with keep-alive connections (which I expect it is).
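
As a sketch, keeping the original request() but disabling keep-alive on the shared connector:

async def main():
    # force_close=True hands every request a fresh connection (no keep-alive reuse)
    connector = aiohttp.TCPConnector(limit=100, force_close=True)
    async with aiohttp.ClientSession(connector=connector) as session:
        return await asyncio.gather(*[request(session, i=i) for i in range(15)])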

Then, with the original version, I'd like to try and see which connections are being used, so try and add a print statement in:

  async with session.get(url) as response:
    print(i, getattr(response._connection, "_protocol", None))

Note that when the response is small, you'll get None because the connection has already been released. Ideally, try to reproduce with a URL that returns larger responses so you don't get any Nones.

I can only imagine that something has gone wrong in the wg connection or the server, causing it to not send any response until a timeout occurs. I can't imagine that aiohttp is failing to send the request, or to respond to the received data, for a long time (we'd throw a TimeoutError or similar exception if that were the case).

I'll also try wg tomorrow and see if I can reproduce anything.

Just to confirm, you're using 3.10.5?

geo-martino commented 2 months ago

Thanks for the quick response.

However, I have to confess that when I checked again, I was actually using 3.9.5. I upgraded to 3.10.5, ran the test again, and found the issue had vanished. I can't say exactly what the issue was, but I'm glad it appears to have been resolved already!

For posterity, I've added my analysis using 3.9.5 from your suggestions below anyhow in case it may still be of use.

Thank you again for your help!


Dreamsorcerer commented 2 months ago

Yeah, we fixed a number of things in recent releases, so it may have been caused by something in the parser messing up the HTTP messages in subtle ways in rare cases. So, I think we'll consider this fixed.

thehesiod commented 2 months ago

@geo-martino if you have time it would be interesting to find which release fixed it by doing a binary search over the releases

Dreamsorcerer commented 2 months ago

If you're using the Python parser, then #8722 would be a likely candidate; it could result in the connection appearing to hang.
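
To check which parser is actually in use, something like the following (this leans on aiohttp internals, so treat it as an assumption):

# aiohttp re-exports either the compiled parser or the pure-Python fallback
# under the same name; setting AIOHTTP_NO_EXTENSIONS=1 before import is
# supposed to force the pure-Python one
from aiohttp.http_parser import HttpResponseParser

print(HttpResponseParser.__module__)  # 'aiohttp._http_parser' means the C parser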

geo-martino commented 2 months ago

@thehesiod Just checked using the same script again, and it appears to have been fixed in 3.10.0, as I don't see the same issue in any version >= 3.10.0.