gongy opened this issue 11 months ago
I think it's due to too many connections being requested from the pool at the same time. You should increase your pool size according to your load, so that requests for new connections don't queue up.

The execution blocks because every `async with pool.acquire()` block wants a connection from the pool, and when there are none left, it waits until one is available. asyncpg defaults to 10 max connections in the pool, I think? So it's definitely too few for 10000-ish concurrent requests :)
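For reference, a minimal sketch of what raising the pool size looks like; the DSN and sizes below are illustrative placeholders, not values from this issue:

```python
import asyncio
import asyncpg

async def main():
    # Size the pool for the expected concurrency so that acquire() calls
    # don't queue for long; these numbers are placeholders.
    pool = await asyncpg.create_pool(
        "postgresql://postgres:postgres@localhost/postgres",
        min_size=100,
        max_size=100,
    )
    async with pool.acquire() as conn:
        print(await conn.fetchval("SELECT 1"))
    await pool.close()

asyncio.run(main())
```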
The issue is reproducible with `min_size=100, max_size=100`. Increasing the size of the pool is not a feasible workaround here.

Crucially, awaiting a new connection from the pool should not block the asyncio event loop -- that is, it should not cause `asyncio.sleep(0.001)` to take more than 100ms. That would affect everything else running on the event loop (e.g. a production server).
> Crucially, awaiting a new connection from the pool should not block the asyncio event loop

It doesn't. In your repro case all coroutines are blocked on the `acquire()` path. Start an independent task that does not depend on a connection to verify.
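For what it's worth, a sketch of that kind of independent task -- it never touches the pool and only measures how long the loop takes to resume a short sleep (names and thresholds here are my own, not from the repro):

```python
import asyncio
import time

async def loop_lag_watchdog(interval: float = 0.001, threshold_ms: float = 50.0):
    """Print a warning whenever a short sleep resumes much later than requested."""
    while True:
        t0 = time.monotonic()
        await asyncio.sleep(interval)
        elapsed_ms = (time.monotonic() - t0) * 1000
        if elapsed_ms > threshold_ms:
            print(f"event loop lag: sleep({interval}s) took {elapsed_ms:.1f}ms")

# Run it alongside the pool workload, e.g.:
#   watchdog = asyncio.create_task(loop_lag_watchdog())
```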
Hi @elprans -- in the code sample, the task that acquires a connection from the pool is independent of the task calling `asyncio.sleep()`. The time measurement in the code block is taken immediately before and after `asyncio.sleep(0.001)`, with no database operations on that task, and it measures over 100 ms in our reproduction. We believe we are seeing asyncpg block the event loop and actually prevent other coroutines from making progress.
```python
t0 = time.time()
await asyncio.sleep(0.001)
elapsed_ms = (time.time() - t0) * 1000
if elapsed_ms > 50:
    print(f">>> {i} took {elapsed_ms}ms")
```

^ If it helps, this is the code segment that I'm referring to. This prints increasing values up to 480 ms when running our minimal reproduction script, even though we only sleep for 1 ms. If I'm not mistaken, this coroutine is independent of the `acquire()` path -- does that clarify the bug report?
I'm still not sure what you are referring to. If you create thousands of tasks like this, then the event loop might indeed take a long time to get back to you after that `sleep()`. But asyncpg itself does not "block" the event loop in any way. If it did, you would get asyncio debug messages about `acquire` or some other internal coroutine or callback occupying the CPU for a long time, and I don't see that here.

(This is also indirectly confirmed by the fact that disabling `debug` helps, as the event loop has lower overhead then.)
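To illustrate that point with no asyncpg involved at all (task count and timings below are arbitrary), flooding the loop with ready tasks is by itself enough to make a 1 ms sleep resume noticeably late:

```python
import asyncio
import time

async def chatty():
    # Yield repeatedly; every yield is another callback for the loop to schedule.
    for _ in range(200):
        await asyncio.sleep(0)

async def main():
    tasks = [asyncio.create_task(chatty()) for _ in range(5000)]
    t0 = time.monotonic()
    await asyncio.sleep(0.001)
    print(f"sleep(0.001) resumed after {(time.monotonic() - t0) * 1000:.1f}ms")
    await asyncio.gather(*tasks)

asyncio.run(main())
```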
Thanks for the comments. We have been seeing issues where asyncpg itself seems to block the event loop even when we aren't backed up on many asyncio tasks at once. I can see why this minimal reproduction would be confusing, though, since it does create many tasks. I'll try to provide a better example.
> you would get asyncio debug messages about `acquire` or some other internal coroutine or callback occupying the CPU for a long time, and I don't see that here.

We understand this, but is it possible that the stack trace is obscured by the Cython boundary? It seems very unlikely that `set_result()` on an asyncio object would itself block for 100 milliseconds; a quick search suggested to us that it could be the other end of that handler, waiting for the result, that blocked for 100 milliseconds.
> Very unlikely that `set_result()` on asyncio's object would block for 100 milliseconds, and a quick search indicated to us that it could be the other end of that handler that's waiting for the result, which blocked for 100 milliseconds.

There are no busyloops anywhere in asyncpg; all waits are `await`-s, so you are at the mercy of the event loop. We are talking about a single-threaded iterate-over-a-bunch-of-stuff loop. If there is a blocking pathology somewhere in the asyncpg code, `PYTHONASYNCIODEBUG=1` should clearly point it out under relatively low load. Your repro does not seem to demonstrate that, as there is a variety of "slow tasks" in your debug output, including `_SelectorSocketTransport._read_ready()`, which indicates an overloaded event loop and nothing else.
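For anyone following along, a sketch of turning that diagnostic on programmatically (equivalent to running with `PYTHONASYNCIODEBUG=1`); the 50 ms threshold is an arbitrary choice:

```python
import asyncio
import logging

logging.basicConfig(level=logging.WARNING)  # slow-callback reports go to the "asyncio" logger

async def main():
    loop = asyncio.get_running_loop()
    loop.set_debug(True)
    # Callbacks or task steps that run longer than this are logged as slow.
    loop.slow_callback_duration = 0.05
    await asyncio.sleep(0.2)  # stand-in for the real workload

asyncio.run(main())
```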
> If there is a blocking pathology somewhere in the asyncpg code, `PYTHONASYNCIODEBUG=1` should clearly point it out under relatively low load. Your repro does not seem to demonstrate that, as there is a variety of "slow tasks" in your debug output, including `_SelectorSocketTransport._read_ready()`, which indicates an overloaded event loop and nothing else.

Thanks for the help and your pointers! We tried to reproduce it again and have confirmed that, as you said, if we use a semaphore or otherwise limit concurrency (keeping `len(asyncio.all_tasks())` below 500), we can't reproduce the issue in any of our asyncpg tests. The reproduction we sent was flawed, and you're right: its delayed tasks were due to the event loop being slowed down.

We can still reproduce an issue we saw when using our ORM on top of asyncpg under low load, but that's not an issue with asyncpg, and we misattributed it to asyncpg when we made this issue.

Thanks so much for your help.
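For completeness, a sketch of the semaphore-based cap described above; the connection string, cap, and query are placeholders rather than the actual test code:

```python
import asyncio
import asyncpg

CONCURRENCY = 100  # keep well under the ~500 live tasks mentioned above

async def fetch_one(pool, sem, i):
    try:
        async with pool.acquire() as conn:
            return await conn.fetchval("SELECT $1::int", i)
    finally:
        sem.release()

async def main():
    pool = await asyncpg.create_pool(
        "postgresql://postgres:postgres@localhost/postgres",
        min_size=10, max_size=10,
    )
    sem = asyncio.Semaphore(CONCURRENCY)
    tasks = []
    for i in range(10_000):
        await sem.acquire()  # don't spawn another task until a slot frees up
        tasks.append(asyncio.create_task(fetch_one(pool, sem, i)))
    print(len(await asyncio.gather(*tasks)))
    await pool.close()

asyncio.run(main())
```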
Reproducible with `pip install asyncpg==0.28.0` and when installing from source.

**Reproduction** (for simplicity, against a local Docker instance of Postgres).

**Output:**

**Further investigation**

Adding verbose prints to `protocol.pyx` led to me chasing down one particular 80ms+ blocking execution, which ended at `waiter.set_result(...)` in `_on_result__simple_query` and took up the majority (150ms out of 151ms, for example) of a slow callback. After this, I wasn't sure how to continue debugging -- open to suggestions or ideas here.

**Removing `debug=True`**

The issue is still present, albeit less frequent, without debug mode on.

Thanks all!