aio-libs-abandoned / aioredis-py

asyncio (PEP 3156) Redis support
https://aioredis.readthedocs.io/
MIT License
2.3k stars 336 forks

aioredis 1.3.1 vs 2.0.0 performance #1208

Open artesby opened 2 years ago

artesby commented 2 years ago

Describe the bug

aioredis==2.0.0 performs 4-10x slower than aioredis==1.3.1

To demonstrate this, we ran a simple test:

test_aioredis1.py

```python
import asyncio
import os
import time

import aioredis

url = os.environ.get('REDIS_URL', 'redis://localhost:6379/13')
# cast to int so a MAX_CONNECTIONS value from the environment is usable as a pool size
max_conn = int(os.environ.get('MAX_CONNECTIONS', 64))


async def task(i, redis):
    key = f'key:{i}'
    v = await redis.get(key)
    new_v = 1 if v is None else int(v) + 1
    await redis.set(key, new_v, expire=600)


async def run(n=1500):
    redis = await aioredis.create_redis_pool(
        url, encoding='utf-8', maxsize=max_conn
    )
    tasks = [asyncio.create_task(task(i, redis)) for i in range(n)]
    start = time.time()
    await asyncio.gather(*tasks)
    t = time.time() - start
    print(f'aioredis1: {n} tasks with blocking pool with {max_conn} connections: {t}s')


if __name__ == "__main__":
    asyncio.run(run(n=1500))
```
test_aioredis2.py

```python
import asyncio
import os
import time

from aioredis import BlockingConnectionPool, Redis

url = os.environ.get('REDIS_URL', 'redis://localhost:6379/14')
# cast to int so a MAX_CONNECTIONS value from the environment is usable as a pool size
max_conn = int(os.environ.get('MAX_CONNECTIONS', 64))


async def task(i, redis):
    key = f'key:{i}'
    v = await redis.get(key)
    new_v = 1 if v is None else int(v.decode()) + 1
    await redis.set(key, new_v, ex=600)


async def run(n=1500):
    pool = BlockingConnectionPool.from_url(
        url=url, max_connections=max_conn
    )
    redis = Redis(connection_pool=pool)
    tasks = [asyncio.create_task(task(i, redis)) for i in range(n)]
    start = time.time()
    await asyncio.gather(*tasks)
    t = time.time() - start
    print(f'aioredis2: {n} tasks with blocking pool with {max_conn} connections: {t}s')


if __name__ == "__main__":
    asyncio.run(run(n=1500))
```

To Reproduce

  1. pip3 install aioredis==2.0.0 && python3 test_aioredis2.py
  2. pip3 install aioredis==1.3.1 && python3 test_aioredis1.py

Expected behavior

aioredis v1.3.1 and v2.0.0 have the same performance

Logs/tracebacks

aioredis2: 1500 tasks with blocking pool with 64 connections: 0.3346226215362549s
aioredis1: 1500 tasks with blocking pool with 64 connections: 0.0765523910522461s

Python Version

$ python --version
Python 3.7.5

aioredis Version

$ python -m pip show aioredis
Name: aioredis
Version: 1.3.1

$ python -m pip show aioredis
Name: aioredis
Version: 2.0.0

Additional context

The problem is reproducible using either a local Redis (in Docker) or Redis on a remote machine. Increasing the coroutine count n monotonically widens the gap in execution time between aioredis1 and aioredis2.

We tried to investigate the problem using cProfile and the yappi profiler. The root of the problem appears to be somewhere in the interaction with SocketBuffer, but we could not pinpoint exactly where.

Any help would be appreciated.


seandstewart commented 2 years ago

Hey @artesby -

Thank you for the benchmarks! I suspect the issue lies with our connections re-implementation, which uses a port of redis-py's SocketBuffer. This has the negative effect of passing all socket reads/writes through two buffers rather than one.

I'll take a look at this and see if my hunch is correct.

m-novikov commented 2 years ago

@seandstewart I was also curious about this issue and did a bit of digging.

I suspect the issue lies with our connections re-implementation, which uses a port of redis-py's SocketBuffer.

I don't think this is the case: when HiredisParser is used explicitly, it doesn't use SocketBuffer at all, yet performance stays in the same range as with PythonParser in the 2.0 measurements above.

I'd guess the problem is that 1.3.1 used implicit pipelining of requests: it did not call drain after writing a command, and it had a separate reader task. I did a couple of hacky implementations to get some baseline measurements, which seem to support my hypothesis; I'll post them here a bit later after cleanup.

m-novikov commented 2 years ago

So here is the repo with a couple of the above benchmarks and a couple of experimental clients: https://github.com/m-novikov/aioredis-benchmarks

Here is a very minimalistic client covering the above use case (get, set), written to be as performant as possible. It performs about 50% better than the 1.3 version. I used a callback-based asyncio.Protocol to avoid the extra cost of context switching between coroutines. Feel free to review the code and point out anything I missed. This is the most barebones version I could come up with, and it should be a good baseline for this workload.

Also, I tried to port this protocol-based implementation to the current aioredis: https://github.com/aio-libs/aioredis-py/pull/1216. This is not a complete solution; the goal of the PR is to start a discussion.

seandstewart commented 2 years ago

Ah, yes I recall the logic now. This pipelining would be more efficient, but it made debugging extremely difficult.

Thanks for the example using a Protocol. I've not used this myself, but I like its straightforward approach to the problem.

I'll take a look at the PR and poke around on it.

m-novikov commented 2 years ago

Ah, yes I recall the logic now. This pipelining would be more efficient, but it made debugging extremely difficult.

I'd say the problem is not pipelining itself, but mostly the code around it. IMO, as long as there is an explicit state machine reading from the connection, debugging should be significantly simpler even with pipelining.

I'll take a look at the PR and poke around on it.

Awesome, thanks. I propose we move further discussion to the PR.

m-novikov commented 2 years ago

It's actually even worse when Redis is on a remote machine and under a high workload. The current version is up to 40 times slower than the 1.3 branch.

simple_async: 10000 tasks with blocking pool with 64 connections: 0.36479735374450684s
aioredis1: 10000 tasks with blocking pool with 64 connections: 0.41457247734069824s

aioredis2: 10000 tasks with blocking pool with 64 connections: 16.801677227020264s

aioredis2_proto: 10000 tasks with blocking pool with *1* connection: 0.5052070617675781s