Open zzlpeter opened 4 years ago
I had the same ConnectionResetError.
Can you please check out the latest master and test with that? Note that as of #891 the client has the same API as redis-py.
Getting a few thousand of these a day when using Django Channels. Tested with Python 3.8.1 and Python 3.9.2, using aioredis 1.3.1 installed as a dependency of channels.
@djstein Are you getting this in production? If you're experiencing this in development, please refer to #930 for migration, as the v1 major version will not get any fixes.
This error still exists in aioredis==2.0.0
In production.
I had the same ConnectionResetError.
From my test, when this occurred, aioredis / redis-py / asyncio-redis couldn't connect, but aredis could. What is the difference?
Hi all, the problem lies with the ConnectionError type. aioredis implements its own ConnectionError(builtins.ConnectionError, RedisError), which causes problems because ConnectionResetError is no longer a subclass of that ConnectionError. If one looks at line 37 of client.py, it overwrites ConnectionError so that the exception can no longer be caught on line 1067.
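A minimal, self-contained illustration of that shadowing (the class definition mirrors the one described above; everything else is made up for the example):

import builtins

class RedisError(Exception):
    pass

# aioredis-style exception: it shadows the builtin name
class ConnectionError(builtins.ConnectionError, RedisError):
    pass

try:
    # what asyncio raises when the peer resets the connection
    raise ConnectionResetError("Connection lost")
except ConnectionError:
    # never reached: ConnectionResetError subclasses builtins.ConnectionError,
    # not this shadowing class
    print("caught by the shadowing ConnectionError")
except builtins.ConnectionError:
    print("only the builtin catches it")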
Any workaround found for this issue?
Same issue here in production. There's a firewall resetting connections after some time, and aioredis won't recover from this, which is a serious problem. Is there any known workaround?
Please consider this issue in #1225.
This issue can be closed; it should have been solved by #1129.
No, sorry, it doesn't solve this issue. Quick test:
$ docker run --rm -d -p 6379:6379 --name redis redis
A test.py program that continually reads something from Redis. For example (ugly, but easy to translate to redis):

from aioredis import Redis
import asyncio, time

redis = Redis()
loop = asyncio.get_event_loop()
loop.run_until_complete(redis.set("something", "something"))
while True:
    print(loop.run_until_complete(redis.get("something")))
    time.sleep(5)
$ sudo ss -K dst [::1] dport = redis
The equivalent test program using redis instead of aioredis automagically recovers from the loss of connection and keeps going without error.
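Roughly, that equivalent redis test program would be something like this (a sketch using the synchronous redis-py client):

import time
from redis import Redis

redis = Redis()
redis.set("something", "something")
while True:
    print(redis.get("something"))
    time.sleep(5)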
aioredis 2.0.0 gives:
Traceback (most recent call last):
File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 815, in send_packed_command
await asyncio.wait_for(
File "/usr/lib64/python3.9/asyncio/tasks.py", line 442, in wait_for
return await fut
File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 797, in _send_packed_command
await self._writer.drain()
File "/usr/lib64/python3.9/asyncio/streams.py", line 387, in drain
await self._protocol._drain_helper()
File "/usr/lib64/python3.9/asyncio/streams.py", line 190, in _drain_helper
raise ConnectionResetError('Connection lost')
ConnectionResetError: Connection lost
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "***/test.py", line 9, in <module>
print(loop.run_until_complete(redis.get("something")))
File "/usr/lib64/python3.9/asyncio/base_events.py", line 642, in run_until_complete
return future.result()
File "/home/***/.local/lib/python3.9/site-packages/aioredis/client.py", line 1063, in execute_command
await conn.send_command(*args)
File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 840, in send_command
await self.send_packed_command(
File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 829, in send_packed_command
raise ConnectionError(
aioredis.exceptions.ConnectionError: Error UNKNOWN while writing to socket. Connection lost.
aioredis from master, i.e. with #1129:
Traceback (most recent call last):
File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 762, in disconnect
await self._writer.wait_closed()
File "/usr/lib64/python3.9/asyncio/streams.py", line 359, in wait_closed
await self._protocol._get_close_waiter(self)
File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 815, in send_packed_command
await asyncio.wait_for(
File "/usr/lib64/python3.9/asyncio/tasks.py", line 442, in wait_for
return await fut
File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 797, in _send_packed_command
await self._writer.drain()
File "/usr/lib64/python3.9/asyncio/streams.py", line 375, in drain
raise exc
File "/usr/lib64/python3.9/asyncio/selector_events.py", line 856, in _read_ready__data_received
data = self._sock.recv(self.max_size)
ConnectionAbortedError: [Errno 103] Software caused connection abort
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "***/test.py", line 9, in <module>
print(loop.run_until_complete(redis.get("something")))
File "/usr/lib64/python3.9/asyncio/base_events.py", line 642, in run_until_complete
return future.result()
File "/home/***/.local/lib/python3.9/site-packages/aioredis/client.py", line 1063, in execute_command
await conn.send_command(*args)
File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 840, in send_command
await self.send_packed_command(
File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 829, in send_packed_command
raise ConnectionError(
aioredis.exceptions.ConnectionError: Error 103 while writing to socket. Software caused connection abort.
So we switched error messages, but the issue persists.
A workaround would be to call redis.connection_pool.disconnect() before performing any operation that comes before/after a long pause during which a reset may happen.
@Enchufa2 thanks for running the test. What do you mean by automagically? Is there code that they implement that we somehow missed, or code that we did port that is not functioning properly? Does the internal connection not handle this properly?
I mean that redis somehow figures out that the connection is broken, disposes of it and opens a new one instead of failing. I'm not sure how redis does this and therefore what's missing here, because unfortunately I'm not familiar with either codebase. But redis's behaviour is what I would expect from a connection pool. Otherwise, one needs to think about whether there are connections and whether they are alive, which is contrary to the very abstraction of a connection pool, right?
@Enchufa2 can you change this: to if not self.is_connected:? Redis is checking using self._sock, but we don't use self._sock. This could be the underlying reason, though I'm not sure how we didn't catch this early on or if a PR changed this somehow.
Nope, this doesn't help, same error. By the way, I noticed that I switched the errors coming from the current release and the current master branch in my previous comment. Apologies, the comment is amended now.
This is interesting. It turns out I had redis v3.5.3 installed, and this is the version that recovers from connection aborts and resets. I just updated to v4.0.2 and it shows this same issue.
Reported in https://github.com/redis/redis-py/issues/1772
@Enchufa2 I'm unable to reproduce this error in both the redis main/master branch and aioredis==2.0.0: https://github.com/Andrew-Chen-Wang/aioredis-issue-778. Unlike ss, I'm using CLIENT KILL and CLIENT KILL ADDR, with no success. When doing CLIENT LIST, in both cases a new connection is established, and this is shown via an incremented ID. I was using Gitpod since I don't have a tool similar to ss on Mac for killing sockets. There may be a chance CLIENT KILL gave aioredis a warning beforehand, but if a Connection reset by peer is occurring, then I can't imagine functionality similar to CLIENT KILL not being performed.
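For reference, issuing the kill from the CLI looks something like this (the client address comes from the CLIENT LIST output and is only an example):

$ redis-cli CLIENT LIST
$ redis-cli CLIENT KILL ADDR 127.0.0.1:49716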
If I use CLIENT KILL, I see this with the current master branch:
Traceback (most recent call last):
File "***/test.py", line 9, in <module>
print(loop.run_until_complete(redis.get("something")))
File "/usr/lib64/python3.9/asyncio/base_events.py", line 642, in run_until_complete
return future.result()
File "/home/***/.local/lib/python3.9/site-packages/aioredis/client.py", line 1064, in execute_command
return await self.parse_response(conn, command_name, **options)
File "/home/***/.local/lib/python3.9/site-packages/aioredis/client.py", line 1080, in parse_response
response = await connection.read_response()
File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 854, in read_response
response = await self._parser.read_response()
File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 367, in read_response
raw = await self._buffer.readline()
File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 301, in readline
await self._read_from_socket()
File "/home/***/.local/lib/python3.9/site-packages/aioredis/connection.py", line 250, in _read_from_socket
raise ConnectionError(SERVER_CLOSED_CONNECTION_ERROR)
aioredis.exceptions.ConnectionError: Connection closed by server.
We were also seeing this in production with aioredis==2.0.0. Our solution was to remove the connection pool and rely on single connections (although this isn't ideal).
@Andrew-Chen-Wang I see you were unable to reproduce; is there any current plan to look into this further? Or would you like someone to put together a PR with a potential fix?
@cjdsellers I still can't reproduce this issue, so a PR is much appreciated as we're down a maintainer now and I'm stuck with work.
I totally understand regarding the bandwidth. Maybe @Enchufa2 could chime in here with a solution strategy, as he seems to be very familiar with the issue?
@cjdsellers My bet is that a possible solution would be to catch OSError in https://github.com/aio-libs/aioredis-py/blob/224f843bd4b33d657770bded6f86ce33b881257c/aioredis/connection.py#L1423 but I haven't had the time to test this yet. I'm also stuck with work :(, and the workaround in https://github.com/aio-libs/aioredis-py/issues/778#issuecomment-984654896, although not ideal, works well.
OK, I could put together a PR for you guys to look at. I'll see what I can do over this festive period.
@Enchufa2 In the meantime I did try the workaround as you suggested in your comment; however, maybe my implementation wasn't correct, because the issue remained. If you're successfully using this workaround, then some sort of code snippet would be much appreciated!
OK, I could put together a PR for you guys to look at. I'll see what I can do over this festive period.
There are situations in which aioredis fails and redis does not (for instance, after a CLIENT KILL, see https://github.com/redis/redis-py/issues/1772). So keep in mind that aioredis may have additional issues not present in redis. In this regard, I would try a fix in redis in the first place, and then port it here.
@Enchufa2 In the meantime I did try the workaround as you suggested in your comment; however, maybe my implementation wasn't correct, because the issue remained. If you're successfully using this workaround, then some sort of code snippet would be much appreciated!
Currently I'm using aioredis in a microservice that makes a snapshot of certain information every n minutes. The code is not public, but this is essentially what it does:
async def set_snapshot():
    # ...
    try:
        s = snapshot.get()
        await redis.set("snapshot-identifier", s)
    finally:
        # workaround for https://github.com/aio-libs/aioredis-py/issues/778
        await redis.connection_pool.disconnect()
As I said, this fragment is currently called every n minutes, so destroying the pool on every call doesn't really have much impact in my case. My original issue started when the infra people set up a firewall to automatically abort long-lasting connections.
So this simple fix, https://github.com/redis/redis-py/pull/1832, solves the issues for redis. That patch is required here but not sufficient, because the situation is more complex, as aioredis did not catch up on how redis manages command execution. redis makes unconditional retries, e.g., here. However, aioredis fails unless the error is a TimeoutError, here. Porting https://github.com/redis/redis-py/pull/1832 and commenting out those two lines above solves this issue. But that's not a solution, obviously. Therefore, aioredis would need to port the Retry class from redis first, or a similar mechanism, and then modify all the client functions to use call_with_retry.
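Roughly, the mechanism redis implements boils down to something like this (a simplified sketch of the call_with_retry pattern, not the actual redis code; the error types and retry count are illustrative):

class Retry:
    """Simplified sketch of redis's Retry.call_with_retry pattern."""

    def __init__(self, retries=3, supported_errors=(ConnectionError, TimeoutError, OSError)):
        self._retries = retries
        self._supported_errors = supported_errors

    async def call_with_retry(self, do, fail):
        failures = 0
        while True:
            try:
                return await do()
            except self._supported_errors as error:
                failures += 1
                # give the caller a chance to disconnect the broken connection
                await fail(error)
                if failures > self._retries:
                    raise

The client would then wrap every command execution along the lines of conn.retry.call_with_retry(do, fail), where do sends the command and parses the response, and fail disconnects the broken connection.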
Thank you for identifying the solution, Enchufa2.
Yes, we're... fairly far behind, and I apologize for that. Hoping to reboot #1156 sometime soon (maybe once my new semester begins), but these ports do take quite a while, unfortunately. I'm hoping to get these ports in through a more incremental approach, since #1156 stalled.
@Enchufa2 please test this on #1156. It should be up to date. I'm hoping to push out aioredis 2.1.0 as soon as #1156 is merged. Thanks for the patience everyone!
Tested #1156: it now recovers from a CLIENT KILL (which is good, redis did that) and still raises an error for connection aborts and resets (as expected, and even if redis/redis-py/pull/1832 is implemented). Same stack trace. To fix this, it is necessary to implement both retries (not only timeout retries) and 1832.
Gotcha, to be clear it's Connection's retry_on_error that resolves this, right? Though I'm not understanding how it resolves it using the default Connection, as I think the default only catches the timeout error. Are you adding additional errors? Thanks for the help along the way @Enchufa2!
Gotcha, to be clear it's Connection's retry_on_error that resolves this, right?
No, it's call_with_retry, from the Retry class. The issue is two-fold:

1. When aioredis tries to use a connection, it retries only if there's a timeout. Otherwise, it simply fails. redis implements call_with_retry, which always retries, whatever happens, and then triggers a disconnect if everything goes wrong. Then, the disconnect method tests the socket ignoring OSError. So it doesn't fail on resets and aborts, and reconnection can happen.
2. When aioredis or redis ask for a new connection from the pool, if the connection is down due to a reset or an abort, OSError is raised. This is what 1832 fixes.

@Enchufa2 ok, I didn't think it was that, as we implemented all of call_with_retry, but, as it turns out, in the port we forgot to await the fail/disconnect call in call_with_retry... It's now fixed in #1156. May you please test it one more time for me? Thank you so much! (I've also implemented redis/redis-py#1832 in #1156.)
We're still in the same spot with those changes:
Traceback (most recent call last):
File "/home/***/.local/lib/python3.10/site-packages/aioredis/connection.py", line 802, in disconnect
await self._writer.wait_closed() # type: ignore[union-attr]
File "/usr/lib64/python3.10/asyncio/streams.py", line 344, in wait_closed
await self._protocol._get_close_waiter(self)
File "/home/***/.local/lib/python3.10/site-packages/aioredis/connection.py", line 853, in send_packed_command
await asyncio.wait_for(
File "/usr/lib64/python3.10/asyncio/tasks.py", line 408, in wait_for
return await fut
File "/home/***/.local/lib/python3.10/site-packages/aioredis/connection.py", line 835, in _send_packed_command
await self._writer.drain()
File "/usr/lib64/python3.10/asyncio/streams.py", line 360, in drain
raise exc
File "/usr/lib64/python3.10/asyncio/selector_events.py", line 856, in _read_ready__data_received
data = self._sock.recv(self.max_size)
ConnectionAbortedError: [Errno 103] Software caused connection abort
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/***/redis-issue/test-aioredis.py", line 9, in <module>
print(loop.run_until_complete(redis.get("something")))
File "/usr/lib64/python3.10/asyncio/base_events.py", line 641, in run_until_complete
return future.result()
File "/home/***/.local/lib/python3.10/site-packages/aioredis/client.py", line 1173, in execute_command
return await conn.retry.call_with_retry(
File "/home/***/.local/lib/python3.10/site-packages/aioredis/retry.py", line 55, in call_with_retry
await fail(error)
File "/home/***/.local/lib/python3.10/site-packages/aioredis/client.py", line 1162, in _disconnect_raise
raise error
File "/home/***/.local/lib/python3.10/site-packages/aioredis/retry.py", line 52, in call_with_retry
return await do()
File "/home/***/.local/lib/python3.10/site-packages/aioredis/client.py", line 1151, in _send_command_parse_response
await conn.send_command(*args)
File "/home/***/.local/lib/python3.10/site-packages/aioredis/connection.py", line 878, in send_command
await self.send_packed_command(
File "/home/***/.local/lib/python3.10/site-packages/aioredis/connection.py", line 867, in send_packed_command
raise ConnectionError(
aioredis.exceptions.ConnectionError: Error 103 while writing to socket. Software caused connection abort.
Now I'm a bit lost. How does a disconnect involve a send_packed_command? Or am I reading the trace incorrectly?
Take a look at what Retry is set to in your connection object: https://github.com/aio-libs/aioredis-py/blob/cb93ef4a6fa80f3d32d78aced58d8df196aa58e1/aioredis/connection.py#L629
In the stack trace, it did fail. My guess is you had retry_on_timeout implemented, so you got 1 retry. Retry performed the send command function again. I guess what I don't understand is why the exception still wasn't caught. Maybe the Retry class also needs to catch OSError? Probably using a debugger would be helpful to check whether the disconnected connection was being reused?
Were you able to reproduce the error? If not, I think we should focus on that first. :) You need to find a way to reset or abort a TCP connection on macOS, or otherwise test this on a Linux box, where you can use ss, which aborts the connection. AFAIK, there's no equivalent in macOS. But there's the tcpkill utility, from dsniff, which seems to work on macOS too (at least this fork).
I just tried tcpkill myself on Linux and it works nicely by resetting the connection (exactly the same issue I've found in production). There are two things you need to do for this to work. First, launch tcpkill:

$ sudo tcpkill -i lo port 6379

Here tcpkill keeps listening on the loopback interface and will reset any TCP connection to port 6379 (if your Redis host is not local, then just change the interface accordingly). And secondly, you need to force an IPv4 connection (because tcpkill doesn't understand IPv6). Again for localhost, you just need to specify redis = Redis(host="127.0.0.1") in my test case above.
Seems like you guys are really close to a solution on this? Thank you for all the effort here!
Now that aioredis is in redis-py 4.2.0rc1, is the expectation that this issue will go away once we migrate to redis? Any idea when a non-rc version will be released? Thanks for all your hard work!
Still happening:
File "aioredis/client.py", line 1082, in execute_command
conn = self.connection or await pool.get_connection(command_name, **options)
File "aioredis/connection.py", line 1422, in get_connection
if await connection.can_read():
File "aioredis/connection.py", line 893, in can_read
return await self._parser.can_read(timeout)
File "asyncio/streams.py", line 665, in read
raise self._exception
ConnectionResetError: [Errno 104] Connection reset by peer
@Enchufa2 ok, I didn't think it was that, as we implemented all of call_with_retry, but, as it turns out, in the port we forgot to await the fail/disconnect call in call_with_retry... It's now fixed in #1156. May you please test it one more time for me? Thank you so much! (I've also implemented redis/redis-py#1832 in #1156.)
@Andrew-Chen-Wang I see that those fixes are in pull requests in aioredis, but not in redis-py asyncio. Will this be ported too? Currently I'm stuck with aioredis 1.3.1 due to ConnectionResetError :(
I've lost a severe amount of time to work, so I'm not sure when it'll be ported over. Maybe sometime in June, unless someone else would like to help out. Anyone can help!
Hi @Andrew-Chen-Wang could you point me in the direction of the changes that need to be ported over? We have currently disabled connection pooling due to the stability issues, and it definitely has introduced a performance overhead.
hi @steve-marmalade please head over to redis/redis-py to ask this question. I'm fairly busy lately, but I'll try to answer when I find the time. Sorry!
Edit: sorry! I should have looked at this more carefully.
@chayim do you think you can port over the PR that fixed this issue? Thanks!
Gentle bump @Andrew-Chen-Wang , @chayim
Hi, everybody! I use Tornado + aioredis, and recently I met this issue; below is the traceback. My environment: aioredis==1.2.0, tornado==5.1.1. I use this method, aioredis.create_redis_pool(**args), to create the pool. Can anybody help? Thanks a lot.