Closed bbrowning918 closed 10 months ago
Thanks for writing this up @bbrowning918! 🏅
I believe I tracked down and fixed the first issue you described. I have a PR ready in redis-py repo https://github.com/redis/redis-py/pull/2607
The biggest change from 4.3.5 to 4.4.0 was that they got rid of second buffer SocketBuffer
and started using asyncio.StreamReader
directly. Which resulted in that odd behaviour when canceling a task executing redis commands
Hi @jajce — I had a quick look at your branch in #353. The GHA workflows didn't complete.
Locally it's hanging in test_core.py
:
task: <Task pending name='Task-1037' coro=<test_message_expiry__group_send__one_channel_expires_message() done, defined at /Users/carlton/Projects/Django/channels_redis/tests/test_core.py:554> wait_for=<Future pending cb=[Task.task_wakeup()]>>
I haven't had a chance to dig more deeply yet, so I appreciate this is quite superficial — but maybe it gives you a breadcrumb… Thanks!
@carltongibson yeah, unfortunately my PR in redis-py is supposed to fix only test_receive_cancel
:(. I am still looking into what is happening with test_message_expiry__group_send__one_channel_expires_message
Ok, investigated a little bit more and will share what I've found so far. Now I am pretty much sure that both of the issues are related to the removal of SocketBuffer
https://github.com/redis/redis-py/pull/2418/ . The test_message_expiry__group_send__one_channel_expires_message
issue is very well described in this ticket https://github.com/redis/redis-py/issues/2579 .
In our test the issue is that even though we time out our receive
here:
https://github.com/django/channels_redis/blob/89b29ad300704aa09ec9e2e31efad5eec11ec03f/tests/test_core.py#L589-L591
our next receive
, which is here
https://github.com/django/channels_redis/blob/89b29ad300704aa09ec9e2e31efad5eec11ec03f/tests/test_core.py#L594
will have to wait for 5 seconds
https://github.com/django/channels_redis/blob/89b29ad300704aa09ec9e2e31efad5eec11ec03f/channels_redis/core.py#L81
https://github.com/django/channels_redis/blob/89b29ad300704aa09ec9e2e31efad5eec11ec03f/channels_redis/core.py#L422-L424
to pass till we are able to read response from our next redis command. In our case the first command we are sending to redis is our cleanup EVAL script and when we are getting to read response StreamReader
waits for 5 seconds to pass till it actually reads response for EVAL command.
So on one hand it is very different behaviour compared to redis v4.3.5 which is an issue, but on the other hand according to the Redis documentation regarding blocking operations it is an intended behaviour and redis client connection is supposed to be blocked till timeout passes or till another client writes something to the list we are trying to pop from.
I still need to understand how SocketBuffer
prevented whole connection from being blocked in 4.3.5, which hopefully will lead to a fix.
OK, I've marked the failing tests xfail for now in #356.
This allows updating redis-py
for 4.5.3, which is a security release.
The current Python 3.11 failures are pending the next redis-py release https://github.com/redis/redis-py/commit/480253037afe4c12e38a0f98cadd3019a3724254 (4.5.4 or greater.) Error in asyncio.timeout
before Python 3.11.3.
We have questions here:
redis-py
?channels_redis
? Can folks provide minimal reproduces (test cases) if so? I'll leave this issue to track.
As per https://github.com/django/channels_redis/pull/358, we're going to skip PY311 tests until the next redis-py
release.
@carltongibson , redis-py
released 4.5.4.
We can revert that change
Thanks @DmytroLitvinov. https://github.com/django/channels_redis/commit/513f85901bee6ad2543abe0c9e113e73865218a0
@carltongibson after upgrading to 4.1.0 I got troubles when testing channel layers.
I am using django-channels
(specifically the async consumers) and I notice that when I run multiple tests that are using the channel_layer
I keep getting TimeoutError
randomly. Fun fact: if I run each test one at a time no error occurs.
I suspected there was something that is not flushing properly between tests. Indeed, calling app.channel_layer.flush()
seem to solve the problem. Here's a toy test:
@pytest.mark.asyncio
@pytest.mark.django_db(transaction=True)
async def test_consumer1():
app = TestConsumer()
communicator = WebsocketCommunicator(app, "")
connected, _ = await communicator.connect()
assert connected
await app.channel_layer.flush()
I am trying to reproduce the issue with something simpler but it looks harder than expected. I struggle to understand what is going on under the hood when mixing up sync_to_async
, pytest.mark.django_db(transaction=True)
and channel layers.
redis46: redis>=4.6,<4.7
redis50: redis>=5.0,<5.1
redismain: https://github.com/redis/redis-py/archive/master.tar.gz
Any of those should work. Please open a new issue if you hit problems with one of those versions.
This issue is to help with visibility and discussion:
redis-py >=4.4.0 introduces two breaking issues for the test suite and real usage on
test_receive_cancel
in both core and sentinel will fail with "TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'" on the response to ZREMRANGEBYSCORE. In usage it has been noted in redis-py on https://github.com/redis/redis-py/issues/2593 affecting normal lookingsend
calls.test_message_expiry__group_send__one_channel_expires_message
can hang indefinitely in the test suiteThese affect both unrelated changes/PRs and 4.0.0 in actual usage. The change log for redis-py 4.4.0 does not seem to have a smoking gun and pinning it to 4.3.5 has the test suite passing better.