django / asgiref

ASGI specification and utilities
https://asgi.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
1.47k stars 209 forks source link

Regression in v3.8.0 #452

Closed carltongibson closed 7 months ago

carltongibson commented 7 months ago

The recent 3.8.0 release causes an intermittent test failure on Django's test suite.

https://djangoci.com/job/pull-requests-focal/database=spatialite,label=focal-pr,python=python3.11/24499/testReport/junit/asgi.tests/ASGITest/test_disconnect_with_body/

tests % ./runtests.py asgi.tests.ASGITest.test_disconnect_with_body
Testing against Django installed in '/Users/carlton/Projects/Django/django/django' with up to 10 processes
Found 1 test(s).
System check identified no issues (0 silenced).
F
======================================================================
FAIL: test_disconnect_with_body (asgi.tests.ASGITest.test_disconnect_with_body)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/Users/carlton/Projects/Django/asgiref/asgiref/sync.py", line 254, in __call__
    return call_result.result()
           ^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 449, in result
    return self.__get_result()
           ^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File ".../asgiref/sync.py", line 331, in main_wrap
    result = await self.awaitable(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File ".../django/tests/asgi/tests.py", line 374, in test_disconnect_with_body
    with self.assertRaises(asyncio.TimeoutError):
AssertionError: TimeoutError not raised

----------------------------------------------------------------------
Ran 1 test in 0.005s

FAILED (failures=1)

This tests disconnect handling for a request with a body. The main task processing the request is not correctly cancelled, on some runs only. (The assertion fails, with response content {'type': 'http.response.start', 'status': 200, 'headers': [(b'Content-Type', b'text/html; charset=utf-8'), (b'Content-Length', b'12')]} delivered, rather than the timeout.)

Bisected to 0503c2c4df06dbe56f04689fe537325f28dbbe8b by @ttys0dev for #435.

0503c2c4df06dbe56f04689fe537325f28dbbe8b is the first bad commit
commit 0503c2c4df06dbe56f04689fe537325f28dbbe8b
Author: ttys0dev <126845556+ttys0dev@users.noreply.github.com>
Date:   Fri Jan 26 23:13:55 2024 -0700

    Fix task cancellation propagation to subtasks when using sync middleware (#435)

 asgiref/sync.py    |  59 +++++++++++++++-----
 tests/test_sync.py | 159 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 201 insertions(+), 17 deletions(-)

//cc @felixxm

ttys0dev commented 7 months ago

I think this may be a buggy test: Sending a http.disconnect AFAIU is supposed to trigger a task cancellation exception not a asyncio.TimeoutError.

carltongibson commented 7 months ago

The task is cancelled. It's the communicator that times out waiting for the response.

ttys0dev commented 7 months ago

It's the communicator that times out waiting for the response.

Hmm, but the communicator only waits if the task is not already done right? And the http.disconnect is already called before await communicator.receive_output() so wouldn't it be expected to not hit the codepath which would raise a asyncio.TimeoutError since there would already be an asyncio.CancelledError?

carltongibson commented 7 months ago

ASGIHandler runs two parallel tasks. One to process the request, the other to listen for the http.disconnect event.

In the test here, on receiving the http.disconnect that task completes, and the other task, processing the request should be cancelled.

When the test fails — and it's intermittent so the behaviour here isn't deterministic — the main request processing task isn't cancelled, which can be checked by inspecting the output received by the application communicator.

That's a regression. We need to work out why it's happened, then we can resolve it. In the meantime though, unless we can solve it quickly, I think we need to revert the problem patch and issue a 3.8.1 to avoid a breakage in deployed applications.

ttys0dev commented 7 months ago

That's a regression. We need to work out why it's happened, then we can resolve it. In the meantime though, unless we can solve it quickly, I think we need to revert the problem patch and issue a 3.8.1 to avoid a breakage in deployed applications.

Debugging it now.

carltongibson commented 7 months ago

@ttys0dev Thanks for taking a look. 🎁

ttys0dev commented 7 months ago

When the test fails — and it's intermittent so the behaviour here isn't deterministic — the main request processing task isn't cancelled, which can be checked by inspecting the output received by the application communicator.

Yeah, so it looks like the issue was that we were effectively inhibiting normal parent task asyncio.CancelledError propagation if the parent task was already done by the time we finished cancelling child tasks, #455 should fix that by checking if the task is done and propagating the asyncio.CancelledError if it was.

carltongibson commented 7 months ago

@ttys0dev great thanks. I will give it a run in the morning.

Good hustle!

carltongibson commented 7 months ago

https://pypi.org/project/asgiref/3.8.1/