Closed ttys0dev closed 8 months ago
I agree this is an issue, but I do not currently have the mental space to wander into that code and fix it; the exception propagation code is incredibly carefully put together and sensitive, and there's a chance it may not be possible to implement this at all without ruining other existing behaviour. If you want to have a go, though, be my guest.
If you want to have a go, though, be my guest.
Yeah, I've been experimenting a bit, I added an exception handler which was able to catch the initial cancellation exception but it's unclear how to properly identify subtasks through a sync_to_async
and then through an async_to_sync
transition.
I do see the subtasks that the cancellation needs to propagate to when I call asyncio.all_tasks()
but I'm not sure how to go about say filtering those for only subtasks created by the function in the sync_to_async
call. Are subtasks somehow tracked using context something like that?
The only identifiers are kept as global dictionaries on the SyncToAsync and AsyncToSync classes, with a little bit of contextvars these days as well.
The only identifiers are kept as global dictionaries on the SyncToAsync and AsyncToSync classes
Is there already an identifier there that can be used to identify spawned tasks or would that need to be added?
For SyncToAsync, it's not making asyncio.Task
objects at all (if that's what you mean by tasks), it is making subthreads and/or reusing a parent thread that is already yielding via AsyncToSync.
They're identified using contextvars, but only if thread_sensitive
mode is enabled; otherwise, it always makes a new subthread, and no mapping is kept because it's intrinsic in the await loop.run_in_executor
call.
For SyncToAsync, it's not making
asyncio.Task
objects at all (if that's what you mean by tasks), it is making subthreads and/or reusing a parent thread that is already yielding via AsyncToSync.
So for reference if I run the failing test in debugger and set a breakpoint here and print asyncio.all_tasks()
from the debug console I see the following tasks:
{<Task pending name='Task-1' coro=<test_inner_shield_sync_middleware() running at /Users/ttys0dev/asgiref/tests/test_sync.py:907> cb=[_run_until_complete_cb() at /usr/local/Cellar/python@3.12/3.12.1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py:180]>,
<Task pending name='Task-3' coro=<AsyncToSync.main_wrap() running at /Users/ttys0dev/asgiref/asgiref/sync.py:321> wait_for=<Future pending cb=[shield.<locals>._outer_done_callback() at /usr/local/Cellar/python@3.12/3.12.1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py:922, Task.task_wakeup()]>>,
<Task pending name='Task-4' coro=<test_inner_shield_sync_middleware.<locals>.async_task() running at /Users/ttys0dev/asgiref/tests/test_sync.py:895> wait_for=<Future pending cb=[Task.task_wakeup()]> cb=[shield.<locals>._inner_done_callback() at /usr/local/Cellar/python@3.12/3.12.1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/tasks.py:905]>}
From that same breakpoint if I get the current task using asyncio.current_task()
I can get the following task:
<Task pending name='Task-1' coro=<test_inner_shield_sync_middleware() running at /Users/ttys0dev/asgiref/tests/test_sync.py:907> cb=[_run_until_complete_cb() at /usr/local/Cellar/python@3.12/3.12.1/Frameworks/Python.framework/Versions/3.12/lib/python3.12/asyncio/base_events.py:180]>
They're identified using contextvars, but only if thread_sensitive mode is enabled; otherwise, it always makes a new subthread, and no mapping is kept because it's intrinsic in the await loop.run_in_executor call.
So thread_sensitive is enabled for the failing test right?
I think the subtask that I need to propagate the cancellation to is showing up as Task-4
in asyncio.all_tasks()
. Is there a way to identify that Task-4
is a subtask of Task-1
via contextvars in this case?
I'm afraid I haven't looked at that code in over a year so it'll take me a couple of hours one evening to sit down and fully remember how it all works in order to correctly answer that question and review the pull request!
I think the subtask that I need to propagate the cancellation to is showing up as
Task-4
inasyncio.all_tasks()
. Is there a way to identify thatTask-4
is a subtask ofTask-1
via contextvars in this case?
By the way I think my PR effectively works by propagating the cancellation to Task-3
which then in turn cancels Task-4
which is a subtask of Task-3
.
The idea with my PR is that it uses a list to track the tasks in the call stack across sync/async boundaries and then propagates the cancellation to the last task in the list which AFAIU will be the async task currently being executed and thus the one that needs to be cancelled first. The cancellation exception or completions(if say inhibited via shield) then propagates normally back through the call stack to the top level task.
The deadlock from my understanding happens due to a failure to propagate the task cancellation to subtasks across sync_to_async
/async_to_sync
boundaries.
The idea with my PR is that it uses a list to track the tasks in the call stack across sync/async boundaries and then propagates the cancellation to the last task in the list which AFAIU will be the async task currently being executed and thus the one that needs to be cancelled first. The cancellation exception or completions(if say inhibited via shield) then propagates normally back through the call stack to the top level task.
Actually this approach I used in my original PR didn't work correctly as it would propagate task cancellation through an asyncio.shield
. I updated my PR to chain cancellation calls through the sync/async boundaries in a way that allows for asyncio.shield
to work correctly. I also added a test that should catch this bug.
Fixed in #435.
There is a deadlock bug which is demonstrated by this recently added test(added in #432) which is currently disabled. There was another recently fixed deadlock issue in #348 which I'm thinking may possibly be related/similar although I'm not entirely sure.
When I run this broken test in a debugger the
asyncio.CancelledError
is incorrectly caught by this exception block and the test then hangs/deadlocks immediately after.It's expected that the
asyncio.CancelledError
should instead be caught in this exception block which is what happens when using async middleware as demonstrated in this test which works as expected.I'm guessing this issue has something to do with how
asyncio.CancelledError
exceptions propagate betweensync_to_async
andasync_to_sync
or something along those lines.