DataDog / dd-trace-py

Datadog Python APM Client
https://ddtrace.readthedocs.io/
Other
553 stars 416 forks source link

Unexpected Cancellation propagation when using sse-starlette, fastapi, uvloop, and ...gevent? #10550

Open antarus12345 opened 2 months ago

antarus12345 commented 2 months ago

Summary of problem

TL;DR:

To spare you all the travesty of seeing the giant pile of spaghetti that is my production application, I have managed a smaller reproduction case that I hope helps you.

I'd love to keep gevent installed and either have this bug fixed, or just not instrument gevent in this application; I'm not intending to use gevent at all (I'm supposed to be using asyncio / anyio in fastapi..)

Which version of dd-trace-py are you using?

ddtrace-2.11.3

Which version of pip are you using?

23.0.1

Which libraries and their versions are you using?

`pip freeze` annotated-types==0.7.0 anyio==3.7.1 bytecode==0.15.1 certifi==2024.8.30 click==8.1.7 ddtrace==2.11.3 Deprecated==1.2.14 distro==1.9.0 envier==0.5.2 exceptiongroup==1.2.2 fastapi==0.104.1 gevent==24.2.1 greenlet==3.0.3 gunicorn==21.2.0 h11==0.14.0 httpcore==1.0.5 httptools==0.6.1 httpx==0.27.2 idna==3.8 importlib_metadata==8.4.0 openai==1.1.1 opentelemetry-api==1.27.0 packaging==24.1 protobuf==5.28.0 pydantic==2.9.0 pydantic_core==2.23.2 python-dotenv==1.0.1 PyYAML==6.0.2 sniffio==1.3.1 sse-starlette==1.6.5 starlette==0.27.0 tqdm==4.66.5 typing_extensions==4.12.2 tzdata==2024.1 uv==0.2.24 uvicorn==0.24.0.post1 uvloop==0.20.0 watchfiles==0.24.0 websockets==13.0.1 wrapt==1.16.0 xmltodict==0.13.0 zipp==3.20.1 zope.event==5.0 zope.interface==7.0.3

How can we reproduce your problem?

I am on py3.10 I created a venv, installed the above packages

My reproducer is:

(small_repro) (base) alec@alec-dev:~/small_repro$ cat dd_test.py 
from fastapi import FastAPI
import uvicorn
from sse_starlette import EventSourceResponse
import os

app = FastAPI()

@app.get('/foo')
def foo():
    def event_gen():
        for x in range(1,100):
            yield str(x)
    return EventSourceResponse(event_gen())

if __name__ == '__main__':
    uvicorn.run(app, host='0.0.0.0', port=int(os.getenv('PORT', 9005)))

What is the result that you get?

I open two terminals. One I use to run my reproducer:

ddtrace-run python dd_test.py

The second I use to curl the foo endpoint, we just throw the result away (it doesn't matter.)

curl localhost:9005/foo &>/dev/null

However, this triggers some odd cancellation exception:

(small_repro) (base) alec@alec-dev:~/small_repro$ ddtrace-run python dd_test.py
INFO:     Started server process [1981883]
INFO:     Waiting for application startup.
INFO:     Application startup complete.
INFO:     Uvicorn running on http://0.0.0.0:9005 (Press CTRL+C to quit)
INFO:     127.0.0.1:49796 - "GET /foo HTTP/1.1" 200 OK
Exception in callback <function TaskGroup._spawn.<locals>.task_done at 0x7dc79eca2170>
handle: <Handle TaskGroup._spawn.<locals>.task_done>
Traceback (most recent call last):
  File "/home/alec/small_repro/lib/python3.10/site-packages/sse_starlette/sse.py", line 236, in wrap
    await func()
  File "/home/alec/small_repro/lib/python3.10/site-packages/sse_starlette/sse.py", line 281, in _ping
    await anyio.sleep(self._ping_interval)
  File "/home/alec/small_repro/lib/python3.10/site-packages/anyio/_core/_eventloop.py", line 81, in sleep
    return await get_asynclib().sleep(delay)
  File "/home/alec/miniconda3/envs/py3.10/lib/python3.10/asyncio/tasks.py", line 605, in sleep
    return await future
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/home/alec/small_repro/lib/python3.10/site-packages/sse_starlette/sse.py", line 236, in wrap
    await func()
  File "/home/alec/small_repro/lib/python3.10/site-packages/sse_starlette/sse.py", line 211, in listen_for_exit_signal
    await AppStatus.should_exit_event.wait()
  File "/home/alec/small_repro/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 1778, in wait
    if await self._event.wait():
  File "/home/alec/miniconda3/envs/py3.10/lib/python3.10/asyncio/locks.py", line 214, in wait
    await fut
asyncio.exceptions.CancelledError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "uvloop/cbhandles.pyx", line 63, in uvloop.loop.Handle._run
  File "/home/alec/small_repro/lib/python3.10/site-packages/anyio/_backends/_asyncio.py", line 668, in task_done
    exc = _task.exception()
asyncio.exceptions.CancelledError
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/home/alec/small_repro/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 426, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/home/alec/small_repro/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 84, in __call__
    return await self.app(scope, receive, send)
  File "/home/alec/small_repro/lib/python3.10/site-packages/fastapi/applications.py", line 1106, in __call__
    await super().__call__(scope, receive, send)
  File "/home/alec/small_repro/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/home/alec/small_repro/lib/python3.10/site-packages/ddtrace/contrib/asgi/middleware.py", line 291, in __call__
    return await self.app(scope, receive, wrapped_send)
  File "/home/alec/small_repro/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/home/alec/small_repro/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/home/alec/small_repro/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 17, in __call__
    await self.app(scope, receive, send)
  File "/home/alec/small_repro/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/home/alec/small_repro/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/home/alec/small_repro/lib/python3.10/site-packages/starlette/routing.py", line 69, in app
    await response(scope, receive, send)
  File "/home/alec/small_repro/lib/python3.10/site-packages/sse_starlette/sse.py", line 247, in __call__
    await wrap(partial(self.listen_for_disconnect, receive))
  File "/home/alec/small_repro/lib/python3.10/site-packages/sse_starlette/sse.py", line 236, in wrap
    await func()
  File "/home/alec/small_repro/lib/python3.10/site-packages/sse_starlette/sse.py", line 191, in listen_for_disconnect
    message = await receive()
  File "/home/alec/small_repro/lib/python3.10/site-packages/uvicorn/protocols/http/httptools_impl.py", line 587, in receive
    await self.message_event.wait()
  File "/home/alec/miniconda3/envs/py3.10/lib/python3.10/asyncio/locks.py", line 214, in wait
    await fut
asyncio.exceptions.CancelledError

Some additional context:

sse-starlette uses a task-group in its call function to basically terminate event streaming if one of many things happens. You can see this behavior here:

Cancellation of this task-group is expected. However somehow something teh tracer is doing when gevent is installed causes this Cancellation propagation. I have not yet waddled through the gevent internal traces.

What is the result that you expected?

(small_repro) (me):~/small_repro$ ddtrace-run python dd_test.py INFO: Started server process [1981883] INFO: Waiting for application startup. INFO: Application startup complete. INFO: Uvicorn running on http://0.0.0.0:9005 (Press CTRL+C to quit) INFO: 127.0.0.1:49796 - "GET /foo HTTP/1.1" 200 OK

antarus12345 commented 2 months ago

Additional weirdness:

DD_TRACE_ENABLED=false ddtrace-run python dd_test.py also exhibits this failure, so its not even I think because of patch.all() like I expected, but is instead some of the nasty gevent stuff happening in sitecustomize?

antarus12345 commented 2 months ago

DD_UNLOAD_MODULES_FROM_SITECUSTOMIZE=no ddtrace-run python dd_test.py does not exhibit the bug.

antarus12345 commented 2 months ago

Ah I see now:

As part of module cleanup in sitecustomize auto cleanup is enabled if gevent module is installed. The bug appears to be that module cleanup triggers this problem (presumably by fussing with asyncio or _asyncio.)

So some Qs:

antarus12345 commented 2 months ago

and of course if I just edit sitecustomize.py (for fun) to remove gevent from MODULE_REQUIRES CLEANUP it also works fine. cool.

victorct-tradeengage commented 2 months ago

Same issue here, we have Gevent installed which we use with celery for the pool. The codebase between FastAPI and Celery is shared.

luciidlou commented 1 month ago

Same issue here with our FastAPI application. Removing gevent from our dependencies solved the issue.