getsentry / sentry-python

The official Python SDK for Sentry.io
https://sentry.io/for/python/
MIT License
1.88k stars 494 forks source link

SDK not instrumenting Asyncio calls when `sentry_sdk.init` called at app lifecycle's beginning, outside an `async` function #2328

Open szokeasaurusrex opened 1 year ago

szokeasaurusrex commented 1 year ago

How do you use Sentry?

Sentry Saas (sentry.io)

Version

1.30.0

Steps to Reproduce

To observe the issue, we can run the following code snippet:


import asyncio
import time

import sentry_sdk
from sentry_sdk.integrations.asyncio import AsyncioIntegration

sentry_sdk.init(
    dsn="[your dsn here]",
    traces_sample_rate=1.0,
    integrations=(AsyncioIntegration(),),
)

async def fib(n):
    if n == 0:
        return 0

    if n == 1:
        return 1

    time.sleep(0.05)

    async with asyncio.TaskGroup() as tg:
        fib1 = tg.create_task(fib(n - 1))
        fib2 = tg.create_task(fib(n - 2))

    return fib1.result() + fib2.result()

async def main():
    with sentry_sdk.start_transaction(op="test", name="Fibonacci"):
        result = await fib(5)

    print(result)

if __name__ == "__main__":
    asyncio.run(main())

Expected Result

We would expect the SDK to record spans for all async calls. In particular, calls to the function fib should appear as spans in the performance transaction, so that we observe the following when viewing the performance transaction in Sentry:

image

Actual Result

However, if we actually run the above code snippet and then view the generated performance transaction in Sentry, we will observe that the fib spans are missing! Instead, our transaction looks like the following:

image

The missing spans indicate that the SDK is not instrumenting async task calls as we would expect it to.

Workaround

For now, we can work around this bug by initializing the SDK within our first async function call, like so:


import asyncio
import time

import sentry_sdk
from sentry_sdk.integrations.asyncio import AsyncioIntegration

async def fib(n):
    if n == 0:
        return 0

    if n == 1:
        return 1

    time.sleep(0.05)

    async with asyncio.TaskGroup() as tg:
        fib1 = tg.create_task(fib(n - 1))
        fib2 = tg.create_task(fib(n - 2))

    return fib1.result() + fib2.result()

async def main():
    # Note that the SDK init call has been moved here!
    sentry_sdk.init(
        dsn="[your dsn here]",
        traces_sample_rate=1.0,
        integrations=(AsyncioIntegration(),),
    )
    with sentry_sdk.start_transaction(op="test", name="Fibonacci"):
        result = await fib(5)

    print(result)

if __name__ == "__main__":
    asyncio.run(main())

If we run the workaround code, we observe the expected result. However, we should fix the bug so that we also observe the expected result when we initialize the SDK outside an async function call.

Possible fix

The issue appears to arise from within the patch_asyncio function, which is defined in sentry_sdk/integrations/asyncio.py, specifically by the line which obtains the loop like so:

loop = asyncio.get_running_loop()

asyncio.get_running_loop can only be called from within a coroutine or callback. When we initialize the SDK outside an async function, we violate this precondition and a RuntimeError is raised. The SDK handles this error, but the error prevents the SDK from auto-instrumenting calls to async functions.

Replacing the asyncio.get_running_loop() call with a call to asyncio.get_event_loop() would likely fix this bug, but we need to ensure that get_event_loop does not cause any unintended side effects before implementing this change.

Alternatively, this issue could likely be resolved by instrumenting asyncio through the use of a custom event loop policy which instruments the SDK when a new event loop is created in addition to instrumenting the current event loop. This solution may be preferable since it would likely also resolve #2333.

szokeasaurusrex commented 1 year ago

See also issue #2333, which is closely related to this issue

sentrivana commented 3 months ago

Potential solution: In addition to patching the async task factory directly (backwards compat), patch something higher up so that any event loop that starts gets the patched async factory.