awestlake87 / pyo3-asyncio

Other
300 stars 45 forks source link

asyncio + uvloop task being destroyed on heavy logging #104

Open sugarraysam opened 1 year ago

sugarraysam commented 1 year ago

πŸ› Bug Reports

When reporting a bug, please provide the following information. If this is not a bug report you can just discard this template.

🌍 Environment

πŸ’₯ Reproducing

Please provide a minimal working example. This means both the Rust code and the Python.

Please also write what exact flags are required to reproduce your results.

I apologize in advance as I can't share too much code here. Will try to share the important parts so we can understand the issue.

Issue

When using python asyncio with uvloop as EventLoopPolicy, with verbose debug logging, and turning on pyo3 bindings (they are behind a gate keeper), we start seeing on the server side:

# 1. uvloop call stack errors
Traceback (most recent call last):
  File "uvloop/loop.pyx", line 1287, in uvloop.loop.Loop.call_soon_threadsafe
  File "../../../../__cython__cython-includes__/out/cbhandles.pyx", line 328, in uvloop.loop.new_Handle
  File "../../../../__cython__cython-includes__/out/cbhandles.pyx", line 15, in uvloop.loop.Handle._set_loop
  File "../../../../__cython__cython-includes__/out/cbhandles.pyx", line 416, in uvloop.loop.extract_stack
ValueError: call stack is not deep enough

# 2. asyncio task being destroyed
task: <RequestContextAwareTask pending name='Task-28' coro=<<coroutine without __name__>()> wait_for=<Future pending cb=[<builtins.PyDoneCallback object at 0x7fa112917370>(), RequestContextAwareTask.task_wakeup()] created at /<nip>/lib/python3.10/asyncio/runners.py:44>
2023-09-06 10:25:21.246:ERROR:asyncio:traceback.py:extract:L376: Task was destroyed but it is pending!
source_traceback: Object created at (most recent call last):
  File "<string>", line 51, in <module>
  File "<string>", line 37, in __run
  File "/<nip>/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/<nip>/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/<nip>/server.py", line 926, in <module>
    asyncio.run(main())
  File "/<nip>/click/core.py", line 1130, in __call__
    return self.main(*args, **kwargs)
  File "/<nip>/click/core.py", line 1055, in main
    rv = self.invoke(ctx)
  File "/<nip>/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/<nip>/click/core.py", line 760, in invoke
    return __callback(*args, **kwargs)
  File "/<nip>/click/decorators.py", line 26, in new_func
    return f(get_current_context(), *args, **kwargs)
  File "/<nip>/python3.10/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
task: <RequestContextAwareTask pending name='Task-29' coro=<<coroutine without __name__>()> wait_for=<Future pending cb=[<builtins.PyDoneCallback object at 0x7fa112917810>(), RequestContextAwareTask.task_wakeup()] created at /<nip>/authorization.py:541> created at /<nip>/lib/python3.10/asyncio/runners.py:44>

Now we only see this error when enabling the pyo3 bindings, i.e. turning the gatekeeper on. It's as if the python logger is unaware of pyo3_asyncio tasks or somehow trying to pull stackframes that don't exist?

What is interesting is that turning logging verbosity off makes the issue go away, i.e., server works with pyo3 bindings on.

What is also interesting is there is no pattern in which coroutine get destroyed, meaning, not only pyo3_asyncio coroutines are being terminated prematurely. To me it makes it more likely that it could be a performance/CPU issue. Could it be that pyo3_asyncio created coroutines somehow require a lot of CPU? I do see huge spikes but can't say it's more than with the bindings being off. (obviously we cant reproduce this here unfortunately...)

Python setup

# Run server with uvloop
asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())
asyncio.run(main())

# logging setup
DEBUG_LOG_FORMAT = (
    "%(asctime)s:%(levelname)s:%(name)s:%(funcName)s:L%(lineno)d: %(message)s"
)

logging.basicConfig(
    level=logging.DEBUG,
    format=DEBUG_LOG_FORMAT,
    datefmt="%Y-%m-%d %H:%M:%S",
)

Rust setup

# all futures are converted to coroutines like so
# note we use `tracing::instrument` and attach it to all futures
#[instrument(target = "<target>", skip_all)]
pub fn process_request<'py>(
    &self,
    py: Python<'py>,
    ids: &PyAny,
    req: &PyAny,
) -> PyResult<&'py PyAny> {
    self.log_common_fields(req)?;
    let inner = Arc::clone(&self.inner);

    pyo3_asyncio::tokio::future_into_py(
        py,
        async move {
            inner.foo().map_err(to_pyerr)
        }
        .in_current_span(),
    )
}

Appreciate any insight or troubleshooting idea. Thank you.