DataDog / dd-trace-py

Datadog Python APM Client
https://ddtrace.readthedocs.io/

Unable to start application with Python 3.11.9 + gevent + ddtrace #8903

Open fbexiga opened 7 months ago

fbexiga commented 7 months ago

Summary of problem

When I try to start a Flask API using gunicorn + gevent + ddtrace on Python 3.11.9, the application crashes. It works if I use Python 3.11.8 instead, or if I remove either gevent or ddtrace. I can only reproduce this issue on Linux (Debian Bookworm, for example), not on macOS.

Edit: it appears that even after downgrading to Py 3.11.8, with ddtrace 2.7.x the application doesn't start properly, although the error is different. With 2.6.x it does work as expected.

Which version of dd-trace-py are you using?

Tested 2.8.0, 2.7.7 and a few more down to 2.6.3

Which version of pip are you using?

Python 3.11.9 pip 24.0

Which libraries and their versions are you using?

ddtrace==2.7.7 flask==3.0.2 gevent==24.2.1 greenlet==3.0.3 gunicorn==21.2.0

How can we reproduce your problem?

If I try to start a Flask API using gunicorn with gevent workers + ddtrace + Python 3.11.9, I get the following error as soon as the worker boots:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/threading.py", line 989, in _bootstrap
    # Wrapper around the real bootstrap code that ignores
  File "ddtrace/profiling/_threading.pyx", line 38, in ddtrace.profiling._threading.native_id_hook.bootstrap_wrapper
  File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.11/threading.py", line 1049, in _bootstrap_inner
    self._delete()
  File "/usr/local/lib/python3.11/threading.py", line 1081, in _delete
    del _active[get_ident()]
        ~~~~~~~^^^^^^^^^^^^^
KeyError: 139743514440832

What is the result that you get?

I am unable to start the application, getting the error mentioned above.

What is the result that you expected?

I expected the application to start and work just like it does with an older version of Python.

emmettbutler commented 7 months ago

Thanks for reporting this, @fbexiga. If turning off the Profiling functionality is an option for your use case, it's the first thing I'd recommend. Does the error still occur when you set DD_PROFILING_ENABLED=0?

emmettbutler commented 7 months ago

cc @sanchda

sanchda commented 7 months ago

@fbexiga, thank you so much for the thorough and insightful report. Unfortunately, I don't think we have a short-term workaround, but we'll try to get this resolved promptly.

fbexiga commented 7 months ago

That's ok, for now we just downgraded back to 3.11.8. No rush or anything, but I thought it was worth reporting.

I tried disabling profiling, but I still get the same result.

kc-experian commented 7 months ago

I have the same error in a Celery application using Python 3.11.9 + gevent + ddtrace

Traceback (most recent call last):
  File "src/gevent/_abstract_linkable.py", line 287, in gevent._gevent_c_abstract_linkable.AbstractLinkable._notify_links
  File "src/gevent/_abstract_linkable.py", line 333, in gevent._gevent_c_abstract_linkable.AbstractLinkable._notify_links
AssertionError: (None, <callback at 0x7fe8acaaa4c0 args=([],)>)
2024-04-09T21:13:55Z <callback at 0x7fe8acaaa4c0 args=([],)> failed with AssertionError

iherasymenko commented 7 months ago

Also affects Python 3.12.3; used to work just fine with 3.12.2.

askidelskiy commented 7 months ago

Encountered a similar exception, but we don't have profiling enabled. It also happened when moving from 3.11.8 to 3.11.9. Rolling back the Python version resolved the error.

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/threading.py", line 1002, in _bootstrap
    self._bootstrap_inner()
  File "/usr/local/lib/python3.11/threading.py", line 1049, in _bootstrap_inner
    self._delete()
  File "/usr/local/lib/python3.11/threading.py", line 1081, in _delete
    del _active[get_ident()]
        ~~~~~~~^^^^^^^^^^^^^
KeyError: 139737853141056

ddtrace==2.7.6 django==4.2.11 gevent==23.9.1 greenlet==3.0.3 gunicorn==21.2.0

P403n1x87 commented 7 months ago

There is no clear link between this issue and https://github.com/DataDog/dd-trace-py/pull/8870, but it might be worth testing it once it's released 🤞. Meanwhile, we'll see if we can reproduce this issue.

lawrenceong commented 7 months ago

I was testing this and found that the crash does not happen on an Intel processor but does happen on AMD EPYC. Disabling ddtrace prevents the crash on AMD EPYC.

Intel processor: Intel(R) Xeon(R) CPU @ 2.20GHz
AMD EPYC processor: AMD EPYC 7B12

Docker image = python:3.11.9-slim

ddtrace==2.8.2
flask==3.0.3
gevent==24.2.1
greenlet==3.0.3
gunicorn==22.0.0

Downgrading to python 3.11.8 stops the crash on AMD EPYC.

fbexiga commented 6 months ago

Any movement on this?

iherasymenko commented 5 months ago

~Reproducible with Python 3.12.4 + gevent 24.2.1 + greenlet 3.0.3 + ddtrace 2.9.0.~

~UPD 1: Only reproducible together with sentry-sdk.~

UPD 2: Reproducible without sentry-sdk. It was a red herring.

JASchilz commented 5 months ago

> Also affects Python 3.12.3; used to work just fine with 3.12.2.

I likewise encountered a similar issue when using 3.12.3. Downgrading to 3.12.2 fixed the issue.
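
For anyone who wants to guard a deployment against the releases named so far in this thread, here is a minimal sketch. The set of versions is only what commenters have reported (3.11.9 and 3.12.3), not an authoritative list, and `REPORTED_AFFECTED` and `is_reported_affected` are hypothetical names chosen for illustration.

```python
import sys

# Patch releases that commenters in this thread reported as crashing.
# Assumption: this is not an official or complete list.
REPORTED_AFFECTED = {(3, 11, 9), (3, 12, 3)}


def is_reported_affected(version_info=None):
    """Return True if the interpreter version matches a reported release."""
    info = sys.version_info if version_info is None else version_info
    return tuple(info[:3]) in REPORTED_AFFECTED


if __name__ == "__main__":
    if is_reported_affected():
        print("This Python release was reported as affected in the thread.")
```

Calling `is_reported_affected()` with no argument checks the running interpreter; passing a tuple such as `(3, 11, 8)` checks a specific release.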

iherasymenko commented 5 months ago

I finally have a working reproducer: https://github.com/iherasymenko/ddtrace-8903-reproducer

Chasing it down required a machine with the AMD EPYC 7R13 processor (an AWS EC2 c6a.8xlarge VM) but it seems like the simplified version works fine both on my M3 MacBook Pro and my Intel Core i7 Linux machine.

ddtrace v2.10.0rc2 is still affected by the issue.

Also, in this particular example, disabling patching of mongoengine via DD_PATCH_MODULES="mongoengine:false" helps, but this is not really an option because the other enabled integrations will cause a similar effect.
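
To experiment with several integrations at once, the variable can be assembled programmatically. This is a rough sketch, assuming the comma-separated `module:boolean` format for `DD_PATCH_MODULES`; the helper names `build_patch_modules` and `env_with_overrides` are hypothetical, and the resulting environment has to be passed to the process before ddtrace is loaded.

```python
import os


def build_patch_modules(overrides):
    """Build a DD_PATCH_MODULES value such as "mongoengine:false,redis:true".

    `overrides` maps an integration name to True (patch) or False (skip).
    """
    return ",".join(
        f"{name}:{'true' if enabled else 'false'}"
        for name, enabled in overrides.items()
    )


def env_with_overrides(overrides, base_env=None):
    """Return a copy of the environment with DD_PATCH_MODULES set."""
    env = dict(os.environ if base_env is None else base_env)
    env["DD_PATCH_MODULES"] = build_patch_modules(overrides)
    return env
```

The returned dictionary could then be handed to something like `subprocess.run(..., env=env)` when launching the worker.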

ffernand commented 5 months ago

I've also been having these issues and noticed a gevent issue showing that gevent is not compatible with 3.11.9. It further points to a cpython issue about the threading library being imported before gevent has a chance to patch it.

There's a PR open to address this. I tried the patch locally and was able to get ddtrace-run and gevent to play nicely on 3.11.9: https://github.com/python/cpython/pull/120233

This looks to be an issue strictly with cpython on the latest patch series for 3.11 and 3.12.

EDIT: spelling

lawrenceong commented 4 months ago

Even though https://github.com/python/cpython/pull/120233 is already merged, it looks like it will not be backported to 3.11 because it is not considered a security fix (https://github.com/python/cpython/pull/120233#issuecomment-2207215913).

It has, however, been backported to 3.12 and 3.13, so it looks like we will need to upgrade unless gevent plans to update its code.

iherasymenko commented 4 months ago

The issue is fixed in 2.10.0 and 2.9.4 🎉
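
A quick way to check whether an installed ddtrace version includes the fix is sketched below. It assumes plain `X.Y.Z` version strings with an optional suffix, assumes that releases after 2.10.0 (and 2.9.x releases from 2.9.4 on) keep the fix, and treats a pre-release of a fix version as unfixed, since 2.10.0rc2 was reported above as still affected. The function name `has_fix` is hypothetical.

```python
import re
from importlib import metadata


def has_fix(version):
    """Return True if `version` is 2.9.4+ on the 2.9 line or 2.10.0+."""
    match = re.match(r"^(\d+)\.(\d+)\.(\d+)(.*)$", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    release = tuple(int(part) for part in match.group(1, 2, 3))
    is_prerelease = bool(match.group(4))

    # A pre-release of the exact fix version is treated as not fixed.
    if release in {(2, 9, 4), (2, 10, 0)} and is_prerelease:
        return False
    if release[:2] == (2, 9):
        return release >= (2, 9, 4)
    return release >= (2, 10, 0)


if __name__ == "__main__":
    try:
        installed = metadata.version("ddtrace")
        print(installed, "has fix:", has_fix(installed))
    except metadata.PackageNotFoundError:
        print("ddtrace is not installed")
```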