DataDog / dd-trace-py

Datadog Python APM Client
https://ddtrace.readthedocs.io/

segfault with DD_PROFILING_ENABLED=true ddtrace-run in Python 3.12 #7533

Closed jukeks closed 11 months ago

jukeks commented 1 year ago

Summary of problem

Running uvicorn through ddtrace-run with profiling enabled segfaults:

$ DD_PROFILING_ENABLED=true ddtrace-run uvicorn --help
Segmentation fault (core dumped)

The same happens with gunicorn --help.

The backtrace from the core dump looks like this:

Core was generated by `/opt/pysetup/.venv/bin/python /opt/pysetup/.venv/bin/uvicorn --help'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  __new_sem_wait_fast (definitive_result=0, sem=0x0) at ./nptl/sem_waitcommon.c:136
136     ./nptl/sem_waitcommon.c: No such file or directory.
[Current thread is 1 (Thread 0x7f610dfae6c0 (LWP 294))]
(gdb) bt
#0  __new_sem_wait_fast (definitive_result=0, sem=0x0) at ./nptl/sem_waitcommon.c:136
#1  __new_sem_wait (sem=sem@entry=0x0) at ./nptl/sem_wait.c:39
#2  0x00007f6110f5d8e6 in PyThread_acquire_lock_timed (lock=0x0, microseconds=<optimized out>, intr_flag=0) at Python/thread_pthread.h:486
#3  0x00007f610e18cca4 in __pyx_f_7ddtrace_9profiling_9collector_5stack_collect_threads (__pyx_v_thread_span_links=0x7f610e8f7e90, __pyx_v_thread_time=0x7f610edfb8e0, 
    __pyx_v_thread_id_ignore_list=0x7f610e7efca0) at ddtrace/profiling/collector/stack.c:6520
#4  __pyx_f_7ddtrace_9profiling_9collector_5stack_stack_collect (__pyx_v_ignore_profiler=__pyx_v_ignore_profiler@entry=0x7f611128b260 <_Py_FalseStruct>, 
    __pyx_v_thread_time=__pyx_v_thread_time@entry=0x7f610edfb8e0, __pyx_v_max_nframes=__pyx_v_max_nframes@entry=0x7f611128c908 <_PyRuntime+5320>, 
    __pyx_v_interval=__pyx_v_interval@entry=0x7f610eb7f190, __pyx_v_wall_time=__pyx_v_wall_time@entry=0x7f610e83df90, __pyx_v_thread_span_links=__pyx_v_thread_span_links@entry=0x7f610e8f7e90, 
    __pyx_v_collect_endpoint=0x7f611128b420 <_Py_TrueStruct>) at ddtrace/profiling/collector/stack.c:7239
#5  0x00007f610e19d60b in __pyx_pf_7ddtrace_9profiling_9collector_5stack_14StackCollector_10collect (__pyx_self=<optimized out>, __pyx_v_self=<optimized out>)
    at ddtrace/profiling/collector/stack.c:11861
#6  __pyx_pw_7ddtrace_9profiling_9collector_5stack_14StackCollector_11collect (__pyx_self=<optimized out>, __pyx_args=<optimized out>, __pyx_nargs=<optimized out>, __pyx_kwds=<optimized out>)
    at ddtrace/profiling/collector/stack.c:11700
#7  0x00007f6110f2ee8f in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775809, args=0x7f610d7aa250, callable=0x7f610e894520, tstate=0x55d7ca562580)
    at ./Include/internal/pycore_call.h:92
#8  PyObject_Vectorcall (callable=0x7f610e894520, args=0x7f610d7aa250, nargsf=9223372036854775809, kwnames=0x0) at Objects/call.c:325
#9  0x00007f6110f1c273 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at Python/bytecodes.c:2711
#10 0x00007f610e1e9069 in __Pyx_PyObject_Call (kw=0x7f610dfb84c0, arg=0x7f610f9e0490, func=<optimized out>) at ddtrace/profiling/_threading.c:7816
#11 __pyx_pf_7ddtrace_9profiling_10_threading_14native_id_hook_bootstrap_wrapper (__pyx_self=<optimized out>, __pyx_v_kwargs=<optimized out>, __pyx_v_args=<optimized out>, 
    __pyx_v_f=<optimized out>) at ddtrace/profiling/_threading.c:3824
#12 __pyx_pw_7ddtrace_9profiling_10_threading_14native_id_hook_1bootstrap_wrapper (__pyx_self=<optimized out>, __pyx_args=<optimized out>, __pyx_nargs=<optimized out>, __pyx_kwds=<optimized out>)
    at ddtrace/profiling/_threading.c:3757
#13 0x00007f6110f2ee8f in _PyObject_VectorcallTstate (kwnames=0x0, nargsf=9223372036854775811, args=0x7f610d7aa080, callable=0x7f610ea2e740, tstate=0x55d7ca562580)
    at ./Include/internal/pycore_call.h:92
#14 PyObject_Vectorcall (callable=0x7f610ea2e740, args=0x7f610d7aa080, nargsf=9223372036854775811, kwnames=0x0) at Objects/call.c:325
#15 0x00007f6110f1c273 in _PyEval_EvalFrameDefault (tstate=<optimized out>, frame=<optimized out>, throwflag=<optimized out>) at Python/bytecodes.c:2711
#16 0x00007f6110f49c79 in _PyFunction_Vectorcall (kwnames=0x0, nargsf=1, stack=0x7f610dfade28, func=0x7f611044cc20) at Objects/call.c:419
#17 _PyObject_VectorcallTstate (kwnames=0x0, nargsf=1, args=0x7f610dfade28, callable=0x7f611044cc20, tstate=0x55d7ca562580) at ./Include/internal/pycore_call.h:92
#18 method_vectorcall (method=<optimized out>, args=0x7f611129de40 <_PyRuntime+76288>, nargsf=0, kwnames=0x0) at Objects/classobject.c:69
#19 0x00007f6110ea4e87 in thread_run (boot_raw=0x7f610ea565b0) at ./Modules/_threadmodule.c:1082
#20 0x00007f6110e62c32 in pythread_wrapper (arg=<optimized out>) at Python/thread_pthread.h:233
#21 0x00007f6110b83044 in start_thread (arg=<optimized out>) at ./nptl/pthread_create.c:442
#22 0x00007f6110c0361c in clone3 () at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:81
(gdb) 

Which version of dd-trace-py are you using?

$ ddtrace-run --version
ddtrace-run 2.1.5

Which version of pip are you using?

$ pip --version
pip 23.3.1 from /opt/pysetup/.venv/lib/python3.12/site-packages/pip (python 3.12)

Which libraries and their versions are you using?

`pip freeze`:

aioprometheus==23.3.0
annotated-types==0.6.0
anyio==3.7.1
attrs==23.1.0
Babel==2.13.1
bytecode==0.15.1
cattrs==23.1.2
certifi==2023.7.22
cffi==1.16.0
charset-normalizer==3.3.2
click==8.1.7
confluent-kafka==2.3.0
cryptography==41.0.5
datadog==0.47.0
ddsketch==2.0.4
ddtrace==2.1.5
Deprecated==1.2.14
ecs-logging==2.1.0
envier==0.4.0
exceptiongroup==1.1.3
fastapi==0.104.1
grpc-interceptor==0.15.3
grpcio==1.59.2
grpcio-health-checking==1.59.2
grpcio-reflection==1.59.2
gunicorn==21.2.0
h11==0.14.0
hiredis==2.2.3
httpcore==1.0.1
httptools==0.6.1
httpx==0.25.1
hvac==2.0.0
idna==3.4
importlib-metadata==6.8.0
langcodes==3.3.0
more-itertools==10.1.0
opentelemetry-api==1.21.0
orjson==3.9.10
packaging==23.2
protobuf==4.25.0
py-moneyed==3.0
pycparser==2.21
pydantic==2.4.2
pydantic-settings==2.0.3
pydantic_core==2.10.1
PyJWT==2.8.0
python-dotenv==1.0.0
pytz==2023.3.post1
quantile-python==1.1
redis==5.0.1
requests==2.31.0
sentry-sdk==1.34.0
setuptools==68.2.2
six==1.16.0
sniffio==1.3.0
starlette==0.27.0
statsd==4.0.1
timing-asgi==0.3.1
typing_extensions==4.8.0
urllib3==2.0.7
uvicorn==0.24.0.post1
uvloop==0.19.0
wattr==0.1.33

How can we reproduce your problem?

With Python 3.12 run

$ DD_PROFILING_ENABLED=true ddtrace-run uvicorn --help

What is the result that you get?

Segfault.

What is the result that you expected?

Help printout.

jukeks commented 1 year ago

https://github.com/DataDog/dd-trace-py/blob/92425e841ca155e4df704f238d9e26deb7bb05d9/ddtrace/profiling/collector/stack.pyx#L235:L246

    IF UNAME_SYSNAME != "Windows" and PY_VERSION_HEX >= 0x03070000:
        cdef PyInterpreterState* interp
        cdef PyThreadState* tstate
        cdef _PyErr_StackItem* exc_info
        cdef PyThread_type_lock lmutex = _PyRuntime.interpreters.mutex
        cdef PyObject* exc_type
        cdef PyObject* exc_tb
        cdef dict running_threads = {}

        # This is an internal lock but we do need it.
        # See https://bugs.python.org/issue1021318
        if PyThread_acquire_lock(lmutex, WAIT_LOCK) == PY_LOCK_ACQUIRED:

It looks like something changed since 3.11, as the mutex is now NULL. That makes sense, since 3.12 introduced isolated subinterpreters (https://peps.python.org/pep-0684/).
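
For illustration only, here is the shape of a version guard that would keep the collector off that code path on 3.12+; the names below are hypothetical and are not dd-trace-py's actual API:

    import sys

    # Hypothetical sketch, not dd-trace-py code: dispatch away from the native
    # thread walker on 3.12+, where the _PyRuntime.interpreters.mutex it acquires
    # comes back NULL (see the backtrace above).
    NATIVE_THREAD_WALK_SUPPORTED = sys.version_info < (3, 12)

    def collect(native_collect, python_collect):
        # Use the Cython path only where the runtime mutex is known to be usable;
        # otherwise fall back to a pure-Python collector.
        if NATIVE_THREAD_WALK_SUPPORTED:
            return native_collect()
        return python_collect()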

sanchda commented 1 year ago

@jukeks many thanks for the report. Usually for segmentation faults, we ask customers to go through our support portal (you may still do so if you like), but I think you've given ample information here.

Looking into it.

P403n1x87 commented 1 year ago

@jukeks thanks for the report and the analysis! Since that lock is required for the current implementation of the collect_threads function to work, one possible (temporary) solution could be to adopt a pure-Python implementation for 3.12, like the one in https://github.com/P403n1x87/dd-trace-py/blob/d897bfd6b1b1a9ee9c6804276efe2a7b3b07797c/ddtrace/profiling/collector/stack.py.
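
For readers looking for a stopgap, the pure-Python idea above can be approximated with sys._current_frames(), which avoids touching _PyRuntime.interpreters.mutex entirely. The sketch below is only an illustration of that approach, not the linked implementation:

    import sys
    import threading
    import traceback

    def collect_threads_pure_python():
        """Yield (thread_id, thread_name, top_frame) for all live threads.

        Uses sys._current_frames() instead of walking CPython's internal
        thread-state list, so no runtime-internal lock is needed.
        """
        frames = sys._current_frames()  # {thread ident: current frame}
        names = {t.ident: t.name for t in threading.enumerate()}
        for ident, frame in frames.items():
            yield ident, names.get(ident, "<unknown>"), frame

    # Example usage: dump the current stack of every thread.
    for ident, name, frame in collect_threads_pure_python():
        print(f"Thread {ident} ({name}):")
        traceback.print_stack(frame)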

k4nar commented 11 months ago

We have encountered this issue as well with Python 3.12. It is preventing us from migrating to this version of Python at the moment.

aebrahim commented 11 months ago

Any update on this bug? It's blocking our upgrade to Python 3.12 as well.

P403n1x87 commented 11 months ago

We have just merged https://github.com/DataDog/dd-trace-py/pull/7822 into the main branch and opened backports going as far back as 2.1. The fix should ship with the next patch releases in the coming days.

P403n1x87 commented 11 months ago

@jukeks @k4nar @aebrahim the fix is available in v2.3.2, as well as in the latest 2.2 and 2.1 releases. It would be great if you could test that to confirm that the issue has been resolved. Many thanks 🙏
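
A minimal check before re-running the reproduction above, assuming the installed release reports its version as ddtrace.__version__:

    # Confirm the installed ddtrace carries the backported fix
    # (v2.3.2, or the patched 2.2.x / 2.1.x releases mentioned above).
    import ddtrace
    print(ddtrace.__version__)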

aebrahim commented 11 months ago

Fixed for me! Thank you!