criteo / biggraphite

Simple Scalable Time Series Database
Apache License 2.0
129 stars 36 forks source link

Deadlock #547

Open Roguelazer opened 4 years ago

Roguelazer commented 4 years ago

We recently upgraded from biggraphite 0.14.10 to biggraphite 0.14.17 and we're now seeing deadlocks in production sometimes where all the threads are hung on a futex. GDB says it's coming from lock_PyThread_acquire_lock, so I'm pretty confident it's an explicit lock from Python code, not an implicit lock in the CPython runtime.

invoke_gc_callback does appear in a lot of the tracebacks; is there anywhere a lock or monitor might be acquired during a __del__?

Unfortunately, I can't get the fancy py-bt GDB integration for the Python build this is using, so I can't go too much deeper. Here's the first 40 stack frames of a stuck process:

#0  0x00007f3501d64af1 in sem_timedwait () from /lib64/libpthread.so.0
#1  0x00007f35020e56e1 in PyThread_acquire_lock_timed (lock=0x7f34e8015b60, microseconds=999999, intr_flag=1) at Python/thread_pthread.h:382
#2  0x00007f35020eafa4 in acquire_timed (lock=0x7f34e8015b60, timeout=999998868) at Modules/_threadmodule.c:68
#3  0x00007f35020eb0e6 in lock_PyThread_acquire_lock (self=0x7f34990c2ee0, args=<value optimized out>, kwds=<value optimized out>) at Modules/_threadmodule.c:151
#4  0x00007f350202ad72 in _PyCFunction_FastCallDict (func_obj=0x7f34990c9750, args=0x7f34a09dc510, nargs=<value optimized out>, kwargs=0x0) at Objects/methodobject.c:231
#5  0x00007f35020a13e2 in call_function (pp_stack=0x7f34f0449998, oparg=<value optimized out>, kwnames=0x0) at Python/ceval.c:4851
#6  0x00007f35020a4b65 in _PyEval_EvalFrameDefault (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:3335
#7  0x00007f35020a0c63 in _PyEval_EvalCodeWithName (_co=0x7f34fafa81e0, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=2,
    kwnames=0x0, kwargs=0x7f34e4024348, kwcount=0, kwstep=1, defs=0x7f34fafa2b50, defcount=1, kwdefs=0x0, closure=0x0, name=0x7f35025cd570, qualname=0x7f34fafa4630)
    at Python/ceval.c:4166
#8  0x00007f35020a1107 in fast_function (func=<value optimized out>, stack=<value optimized out>, nargs=<value optimized out>, kwnames=<value optimized out>) at Python/ceval.c:4992
#9  0x00007f35020a12fa in call_function (pp_stack=0x7f34f0449c68, oparg=<value optimized out>, kwnames=0x0) at Python/ceval.c:4872
#10 0x00007f35020a4b65 in _PyEval_EvalFrameDefault (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:3335
#11 0x00007f35020a0c63 in _PyEval_EvalCodeWithName (_co=0x7f34f4fe2270, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=3,
    kwnames=0x0, kwargs=0x7f34e8001928, kwcount=0, kwstep=1, defs=0x7f34f52814a0, defcount=2, kwdefs=0x0, closure=0x0, name=0x7f3502635490, qualname=0x7f34f4fe3ef0)
    at Python/ceval.c:4166
#12 0x00007f35020a1107 in fast_function (func=<value optimized out>, stack=<value optimized out>, nargs=<value optimized out>, kwnames=<value optimized out>) at Python/ceval.c:4992
#13 0x00007f35020a12fa in call_function (pp_stack=0x7f34f0449f38, oparg=<value optimized out>, kwnames=0x0) at Python/ceval.c:4872
#14 0x00007f35020a4b65 in _PyEval_EvalFrameDefault (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:3335
#15 0x00007f350209fefc in _PyFunction_FastCall (co=<value optimized out>, args=<value optimized out>, nargs=1, globals=0x7f34f5260ea0) at Python/ceval.c:4933
#16 0x00007f35020a12fa in call_function (pp_stack=0x7f34f044a118, oparg=<value optimized out>, kwnames=0x0) at Python/ceval.c:4872
#17 0x00007f35020a4b65 in _PyEval_EvalFrameDefault (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:3335
#18 0x00007f350209fefc in _PyFunction_FastCall (co=<value optimized out>, args=<value optimized out>, nargs=1, globals=0x7f34f5260ea0) at Python/ceval.c:4933
#19 0x00007f35020a1042 in _PyFunction_FastCallDict (func=0x7f34f14a02f0, args=0x7f34f044a2d0, nargs=1, kwargs=0x7f34f0459510) at Python/ceval.c:5035
#20 0x00007f3501fe181b in _PyObject_FastCallDict (func=0x7f34f14a02f0, args=0x7f34f044a2d0, nargs=1, kwargs=0x7f34f0459510) at Objects/abstract.c:2310
#21 0x00007f3501fe24d1 in _PyObject_Call_Prepend (func=0x7f34f14a02f0, obj=0x7f34f1177f60, args=0x7f3502625048, kwargs=0x7f34f0459510) at Objects/abstract.c:2373
#22 0x00007f3501fe25a4 in PyObject_Call (func=0x7f34f05bc648, args=0x7f3502625048, kwargs=0x7f34f0459510) at Objects/abstract.c:2261
#23 0x00007f35020a4e0e in do_call_core (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:5120
#24 _PyEval_EvalFrameDefault (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:3404
#25 0x00007f35020a0c63 in _PyEval_EvalCodeWithName (_co=0x7f34f60b7810, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=3,
    kwnames=0x7f3502625060, kwargs=0x7f3502625068, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7f34f61272f0, qualname=0x7f34f612f850)
    at Python/ceval.c:4166
#26 0x00007f35020a0ef3 in _PyFunction_FastCallDict (func=0x7f34f612c7b8, args=0x7f34f044a6c0, nargs=3, kwargs=0x7f34f04593f0) at Python/ceval.c:5084
#27 0x00007f3501fe181b in _PyObject_FastCallDict (func=0x7f34f612c7b8, args=0x7f34f044a6c0, nargs=3, kwargs=0x7f34f04593f0) at Objects/abstract.c:2310
#28 0x00007f3501fe24d1 in _PyObject_Call_Prepend (func=0x7f34f612c7b8, obj=0x7f34f04c0d30, args=0x7f34f04bb948, kwargs=0x7f34f04593f0) at Objects/abstract.c:2373
#29 0x00007f3501fe25a4 in PyObject_Call (func=0x7f34f054e048, args=0x7f34f04bb948, kwargs=0x7f34f04593f0) at Objects/abstract.c:2261
#30 0x00007f35020a4e0e in do_call_core (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:5120
#31 _PyEval_EvalFrameDefault (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:3404
#32 0x00007f35020a0c63 in _PyEval_EvalCodeWithName (_co=0x7f34f60b7ae0, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=3,
    kwnames=0x7f3502625060, kwargs=0x7f3502625068, kwcount=0, kwstep=2, defs=0x0, defcount=0, kwdefs=0x0, closure=0x0, name=0x7f34f61272f0, qualname=0x7f34f611bea0)
    at Python/ceval.c:4166
#33 0x00007f35020a0ef3 in _PyFunction_FastCallDict (func=0x7f34f612c9d8, args=0x7f34f044aab0, nargs=3, kwargs=0x7f34f05aad38) at Python/ceval.c:5084
#34 0x00007f3501fe181b in _PyObject_FastCallDict (func=0x7f34f612c9d8, args=0x7f34f044aab0, nargs=3, kwargs=0x7f34f05aad38) at Objects/abstract.c:2310
#35 0x00007f3501fe24d1 in _PyObject_Call_Prepend (func=0x7f34f612c9d8, obj=0x7f34f60b5a20, args=0x7f34f04a7388, kwargs=0x7f34f05aad38) at Objects/abstract.c:2373
#36 0x00007f3501fe25a4 in PyObject_Call (func=0x7f34fafff9c8, args=0x7f34f04a7388, kwargs=0x7f34f05aad38) at Objects/abstract.c:2261
#37 0x00007f35020a4e0e in do_call_core (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:5120
#38 _PyEval_EvalFrameDefault (f=<value optimized out>, throwflag=<value optimized out>) at Python/ceval.c:3404
#39 0x00007f35020a0c63 in _PyEval_EvalCodeWithName (_co=0x7f34f06089c0, globals=<value optimized out>, locals=<value optimized out>, args=<value optimized out>, argcount=0,
    kwnames=0x0, kwargs=0x7f34e8001450, kwcount=0, kwstep=1, defs=0x0, defcount=0, kwdefs=0x0, closure=0x7f34f05c9908, name=0x7f35024e8bf0, qualname=0x7f34f05d8e00)
    at Python/ceval.c:4166
#40 0x00007f35020a1107 in fast_function (func=<value optimized out>, stack=<value optimized out>, nargs=<value optimized out>, kwnames=<value optimized out>) at Python/ceval.c:4992

Any thoughts on what to investigate?