aeppert / yappi

Automatically exported from code.google.com/p/yappi
MIT License

random deadlock when starting profiling at the same time a thread is started #48

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
What steps will reproduce the problem?

1. Start a thread and yappi at the same time.
   In the reproducer, we start a thread running a function
   decorated with vdsm's yappi profiler decorator.

2. If you run this enough times, while the thread is trying to acquire
   the _active_limbo_lock, yappi gets a profile callback for the
   acquire call.
   The callback calls threading.currentThread, which tries to acquire
   the _active_limbo_lock again.

3. Deadlock.

What version of the product are you using? On what operating system?
0.82, 0.92

Please provide any additional information below.

Run the attached reproducer in a loop:
for n in `seq 100`; do python deadlock_test.py; done

Attached files:

deadlock/deadlock_test.py-bt ..... output of gdb thread apply all py-bt
deadlock/pthread.py .............. part of pthreading
deadlock/deadlock_test.bt ........ output of gdb thread apply all bt
deadlock/deadlock_test.bt-full ... output of gdb thread apply all bt full
deadlock/pthreading.py ........... part of pthreading package
deadlock/deadlock_test.py ........ reproducer unittest
deadlock/core.13461 .............. example core dump from a deadlock
deadlock/profile.py .............. stripped down vdsm's profile module

I can reproduce this only when using the pthreading library:
https://pypi.python.org/pypi/pthreading
For convenience, I included the pthreading files in the tarball.

If you comment out the pthreading.monkey_patch() line, the deadlock goes
away: I did not see any deadlock in 1000 runs.

pthreading replaces threading Lock, RLock and Condition with thin wrappers
around pthread_mutex_xxx and pthread_cond_xxx functions using ctypes.
The semantics should be similar enough, but the timing is different.

I believe this race is also possible with standard library threading,
but you probably need different timing.

It seems that yappi should not call back into Python to get the current
thread's class name - this is racy.
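
For illustration only, here is a minimal sketch of a profile callback that identifies threads without calling threading.currentThread. It is written for Python 3 and uses the pure-Python sys.setprofile hook; yappi's real callback is implemented in C, and this is not the fix from any yappi commit. The names profile_callback, calls_per_thread, work and profiled_worker are hypothetical.

```python
import sys
import threading

# Number of profile events seen per thread, keyed by thread ident.
calls_per_thread = {}

def profile_callback(frame, event, arg):
    # threading.get_ident() returns an integer id for the calling thread
    # without consulting threading._active, so it never needs
    # _active_limbo_lock (unlike threading.current_thread()).
    ident = threading.get_ident()
    calls_per_thread[ident] = calls_per_thread.get(ident, 0) + 1

def work():
    return sum(range(10))

def profiled_worker():
    # Install the hook for this thread only, and always remove it again.
    sys.setprofile(profile_callback)
    try:
        work()
    finally:
        sys.setprofile(None)

worker = threading.Thread(target=profiled_worker)
worker.start()
worker.join()
print(calls_per_thread)
```

The trade-off is that an ident carries no class name; a profiler would have to map idents to names later, outside the callback.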

Original issue reported on code.google.com by nir...@gmail.com on 13 Aug 2014 at 8:02

GoogleCodeExporter commented 9 years ago
Ugly workaround - monkeypatch threading._active_limbo_lock:

  import threading
  threading._active_limbo_lock = threading.RLock()

With this I can run deadlock_test.py 1000 times without deadlock.
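
A rough sketch of why swapping in an RLock helps, assuming the profile callback runs while the same thread already holds the lock. It is written for Python 3, where acquire() accepts a timeout; the timeout stands in for the real hang so the example terminates. The helper name reacquire_from_callback is made up for this illustration.

```python
import threading

def reacquire_from_callback(lock):
    """Hold `lock`, then try to take it again from the same thread,
    the way a nested callback would."""
    with lock:
        # With a non-reentrant lock this second acquire can never succeed;
        # without the timeout it would block forever (the deadlock).
        got_it = lock.acquire(timeout=0.2)
        if got_it:
            lock.release()
        return got_it

plain_result = reacquire_from_callback(threading.Lock())
reentrant_result = reacquire_from_callback(threading.RLock())
print(plain_result, reentrant_result)
```

A plain Lock refuses the second acquire, while an RLock lets the owning thread take it again, which is consistent with the workaround above.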

Original comment by nir...@gmail.com on 13 Aug 2014 at 8:34

GoogleCodeExporter commented 9 years ago
Details:

1. The thread starts and tries to lock _active_limbo_lock
2. yappi gets a profile callback for _active_limbo_lock.acquire
3. yappi calls threading.currentThread
4. threading.currentThread creates a _DummyThread
5. _DummyThread.__init__ tries to lock _active_limbo_lock

So even if we do get the class name, it is wrong, since the real thread
has not yet registered itself in the _active dictionary.
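
The wrong class name in steps 3 and 4 can be seen in isolation with a small sketch. It is written for Python 3, so it uses threading.current_thread (the newer spelling of currentThread) and the low-level _thread module to start a thread that never registers itself in threading._active. This only approximates the race in this issue, where a regular thread is caught before it has registered. The names raw_worker, result and done are hypothetical.

```python
import _thread
import threading

result = {}
done = threading.Event()

def raw_worker():
    # This thread was started through _thread, so threading._active has no
    # entry for it. current_thread() therefore falls back to creating a
    # _DummyThread, whose __init__ takes _active_limbo_lock.
    result["cls"] = type(threading.current_thread()).__name__
    done.set()

_thread.start_new_thread(raw_worker, ())
done.wait(5)
print(result["cls"])
```

The reported class is _DummyThread rather than the thread's real class, which is why the name yappi would record at this point is not useful anyway.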

Original comment by nir...@gmail.com on 13 Aug 2014 at 8:41

GoogleCodeExporter commented 9 years ago
Excellent investigation. I need some time to see if we can work around this. I agree
that trying to get the thread class name is hacky and open to races. Let me see
what we can do. Again: great issue report.

Original comment by sum...@gmail.com on 13 Aug 2014 at 11:12

GoogleCodeExporter commented 9 years ago
Fixed this with the help of nirsof. Fix available in commit 009a272.

Original comment by sum...@gmail.com on 2 Sep 2014 at 9:19