bdarnell / plop

Python Low-Overhead Profiler
MIT License
922 stars 69 forks source link

KeyError when adding to stack #6

Open diogobaeder opened 11 years ago

diogobaeder commented 11 years ago

I think this is somehow related to the fact that I'm using plop with a multithreaded application (which runs under Twisted), but I'm getting these errors every once in a while, any ideas what it might be?:

Unhandled Error
Traceback (most recent call last):
  File "/home/diogobaeder/some-company/some-project/profiling-support/utilities/serve_wsgi_with_twisted.py", line 43, in <module>
    sys.exit(main())
  File "/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/some-project/wsgi/twisted.py", line 1704, in main
    return run_worker(options, config)
  File "/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/some-project/wsgi/twisted.py", line 1684, in run_worker
    reactor.run()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1169, in run
    self.mainLoop()
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1181, in mainLoop
    self.doIteration(t)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/epollreactor.py", line 181, in doPoll
    l = self._poller.wait(len(self._selectables), timeout)
  File "/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/plop/collector.py", line 65, in handler
    self.stack_counts[tuple(frames)] += 1
exceptions.KeyError: (('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/utils/encoding.py', 104, 'smart_str'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/utils/encoding.py', 132, 'iri_to_uri'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/core/urlresolvers.py', 422, 'reverse'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/template/defaulttags.py', 395, 'render'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/template/debug.py', 72, 'render_node'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/template/base.py', 819, 'render'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/template/loader_tags.py', 48, 'render'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/template/debug.py', 72, 'render_node'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/template/base.py', 819, 'render'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/template/base.py', 133, '_render'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/template/base.py', 136, 'render'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/template/loader_tags.py', 131, 'render_template'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/template/loader_tags.py', 152, 'render'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/template/debug.py', 72, 'render_node'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/template/base.py', 819, 'render'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/template/base.py', 133, '_render'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/template/loader_tags.py', 100, 'render'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/template/debug.py', 72, 'render_node'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/template/base.py', 819, 'render'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/template/base.py', 133, '_render'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/template/loader_tags.py', 100, 'render'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/template/debug.py', 72, 'render_node'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/template/base.py', 819, 'render'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/template/base.py', 133, '_render'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/template/base.py', 136, 'render'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/template/loader.py', 158, 'render_to_string'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/shortcuts/__init__.py', 14, 'render_to_response'), ('/home/diogobaeder/some-company/some-project/profiling-support/servers/u1servers/web/admin/storage/views.py', 24, 'view_index'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/some-project/account/djangolib/decorators.py', 91, '__call__'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/core/handlers/base.py', 72, 'get_response'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/django/core/handlers/wsgi.py', 210, '__call__'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/some-project/wsgi/handlers.py', 40, '__call__'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/some-project/wsgi/middleware/bzrheader.py', 16, '__call__'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/storm/wsgi.py', 49, 'wrapper'), ('/usr/lib/pymodules/python2.7/timeline/wsgi.py', 27, 'wrapper'), ('/usr/lib/python2.7/dist-packages/oops_wsgi/middleware.py', 103, 'oops_middleware'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/some-project/wsgi/middleware/opstats.py', 29, '__call__'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/some-project/wsgi/middleware/meliae_middleware.py', 19, '__call__'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/some-project/wsgi/twisted.py', 597, 'run'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/some-project/wsgi/twisted.py', 183, 'wrapper'), ('/usr/lib/python2.7/dist-packages/twisted/python/context.py', 61, 'callWithContext'), ('/usr/lib/python2.7/dist-packages/twisted/python/context.py', 117, 'callWithContext'), ('/usr/lib/python2.7/dist-packages/twisted/python/threadpool.py', 193, '_worker'), ('/usr/lib/python2.7/threading.py', 501, 'run'), ('/usr/lib/python2.7/threading.py', 533, '__bootstrap_inner'), ('/usr/lib/python2.7/threading.py', 510, '__bootstrap'))
bdarnell commented 11 years ago

Hmm. Plop should be thread-safe; I initially developed it in a multithreaded app. statck_counts is normally a defaultdict so it should never raise KeyErrors (barring threading issues). Collector.filter() converts stack_counts into a regular dict (effectively making the object read-only until it is reset()). Could you be calling filter before the collector has stopped or reusing a collector without resetting it?

diogobaeder commented 11 years ago

Not that I know of; I'm controlling the Collector from the Twisted reactor loop (main thread), and I've put a simple lock to avoid it starting or stopping twice (only starts if not started already, only stops if running). The filter is run right after the collector is stopped, in the same function, also in the reactor loop.

Do you have visibility enough to the signal handler, in order to know when there's an error while collecting, in your applications?

bdarnell commented 11 years ago

If the signal handler fails, the exception will keep going into whatever code was running at the time of the signal, probably just getting caught and logged in the event loop as we've seen here.

If you have a lot of threads and are using a small sampling interval I think it's possible that the signal handler is taking more than its allotted time and a second signal is interrupting the first. Consider increasing the collector's interval and see if the problems still occur.

diogobaeder commented 11 years ago

So, I increased the interval to 0.1 seconds, and it did improve the stability, but I'm still getting those errors:

2013-04-09 07:05:32,393 INFO     twisted[16303]: Unexpected error in main loop.
2013-04-09 07:05:32,393 ERROR    twisted[16303]: Unhandled Error
Traceback (most recent call last):
  File "/home/diogobaeder/some-company/some-project/profiling-support/utilities/serve_wsgi_with_twisted.py", line 43, in <module>
    sys.exit(main())
  File "/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/some-project/wsgi/twisted.py", line 1704, in main
    return run_worker(options, config)
  File "/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/some-project/wsgi/twisted.py", line 1684, in run_worker
    reactor.run()
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1169, in run
    self.mainLoop()
--- <exception caught here> ---
  File "/usr/lib/python2.7/dist-packages/twisted/internet/base.py", line 1181, in mainLoop
    self.doIteration(t)
  File "/usr/lib/python2.7/dist-packages/twisted/internet/epollreactor.py", line 181, in doPoll
    l = self._poller.wait(len(self._selectables), timeout)
  File "/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/plop/collector.py", line 65, in handler
    self.stack_counts[tuple(frames)] += 1
exceptions.KeyError: (('/usr/lib/python2.7/genericpath.py', 26, 'isfile'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/werkzeug/serving.py', 142, 'iter_module_files'), ('/home/diogobaeder/some-company/ub
untuone/profiling-support/utilities/../lib/werkzeug/serving.py', 133, 'reloader_loop'), ('/home/diogobaeder/some-company/some-project/profiling-support/utilities/../lib/some-project/wsgi/twisted.py', 1072, 'reloader_thread_main'), ('/usr/li
b/python2.7/threading.py', 501, 'run'), ('/usr/lib/python2.7/threading.py', 533, '__bootstrap_inner'), ('/usr/lib/python2.7/threading.py', 510, '__bootstrap'))
bdarnell commented 11 years ago

How many threads do you typically have? Was the application still responsive while the profile was running?

diogobaeder commented 11 years ago

15 threads per server. In this second test, the application seemed to keep responsive after a small time using, but I didn't test it thoroughly to see if it went down like it happened in my first test.

bdarnell commented 11 years ago

Hmm, 15 threads shouldn't be enough to cause signal handler overrun problems, although if you're also having GIL contention it's a possibility. My next move would be to explicitly guard against overrun and see if the problems persist, with a patch like this:

diff --git a/plop/collector.py b/plop/collector.py
index 62f4a54..f168ef7 100644
--- a/plop/collector.py
+++ b/plop/collector.py
@@ -18,6 +18,7 @@ class Collector(object):
         self.interval = interval
         self.mode = mode
         assert mode in Collector.MODES
+        self.in_signal_handler = False
         timer, sig = Collector.MODES[self.mode]
         signal.signal(sig, self.handler)
         self.reset()
@@ -48,6 +49,9 @@ class Collector(object):
             pass # need busy wait; ITIMER_PROF doesn't proceed while sleeping

     def handler(self, sig, current_frame):
+        if self.in_signal_handler:
+            return
+        self.in_signal_handler = True
         start = time.time()
         self.samples_remaining -= 1
         if self.samples_remaining <= 0 or self.stopping:
@@ -67,6 +71,7 @@ class Collector(object):
         end = time.time()
         self.samples_taken += 1
         self.sample_time += (end - start)
+        self.in_signal_handler = False

     def filter(self, max_stacks):
         self.stack_counts = dict(sorted(self.stack_counts.iteritems(), key=lambda kv: -kv[1])[:max_stacks])