enthought / pyface

pyface: traits-capable windowing framework

Garbage-collection-related segfault from wx CallbackTimer #815

Open mdickinson opened 3 years ago

mdickinson commented 3 years ago

I'm getting a segmentation fault from a use of the wx flavour of `CallbackTimer` (on macOS, but it seems likely that it's not platform-specific). It appears to be related to garbage collection and circular references in the timer implementation.

Here's a script to reproduce. Not particularly minimal, I'm afraid, but it is self-contained:

```python
import wx

from pyface.timer.api import CallbackTimer

#: Default timeout, in seconds
TIMEOUT = 10.0


class AppForTesting(wx.App):
    def OnInit(self):
        """
        Override base class to ensure we have at least one window.
        """
        # It's necessary to have at least one window to prevent the application
        # exiting immediately.
        self.frame = wx.Frame(None)
        self.SetTopWindow(self.frame)
        self.frame.Show(False)
        return True

    def exit(self, exit_code):
        """
        Exit the application main event loop with a given exit code.

        The event loop can be started and stopped several times for
        a single AppForTesting object.
        """
        self.exit_code = exit_code
        self.ExitMainLoop()

    def close(self):
        """
        Clean up when the object is no longer needed.
        """
        self.frame.Close()
        del self.frame


class GuiTestAssistant:
    """
    Support for running the wx event loop in unit tests.
    """

    def setUp(self):
        self.wx_app = AppForTesting()

    def tearDown(self):
        self.wx_app.close()
        del self.wx_app

    def run_until(self, object, trait, condition, timeout=TIMEOUT):
        """
        Run event loop until the given condition holds true, or until timeout.

        The condition is re-evaluated, with the object as argument, every time
        the trait changes.

        Parameters
        ----------
        object : traits.has_traits.HasTraits
            Object whose trait we monitor.
        trait : str
            Name of the trait to monitor for changes.
        condition : callable
            Single-argument callable, returning a boolean. This will be
            called with *object* as the only input.
        timeout : float, optional
            Number of seconds to allow before timing out with an exception.
            The (somewhat arbitrary) default is 10 seconds.

        Raises
        ------
        RuntimeError
            If timeout is reached, regardless of whether the condition is
            true or not at that point.
        """

        wx_app = self.wx_app

        timeout_timer = CallbackTimer(
            interval=timeout, repeat=1, callback=lambda: wx_app.exit(1),
        )

        def stop_if_condition():
            if condition(object):
                wx_app.exit(0)

        object.on_trait_change(stop_if_condition, trait)
        try:
            # The condition may have become True before we
            # started listening to changes. So start with a check.
            if condition(object):
                timed_out = 0
            else:
                timeout_timer.start()
                try:
                    wx_app.MainLoop()
                finally:
                    timed_out = wx_app.exit_code
                    timeout_timer.stop()
        finally:
            object.on_trait_change(stop_if_condition, trait, remove=True)

        if timed_out:
            raise RuntimeError(
                "run_until timed out after {} seconds. "
                "At timeout, condition was {}.".format(
                    timeout, condition(object)
                )
            )


from traits.api import HasStrictTraits, Str


class Dummy(HasStrictTraits):
    dummy = Str


def exercise_the_assistant_once():
    assistant = GuiTestAssistant()
    assistant.setUp()
    try:
        dummy = Dummy()
        try:
            assistant.run_until(dummy, "dummy", lambda obj: False, timeout=0.01)
        except RuntimeError:
            pass
    finally:
        assistant.tearDown()


def exercise_the_assistant():
    import itertools

    for i in itertools.count():
        print(i)
        exercise_the_assistant_once()


if __name__ == "__main__":
    exercise_the_assistant()
```

Here are the results of running the script on my machine (macOS 10.15.7, Python 3.6.12 from EDM, Pyface 7.1.0-1 from EDM, wxPython 4.1.1 from PyPI):

```
mdickinson@mirzakhani Desktop % python -Xfaulthandler gui_test_assistant.py
0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
objc[20994]: Invalid or prematurely-freed autorelease pool 0x7f82b3242fe8.
Fatal Python error: Aborted

Current thread 0x0000000114155dc0 (most recent call first):
  File "/Users/mdickinson/.edm/envs/traits-futures-py36-wxpython/lib/python3.6/site-packages/traits/has_traits.py", line 3206 in _init_trait_listeners
  File "gui_test_assistant.py", line 85 in run_until
  File "gui_test_assistant.py", line 128 in exercise_the_assistant_once
  File "gui_test_assistant.py", line 139 in exercise_the_assistant
  File "gui_test_assistant.py", line 143 in <module>
zsh: abort      python -Xfaulthandler gui_test_assistant.py
```

I'm guessing that the problem relates to the reference to a wx application by the timer callback. By the time the timer gets garbage collected, that reference is to a defunct wx application, and something inside wx is then doing something resembling a double free.

Turning cyclic garbage collection off (`gc.disable()`) defers the segfault until process exit.

Adding a `del timeout_timer` in the `finally` block immediately after stopping the timer doesn't fix the segfault, because there are circular references in the timer implementation that keep the timer alive even after all references to it have disappeared.

Adding a `del timeout_timer` followed by a `gc.collect()` call does fix the segfault.
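To illustrate why `del` alone isn't enough but `del` plus `gc.collect()` is, here's a minimal wx-free sketch. `FakeTimer` and `timer_alive_after` are hypothetical names invented for this example, and the cycle here (an object storing its own bound method) is only an assumption about the kind of cycle involved, not a description of the actual Pyface timer internals.

```python
import gc
import weakref


class FakeTimer:
    """Hypothetical stand-in for a timer whose internals form a cycle."""

    def __init__(self, callback):
        self.callback = callback
        # A bound method references self, so storing it on self creates
        # the cycle: self -> self._handler -> self.
        self._handler = self._on_tick

    def _on_tick(self):
        self.callback()


def timer_alive_after(cleanup):
    """Create a timer, drop the only name for it, run cleanup, report survival."""
    timer = FakeTimer(callback=lambda: None)
    probe = weakref.ref(timer)
    del timer
    cleanup()
    return probe() is not None


gc.disable()  # keep the automatic cyclic collector out of the experiment
try:
    alive_after_del = timer_alive_after(lambda: None)
    alive_after_collect = timer_alive_after(gc.collect)
finally:
    gc.enable()

print(alive_after_del)      # True: the cycle keeps the object alive
print(alive_after_collect)  # False: an explicit collection frees it
```

In the reproducer script, the analogous workaround would be the `del timeout_timer` and `gc.collect()` pair right after `timeout_timer.stop()`.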

While this bug isn't coming directly from Pyface, it looks as though it would be worth refactoring the Pyface wx timer implementations to avoid the circular references, so that the timer gets garbage collected as part of regular refcount-based gc.
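As a rough idea of what a cycle-free design could look like, here's a sketch that holds the handler through `weakref.WeakMethod` rather than a strong bound-method reference. `CycleFreeTimer` and `fire` are hypothetical names for illustration only; this is not the Pyface API, and a real refactoring would depend on where the cycles in the wx timer actually are. The immediate deallocation shown relies on CPython's reference counting.

```python
import gc
import weakref


class CycleFreeTimer:
    """Hypothetical timer that only weakly references its own handler."""

    def __init__(self, callback):
        self.callback = callback
        # WeakMethod doesn't keep self alive, so no
        # self -> handler -> self cycle is formed.
        self._handler = weakref.WeakMethod(self._on_tick)

    def _on_tick(self):
        self.callback()

    def fire(self):
        # Resolve the weak reference; it's None once self is gone.
        handler = self._handler()
        if handler is not None:
            handler()


calls = []
gc.disable()  # prove that the cyclic collector isn't needed
try:
    timer = CycleFreeTimer(callback=lambda: calls.append("tick"))
    timer.fire()
    probe = weakref.ref(timer)
    del timer
    collected_without_gc = probe() is None
finally:
    gc.enable()

print(calls)                 # ['tick']
print(collected_without_gc)  # True
```

With no cycle, the timer goes away at the `del`, at a point the calling code controls, instead of whenever the cyclic collector next happens to run.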

mdickinson commented 3 years ago

This may not be entirely specific to Wx: I witnessed the segfault with Wx, but if the circular references are there in the Qt implementation too then those could potentially cause problems later on (due to Qt objects being collected on the wrong thread by the cyclic garbage collector).
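The threading concern can be shown without any GUI toolkit: an object in a reference cycle is finalized on whichever thread happens to run the cyclic collector, not the thread that created it. In this sketch, `Resource` is a hypothetical class and the explicit `gc.collect()` in a worker thread stands in for an automatic collection being triggered there.

```python
import gc
import threading


class Resource:
    """Hypothetical object that records which thread finalizes it."""

    finalized_in = None

    def __init__(self):
        self.me = self  # deliberate reference cycle

    def __del__(self):
        Resource.finalized_in = threading.current_thread().name


gc.disable()  # make sure the main thread doesn't collect the cycle first
try:
    # Created on the main thread; unreachable straight away, but kept
    # alive by its own cycle until the cyclic collector runs.
    Resource()
    worker = threading.Thread(target=gc.collect, name="worker")
    worker.start()
    worker.join()
finally:
    gc.enable()

print(Resource.finalized_in)  # worker
```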