Intermittent end-of-process segfault on Linux + PySide 6

mdickinson commented 1 year ago

EDIT 2023-03-10: Upstream issue - https://bugreports.qt.io/browse/PYSIDE-2254

The script below sometimes gives me an end-of-process segfault on Ubuntu 22.04 with PySide 6.3.2.

See the docstring at the top of the code for steps to reproduce.

Results of a typical failing run on my machine look like this:

```text
(crasher) mdickinson@ubuntu-2204:~/Desktop$ python -m unittest crasher.py
/home/mdickinson/Desktop/crasher/lib/python3.10/site-packages/pyface/util/guisupport.py:155: DeprecationWarning: 'exec_' will be removed in the future. Use 'exec' instead.
  app.exec_()
.
----------------------------------------------------------------------
Ran 1 test in 0.018s

OK
Segmentation fault
```

faulthandler gives no extra information, presumably because the segfault happens sufficiently late in the Python process teardown that faulthandler is no longer running.

```python
"""
The script below segfaults for me around 10% of the time with:

- Ubuntu 22.04 (running under VirtualBox on a macOS Ventura / Intel host)
- Python 3.10 (installed from the official Ubuntu package)
- PySide 6.3.2 (installed via pip)

To reproduce:

- Save this script under the name 'crasher.py'
- Create and activate a Python 3.10 venv with, e.g.:

    python -m venv --clear crasher
    source crasher/bin/activate

- Install pyface and PySide6 < 6.4 from PyPI:

    python -m pip install pyface "PySide6<6.4"

- Run this script under unittest:

    python -m unittest crasher.py

"""

import unittest

from pyface.gui import GUI
from pyface.tasks.api import TaskWindow
from traits.api import HasTraits, Instance

class MyTasksApplication(HasTraits):
    window = Instance(TaskWindow)

    def run(self):
        gui = GUI()
        window = TaskWindow()
        window.open()
        self.window = window
        gui.invoke_later(self.exit)
        gui.start_event_loop()

    def exit(self):
        window = self.window
        self.window = None
        window.destroy()
        window.closed = True

class TestTasksApplication(unittest.TestCase):
    def test_lifecycle(self):
        app = MyTasksApplication()
        app.run()
```
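Because the crash shows up in only around 10% of runs, it can help to measure the rate rather than eyeball it. The sketch below is a hypothetical helper (`count_segfaults` and `DEFAULT_CMD` are illustrative names, not part of pyface), assuming a POSIX system, where a child killed by signal N reports a return code of -N, and assuming the reproducer is saved as `crasher.py` in the current directory. Since the segfault happens after unittest has already printed `OK`, the return code is the only reliable signal.

```python
import signal
import subprocess
import sys

# Hypothetical default: the reproducer saved as crasher.py in the cwd.
DEFAULT_CMD = [sys.executable, "-m", "unittest", "crasher.py"]


def count_segfaults(runs, cmd=DEFAULT_CMD):
    """Run `cmd` `runs` times and return how many runs died from SIGSEGV."""
    crashes = 0
    for _ in range(runs):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        # On POSIX, a child killed by signal N has returncode == -N.
        if result.returncode == -signal.SIGSEGV:
            crashes += 1
    return crashes
```

For example, `count_segfaults(50)` returns the number of segfaulting runs out of 50, which makes it easier to tell whether a reduction step still reproduces the problem.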

mdickinson commented 1 year ago

I've also tested against the #1203 branch, with similar results.

mdickinson commented 1 year ago

Prompted by a question from @corranwebster: after replacing TaskWindow with ApplicationWindow, I still see the segfault.

mdickinson commented 1 year ago

After several rounds of reductions, the crasher example looks like this. There's very little interesting machinery left.

The unittest wrapper shouldn't be necessary to reproduce, but on my machine it seems to increase the probability of a segfault, so it's helpful to keep it around while trying to find a minimal crasher.

Run it under unittest (or coverage) with `python -m unittest crasher.py`.

```python
import unittest

from pyface.gui import GUI
from pyface.qt import QtGui

class MyWindow:

    def __init__(self):
        self.control = None

    def open(self):
        if self.control is None:
            control = QtGui.QMainWindow()
            control.setEnabled(True)
            control.setVisible(True)
            self.control = control

    def close(self):
        if self.control is not None:
            control = self.control
            self.control = None

            control.deleteLater()
            control.close()
            control.hide()

class MyApplication:

    def __init__(self):
        self.window = None

    def run(self):
        gui = GUI()
        window = MyWindow()
        window.open()
        self.window = window
        gui.invoke_later(self.exit)
        gui.start_event_loop()

    def exit(self):
        window = self.window
        self.window = None
        window.close()

class TestApplication(unittest.TestCase):
    def test_lifecycle(self):
        app = MyApplication()
        app.run()

        # Run the event loop
        gui = GUI()
        gui.invoke_after(100, gui.stop_event_loop)
        gui.start_event_loop()
```

corranwebster commented 1 year ago

I'm looking at `_FutureCall` and running on a Mac, so I'm not seeing the crashes, but with some basic instrumentation I get the following output:

```text
run
open
2023-03-09 17:11:09.693 Python[59568:18426434] ApplePersistenceIgnoreState: Existing state will not be touched. New state will be written to /var/folders/qr/c9bg7cld60l309pyxy5_8xxw0000gn/T/org.python.python.savedState
adding <bound method MyApplication.exit of <pyface.crasher.MyApplication object at 0x10b580610>> 0 () {}
adding <built-in function setattr> 0 (<pyface.ui.qt4.gui.GUI object at 0x10ca9f400>, 'started', True) {}
exit
close
done
adding <bound method GUI.stop_event_loop of <pyface.ui.qt4.gui.GUI object at 0x10bff2a40>> 100 () {}
adding <built-in function setattr> 0 (<pyface.ui.qt4.gui.GUI object at 0x10bff2a40>, 'started', True) {}
removing 4 <bound method MyApplication.exit of <pyface.crasher.MyApplication object at 0x10b580610>>
removing 3 <built-in function setattr>
removing 2 <built-in function setattr>
removing 1 <bound method GUI.stop_event_loop of <pyface.ui.qt4.gui.GUI object at 0x10bff2a40>>
clean-up
.
----------------------------------------------------------------------
Ran 1 test in 0.244s

OK
```

This means that the `invoke_later` and `set_trait_later` calls don't get a chance to run their clean-up timers before the event loop stops: https://github.com/enthought/pyface/blob/0fb8373b4dbb5e104d3120e88590dbf964865245/pyface/ui/qt4/gui.py#L174-L182

So I think that calling `invoke_later` with a callable that shuts down the event loop is an anti-pattern: it's guaranteed to leave a hanging timer and global state in `_FutureCall._calls`. Basically, you have to run the event loop again before Python exits, or you risk all sorts of issues with object teardown.

By comparison, `invoke_after` is safer because it does its clean-up immediately after performing the call, but it runs the risk of something else stopping the event loop and leaving its timer hanging.
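The ordering argument can be illustrated with a toy model. This is not pyface's `_FutureCall`: `ToyLoop`, `CALLS`, `invoke_later` and `invoke_after` below are hypothetical pure-Python stand-ins that only mimic the behaviour described, where `invoke_later` defers its clean-up to a later loop iteration (like a zero-delay single-shot timer) while `invoke_after` cleans up in the same step as the call.

```python
import heapq
import itertools


class ToyLoop:
    """A minimal single-threaded event loop driven by a heap of timers."""

    def __init__(self):
        self.now = 0
        self._timers = []  # heap of (due_time, seq, callback)
        self._seq = itertools.count()
        self._running = False

    def call_at(self, delay, callback):
        heapq.heappush(self._timers, (self.now + delay, next(self._seq), callback))

    def stop(self):
        self._running = False

    def run(self):
        self._running = True
        while self._running and self._timers:
            due, _, callback = heapq.heappop(self._timers)
            self.now = due
            callback()

    def pending(self):
        return len(self._timers)


# Global registry, loosely analogous to _FutureCall._calls (hypothetical).
CALLS = []


def invoke_later(loop, func):
    token = object()
    CALLS.append(token)

    def dispatch():
        func()
        # Clean-up is deferred to a *later* loop iteration.
        loop.call_at(0, lambda: CALLS.remove(token))

    loop.call_at(0, dispatch)


def invoke_after(loop, delay, func):
    token = object()
    CALLS.append(token)

    def dispatch():
        try:
            func()
        finally:
            # Clean-up happens immediately after the call, in the same step.
            CALLS.remove(token)

    loop.call_at(delay, dispatch)


# Stopping the loop from an invoke_later callback strands the clean-up.
loop = ToyLoop()
invoke_later(loop, loop.stop)
loop.run()
print(len(CALLS), loop.pending())  # 1 1
loop.run()  # a second spin drains the deferred clean-up
print(len(CALLS), loop.pending())  # 0 0

# invoke_after cleans up in the same step, even though its callable stops the loop.
loop2 = ToyLoop()
invoke_after(loop2, 100, loop2.stop)
loop2.run()
print(len(CALLS), loop2.pending())  # 0 0
```

The model also shows the `invoke_after` risk: if an `invoke_later(loop, loop.stop)` fires before a pending `invoke_after` timer, the loop exits with the `invoke_after` entry still registered and its timer still queued.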

This still doesn't answer the question about why this is failing on Linux.

mdickinson commented 1 year ago

And here's a truly minimal reproducer. The problem is accessing the `thread` method on the `QApplication`, quite possibly because the Python main thread isn't an official `QThread`.

If I run the example below on Ubuntu and then quit the app manually (by clicking on the close button), I get a segfault. If I remove the app.thread() line, I can no longer reproduce the segfault.

I'll open issues and PRs tomorrow.

```python
from PySide6.QtWidgets import QApplication, QWidget

def main():
    app = QApplication()
    window = QWidget()
    window.show()
    app.thread()
    app.exec()

if __name__ == "__main__":
    main()
```

mdickinson commented 1 year ago

I've tested a workaround that only does the `moveToThread` when necessary (i.e., when `threading.current_thread() != threading.main_thread()`). That fixes the original Envisage segfaults for me.

xref: https://github.com/enthought/envisage/pull/509
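A minimal sketch of that idea, assuming the `QApplication` was created on the Python main thread (so objects created there already have GUI-thread affinity). The helper names are hypothetical and this is not the code from the Envisage PR; the point is only that `app.thread()` is never touched when running on the main thread.

```python
import threading


def needs_move_to_main_thread():
    """True only when running on a Python thread other than the main one."""
    return threading.current_thread() is not threading.main_thread()


def move_to_gui_thread_if_needed(qobject, app):
    """Move `qobject` to the GUI thread only when called off the main thread.

    Assumes the QApplication lives on the Python main thread, so an object
    created there needs no move, and app.thread() (the call implicated in
    the teardown segfault) is skipped entirely.
    """
    if needs_move_to_main_thread():
        qobject.moveToThread(app.thread())
```

In a `QObject` subclass, one might call `move_to_gui_thread_if_needed(self, QtCore.QCoreApplication.instance())` from `__init__`.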

mdickinson commented 1 year ago

The segfault persists with PySide 6.4.2, so it doesn't appear to have been fixed upstream yet.

corranwebster commented 1 year ago

This appears to be fixed in PySide 6.4.3.
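If one wanted to apply the workaround only on affected releases, a version check along these lines would do. It is a sketch with hypothetical function names, and it assumes that 6.4.3 is the first fixed release and that every PySide 6 release before it is affected (only 6.3.2 and 6.4.2 were actually tested in this thread).

```python
import re


def pyside_version_tuple(version):
    """Parse the leading 'major.minor[.patch]' digits of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part or 0) for part in match.groups())


def needs_thread_workaround(version):
    """True for PySide 6 releases before 6.4.3 (assumed first fixed release)."""
    return (6, 0, 0) <= pyside_version_tuple(version) < (6, 4, 3)
```

Usage would be `needs_thread_workaround(PySide6.__version__)`. Since skipping `moveToThread` on the main thread is harmless on fixed releases too, gating it on the version is optional; the unconditional check is simpler.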

enthought / pyface

Intermittent end-of-process segfault on Linux + PySide 6 #1211