enthought / pyface

pyface: traits-capable windowing framework
Other
106 stars 55 forks source link

Investigate more efficient methods for cross-thread communications #811

Open mdickinson opened 3 years ago

mdickinson commented 3 years ago

[Motivated by discussion on enthought/traits-futures#231]

For code running in a background thread, the standard supported way to have that code communicate with the main thread is to use dispatch="ui" in a Trait listener. However, the underlying machinery supporting this does a significant amount of work under the hood (creating new objects, moving them to the right thread, acquiring and releasing locks), which potentially acts as a limit on the rate of communication between the background thread and the main thread.

For most code this won't matter: for example, if there are a small handful of ui dispatches per user interaction, then the existing machinery is plenty fast enough. But in cases where there may be thousands of communications per second, the overhead can become a bottleneck, and there may be more efficient solutions available.

Here are a couple of crude example scripts that try to explore the limitations. The first script pops up a UI with a single button. When that button is pressed, a background thread is started which repeatedly pings the main UI thread using dispatch='ui'. There's no user-visible response to the pings, but we record the send and receive times of the pings. It'll print a message to the console when the background thread has finished sending pings, but you may still need to wait for the UI thread to catch up.

import threading
import time

import matplotlib.pyplot as plt
import numpy as np

from traits.api import (
    Button, Event, Float, HasTraits, Int, List, on_trait_change)
from traitsui.api import Item, View

class PingTarget(HasTraits):
    ping = Event()

    ping_received_timestamps = List(Float)

    ping_sent_timestamps = List(Float)

    start_pings = Button()

    ping_count = Int(65536)

    @on_trait_change("ping", dispatch="ui")
    def _log_timestamp(self):
        self.ping_received_timestamps.append(time.monotonic())

    @on_trait_change("start_pings")
    def _start_background_thread(self):
        t = threading.Thread(target=self._ping_on_background_thread)
        t.start()

    def _ping_on_background_thread(self):
        for _ in range(self.ping_count):
            self.ping_sent_timestamps.append(time.monotonic())
            self.ping = True
        print("Done pinging")

    view = View(Item('start_pings'))

def main():
    ping_target = PingTarget()
    ping_target.configure_traits()

    sent_timestamps = ping_target.ping_sent_timestamps
    received_timestamps = ping_target.ping_received_timestamps

    sent = np.array(sent_timestamps)
    received = np.array(received_timestamps)

    # Normalise with respect to the first ping sent.
    assert sent.min() <= received.min()
    base = sent.min()
    sent -= base
    received -= base

    # Print some statistics
    print("Number of pings sent: ", len(sent))
    print("Number of pings received: ", len(received))
    print("Median time between sent pings: ", np.median(np.diff(sent)))
    print("Median time between received pings: ", np.median(np.diff(received)))
    print("Median ping latency: ", np.median(received - sent))

    # Plot send and receive times against each other
    extent = max(sent.max(), received.max())
    plt.plot([0, extent], [0, extent])

    plt.plot(sent, received)
    plt.show()

if __name__ == "__main__":
    main()

Here's a second script that does the same as the first, but sets up a direct signal-slot connection for the background thread pinging.

import threading
import time

import matplotlib.pyplot as plt
import numpy as np

from pyface.qt.QtCore import QObject, Qt, Signal, Slot
from traits.api import (
    Button, Event, Float, HasTraits, Int, List, on_trait_change)
from traitsui.api import Item, View

class HasPingSignal(QObject):
    ping = Signal()

class HasPingSlot(QObject):
    def __init__(self, pingee):
        QObject.__init__(self)
        self._pingee = pingee

    @Slot()
    def _on_ping(self):
        self._pingee.ping = True

class PingTarget(HasTraits):
    def __init__(self, **traits):
        super().__init__(**traits)
        self._has_slot = HasPingSlot(self)

    ping = Event()

    ping_received_timestamps = List(Float)

    ping_sent_timestamps = List(Float)

    start_pings = Button()

    ping_count = Int(65536)

    @on_trait_change("ping")
    def _log_timestamp(self):
        self.ping_received_timestamps.append(time.monotonic())

    @on_trait_change("start_pings")
    def _start_background_thread(self):
        t = threading.Thread(target=self._ping_on_background_thread)
        t.start()

    def _ping_on_background_thread(self):
        pinger = HasPingSignal()
        pinger.ping.connect(self._has_slot._on_ping, Qt.QueuedConnection)

        for _ in range(self.ping_count):
            self.ping_sent_timestamps.append(time.monotonic())
            pinger.ping.emit()
        print("Done pinging")

    view = View(Item('start_pings'))

def main():
    ping_target = PingTarget()
    ping_target.configure_traits()

    sent_timestamps = ping_target.ping_sent_timestamps
    received_timestamps = ping_target.ping_received_timestamps

    sent = np.array(sent_timestamps)
    received = np.array(received_timestamps)

    # Normalise with respect to the first ping sent.
    assert sent.min() <= received.min()
    base = sent.min()
    sent -= base
    received -= base

    # Print some statistics
    print("Number of pings sent: ", len(sent))
    print("Number of pings received: ", len(received))
    print("Median time between sent pings: ", np.median(np.diff(sent)))
    print("Median time between received pings: ", np.median(np.diff(received)))
    print("Median ping latency: ", np.median(received - sent))

    # Plot send and receive times against each other
    extent = max(sent.max(), received.max())
    plt.plot([0, extent], [0, extent])

    plt.plot(sent, received)
    plt.show()

if __name__ == "__main__":
    main()

Here are results of running the two scripts on my machine (macOS 10.15.7, Python 3.9.0 venv, TraitsUI 7.1.0, Traits 6.1.1, PySide2 5.15.1):

First script result:

(testing) mdickinson@mirzakhani Desktop % python time_ui_dispatch.py 
Done pinging
Number of pings sent:  65536
Number of pings received:  65536
Median time between sent pings:  2.1944999999412573e-05
Median time between received pings:  2.373900000041118e-05
Median ping latency:  1.7192890949999997
Screenshot 2020-11-18 at 12 40 23

Second script result:

(testing) mdickinson@mirzakhani Desktop % python time_ui_dispatch2.py
Done pinging
Number of pings sent:  65536
Number of pings received:  65536
Median time between sent pings:  2.914100000062092e-05
Median time between received pings:  2.9127000000350733e-05
Median ping latency:  4.665999999886594e-05
Screenshot 2020-11-18 at 12 41 16

For PySide2, there's a huge lag in the first case before pings start to be processed; I'm not sure what's causing this. For the second script, the ping sending is a little slower, but the GUI thread can keep up with the pings being sent.

mdickinson commented 3 years ago

Interestingly, the performance characteristics of PyQt5 are very different from those of PySide2. With PySide 2 there seems to be a big up-front lag, while with PyQt5 we start processing the pings immediately, but the GUI thread can't keep up with the generated pings and so lags further and further behind. Here are the script results:

(testing) mdickinson@mirzakhani Desktop % python time_ui_dispatch.py
Done pinging
Number of pings sent:  65536
Number of pings received:  65536
Median time between sent pings:  5.3518000000085664e-05
Median time between received pings:  5.998699999931745e-05
Median ping latency:  0.9168451444999997
(testing) mdickinson@mirzakhani Desktop % python time_ui_dispatch2.py
Done pinging
Number of pings sent:  65536
Number of pings received:  65536
Median time between sent pings:  2.4248999999976206e-05
Median time between received pings:  2.4301000000281903e-05
Median ping latency:  3.8796000000118624e-05

and the graphs:

Screenshot 2020-11-18 at 12 48 33 Screenshot 2020-11-18 at 12 49 10
mdickinson commented 3 years ago

For PySide2, there's a huge lag in the first case before pings start to be processed; I'm not sure what's causing this.

Looking more closely, the pings don't start to be processed until the background thread has completed. I'm not sure why or how the background thread would block the processing in the UI thread in this way.

mdickinson commented 3 years ago

Suggestion from @corranwebster: we should see if using a QThread instead of a regular Python thread for the background thread makes any difference.

corranwebster commented 3 years ago

I think there is a subtle difference between these two implementations, and which might give a hint as to why behaviour is different: the current dispatch='ui' only dispatches via the event loop if we aren't on the main thread, otherwise it dispatches by function call; I think the signal/slot approach always uses the event loop, even when the signal fires on the main thread.

corranwebster commented 3 years ago

And a thought on the differences: perhaps PyQt5/SIP and PySide2/Shiboken have different approaches to handling the GIL?

mdickinson commented 3 years ago

I think the signal/slot approach always uses the event loop, even when the signal fires on the main thread.

Yep, with the explicit Qt.QueuedConnection connection type in the connect call, I think that's right. If that's left out, Qt chooses whether to use a DirectConnection or a QueuedConnection based on object thread affinities. It would probably be worth re-running the timing experiments without the explicit type; specifying the type was an optimization (saving Qt from having to check thread affinities), and probably a premature one.

Documentation link: https://doc.qt.io/qt-5/threads-qobject.html#signals-and-slots-across-threads

mdickinson commented 3 years ago

perhaps PyQt5/SIP and PySide2/Shiboken have different approaches to handling the GIL?

Perhaps, but I'm not sure why that would matter. We're executing multiple bytecodes in the background thread, so the GIL should still be being released regularly on the background thread in the normal way - this isn't like the situation where you have a single bytecode executing a long, blocking function.