bluesky / bluesky-widgets

Components for graphical applications that interact with bluesky libraries
http://blueskyproject.io/bluesky-widgets
BSD 3-Clause "New" or "Revised" License

PERF: Convert zmq dispatcher LOADING_LATENCY to msec #150

Closed · ksunden closed this 3 years ago

ksunden commented 3 years ago

I had been getting high CPU usage and ballooning memory usage, ultimately causing the OS to kill my application. Changing this one number fixed that. The latency is documented as an integer in msec, so a value of 0.01 does not make sense.

Additionally, singleShot is a static method, so it does not need to be called on an instance of QTimer.
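
For illustration, a minimal sketch of the kind of change described, assuming qtpy-style imports; the constant value and the surrounding function are placeholders, not the dispatcher's actual code:

```python
from qtpy.QtCore import QTimer

# QTimer.singleShot expects an integer number of milliseconds,
# so a sub-millisecond float like 0.01 is not meaningful here.
LOADING_LATENCY = 10  # msec (illustrative value)

def schedule_load(callback):
    # singleShot is a static method; no QTimer instance is required.
    QTimer.singleShot(LOADING_LATENCY, callback)
```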

Motivation and Context

PyQt5's docs are woefully incomplete here, but they do indicate that the type is int and that the method is static.

PySide2's docs tell the same story, but at least with more info on the other forms of the method signature, if not the one we actually use...

How Has This Been Tested?

Running an app that uses the ZMQ Dispatcher and watching memory usage rise (and CPU usage remain higher than expected, at 100% of one core).

After the change, memory remained stable under 1% and CPU was around 10% (still perhaps a bit higher than I'd like, but acceptable).

ksunden commented 3 years ago

Even with this I seem to be leaking memory. I'm not quite sure why, but I suspect it has to do with the while True loop allowing a single "worker" to live past the time between successive calls. It's certainly not as bad as when the latency was 0.01 (where I could watch memory tick up just by watching htop), but if I leave my app open and idle it will eventually eat all available memory over the course of several hours.

I'm not fully confident this is the source of that leak, but given its sensitivity to that constant value, it is my biggest suspicion. I may try a slightly more invasive fix (nothing too drastic, just an early return if it's already running a worker; see the sketch below).

The thing that makes me a little less confident that this is the cause is that I would expect workers to exit when no zmq messages are present, and it leaks memory even when the RE is entirely idle (no documents being generated).
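
A hypothetical sketch of that early-return guard, assuming a napari-style worker API; the attribute names and helper functions here are illustrative, not the dispatcher's real interface:

```python
def _start_loading(self):
    # Illustrative guard: skip spawning a new worker if the previous one
    # has not finished draining the queue yet.
    worker = getattr(self, "_worker", None)
    if worker is not None and not worker_is_finished(worker):  # hypothetical check
        return
    self._worker = create_worker(self._process_queue)  # hypothetical factory
    self._worker.start()
```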

ksunden commented 3 years ago

Indeed, I seem to still have a significant memory leak caused by the zmq dispatcher/qt threading handling:

<frozen importlib._bootstrap_external>:647: size=37.5 MiB, count=334848, average=117 B
/home/kyle/src/bluesky/bluesky-widgets/bluesky_widgets/qt/threading.py:48: size=12.6 MiB, count=362061, average=37 B
/home/kyle/src/bluesky/bluesky-widgets/bluesky_widgets/qt/zmq_dispatcher.py:106: size=9295 KiB, count=103458, average=92 B
/home/kyle/src/bluesky/bluesky-widgets/bluesky_widgets/qt/threading.py:171: size=9294 KiB, count=103447, average=92 B
/home/kyle/src/bluesky/bluesky-widgets/bluesky_widgets/qt/threading.py:46: size=7273 KiB, count=103445, average=72 B
/home/kyle/src/bluesky/bluesky-widgets/bluesky_widgets/qt/threading.py:503: size=6869 KiB, count=51723, average=136 B
/home/kyle/src/bluesky/bluesky-widgets/bluesky_widgets/qt/threading.py:256: size=5658 KiB, count=51724, average=112 B
<frozen importlib._bootstrap>:228: size=4273 KiB, count=42323, average=103 B
/home/kyle/src/bluesky/bluesky-widgets/bluesky_widgets/qt/threading.py:482: size=2829 KiB, count=51724, average=56 B
/home/kyle/venvs/pyenv/versions/3.9.6/envs/bluesky/lib/python3.9/site-packages/yaqc_cmds/__main__.py:44: size=2028 KiB, count=51740, average=40 B
/home/kyle/src/bluesky/bluesky-widgets/bluesky_widgets/qt/zmq_dispatcher.py:108: size=2021 KiB, count=51724, average=40 B

These are the top memory allocations after letting my app cold start and sit idle, with no interaction and no events incoming to the queue; it took about 30-40 minutes to accumulate 850 MB of memory usage. Almost all of the top allocations are from threading.py and zmq_dispatcher.py.
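
The listing above appears to be tracemalloc output; as a sketch only (an assumption about how the measurement could be taken, not part of the original report), such a snapshot can be captured like this:

```python
import tracemalloc

tracemalloc.start()

# ... run the Qt app with the ZMQ dispatcher and let it sit idle for a while ...

snapshot = tracemalloc.take_snapshot()
for stat in snapshot.statistics("lineno")[:10]:
    print(stat)
```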

ksunden commented 3 years ago

Okay, I'm not quite sure I have the best solution, but I'm testing out what I think solves the problem I'm experiencing:

I think the issue is that each of the threads that is created has signals which are not destroyed with the thread and are not GC'd, either because a) they live in C++ land and/or b) they are still referenced by virtue of being connected to slots...

The "best" solution would probably be to figure out exactly what is failing to be destroyed and make sure it is destroyed. But its not clear to me exactly where that should happen/why it is not happening now.

The solution I am testing is to prevent additional workers from ever returning, instead yielding so that the worker class can still cancel it (it is not blocking) but never really gets rid of the worker. (I left the code that creates a new worker in place, but it probably doesn't even need to be there...)

I did have to put rate-limiting time.sleep calls in there to prevent constant 100% CPU usage, but it prevents having 100k "finished" thread objects lying around with no easy-to-find Python reference.
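
A rough sketch of the approach being tested, assuming a generator-based worker where yielding gives the framework a chance to cancel between passes; the queue handling and handler name are illustrative:

```python
import time

def process_queue_forever(queue, latency_s=0.1):
    # Long-lived worker: never return, so we never accumulate "finished"
    # thread/signal objects that are hard to garbage collect.
    while True:
        while not queue.empty():
            handle_document(queue.get())  # hypothetical document handler
        # Yield so a generator-based worker can be cancelled between passes,
        # then sleep so an idle queue does not spin at 100% CPU.
        yield
        time.sleep(latency_s)
```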

danielballan commented 3 years ago

Thanks for this @ksunden.

I wanted to acknowledge it before I head out on vacation for a week. I will dig in when I return. Maybe @dmgav could take a look in the meantime.

ksunden commented 3 years ago

ping @danielballan @dmgav

ksunden commented 3 years ago

ping @danielballan

danielballan commented 3 years ago

Oof, sorry @ksunden. My GitHub notifications are a mess.