Open noskb opened 1 year ago
@marmarta @marmarek, let me try to look into it next
Just for the record, I'm trying to trace what happens with the following change in domains.py (display_top() is taken from the Python tracemalloc docs):
diff --git a/qui/tray/domains.py b/qui/tray/domains.py
index 1eae131..ddb583b 100644
--- a/qui/tray/domains.py
+++ b/qui/tray/domains.py
@@ -39,6 +39,34 @@ STATE_DICTIONARY = {
     'domain-shutdown-failed': 'Running'
 }
+import linecache
+import tracemalloc
+
+def display_top(snapshot, key_type='lineno', limit=10):
+    snapshot = snapshot.filter_traces((
+        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
+        tracemalloc.Filter(False, "<unknown>"),
+        tracemalloc.Filter(False, linecache.__file__),
+        tracemalloc.Filter(False, tracemalloc.__file__),
+    ))
+    top_stats = snapshot.statistics(key_type)
+
+    print("Top %s lines" % limit)
+    for index, stat in enumerate(top_stats[:limit], 1):
+        frame = stat.traceback[0]
+        print("#%s: %s:%s: %.1f KiB"
+              % (index, frame.filename, frame.lineno, stat.size / 1024))
+        line = linecache.getline(frame.filename, frame.lineno).strip()
+        if line:
+            print(' %s' % line)
+
+    other = top_stats[limit:]
+    if other:
+        size = sum(stat.size for stat in other)
+        print("%s other: %.1f KiB" % (len(other), size / 1024))
+    total = sum(stat.size for stat in top_stats)
+    print("Total allocated size: %.1f KiB" % (total / 1024))
+
 class IconCache:
     def __init__(self):
@@ -515,6 +543,7 @@ class DomainMenuItem(Gtk.ImageMenuItem):
             self.name.label.set_label(self.vm.name)
         self._set_submenu(state)
+        display_top(tracemalloc.take_snapshot(), limit=30)
     def update_stats(self, memory_kb, cpu_usage):
         self.memory.update_state(int(memory_kb))
@@ -915,6 +944,7 @@ class DomainTray(Gtk.Application):
 def main():
     ''' main function '''
+    tracemalloc.start(30)
     qapp = qubesadmin.Qubes()
     dispatcher = qubesadmin.events.EventsDispatcher(qapp)
     stats_dispatcher = qubesadmin.events.EventsDispatcher(
which prints something like this on every call to update_state():
Top 30 lines
#1: /usr/lib/python3.8/site-packages/qubesadmin/storage.py:91: 67.2 KiB
self._info = dict([line.split('=', 1) for line in info.splitlines()])
#2: /usr/lib/python3.8/site-packages/qubesadmin/base.py:337: 23.2 KiB
name = name.decode()
#3: /usr/lib64/python3.8/site-packages/gi/module.py:215: 15.2 KiB
wrapper = metaclass(name, bases, dict_)
#4: /usr/lib64/python3.8/site-packages/gi/types.py:55: 12.6 KiB
setattr(cls, method_info.__name__, method_info)
#5: /usr/lib64/python3.8/site-packages/gi/types.py:54: 10.6 KiB
for method_info in cls.__info__.get_methods():
#6: /usr/lib64/python3.8/site-packages/gi/module.py:147: 10.3 KiB
wrapper = enum_add(g_type)
#7: /usr/lib64/python3.8/sre_compile.py:780: 8.9 KiB
return _sre.compile(
#8: /usr/lib/python3.8/site-packages/qubesadmin/base.py:339: 8.9 KiB
self._properties_cache[name] = (is_default, value)
#9: /usr/lib64/python3.8/asyncio/events.py:81: 7.9 KiB
self._context.run(self._callback, *self._args)
#10: /usr/lib64/python3.8/site-packages/gi/module.py:163: 7.0 KiB
setattr(wrapper, value_name, wrapper(value_info.get_value()))
#11: /usr/lib64/python3.8/site-packages/gi/module.py:141: 5.6 KiB
wrapper = flags_add(g_type)
#12: /usr/lib/python3.8/site-packages/qubesadmin/storage.py:39: 5.0 KiB
self.app = app
#13: /usr/lib64/python3.8/site-packages/gi/module.py:231: 4.5 KiB
self.__dict__[name] = wrapper
#14: /usr/lib/python3.8/site-packages/qubesadmin/base.py:276: 4.0 KiB
value = value.decode()
#15: /usr/lib/python3.8/site-packages/qubesadmin/base.py:352: 4.0 KiB
props.add(key)
#16: /usr/lib/python3.8/site-packages/qubesadmin/events/__init__.py:73: 3.9 KiB
self.handlers.setdefault(event, set()).add(handler)
#17: /usr/lib/python3.8/site-packages/qubesadmin/base.py:340: 3.7 KiB
self._properties = list(self._properties_cache.keys())
#18: /usr/lib/python3.8/site-packages/qubesadmin/base.py:359: 3.6 KiB
return super().__setattr__(key, value)
#19: /usr/lib64/python3.8/site-packages/gi/overrides/GLib.py:497: 3.3 KiB
super(MainLoop, self).run()
#20: /usr/lib64/python3.8/asyncio/base_events.py:431: 3.2 KiB
task = tasks.Task(coro, loop=self, name=name)
#21: /usr/lib/python3.8/site-packages/qubesadmin/app.py:77: 3.1 KiB
[vm_prop.split('=', 1) for vm_prop in props])
#22: /usr/lib64/python3.8/site-packages/gi/types.py:71: 3.1 KiB
setattr(cls, name, property(field_info.get_value, field_info.set_value))
#23: /usr/lib/python3.8/site-packages/qubesadmin/events/__init__.py:259: 2.8 KiB
handler(subject, event, **kwargs)
#24: /usr/lib64/python3.8/abc.py:102: 2.7 KiB
return _abc_subclasscheck(cls, subclass)
#25: /usr/lib64/python3.8/fnmatch.py:70: 2.4 KiB
match = _compile_pattern(pat)
#26: /usr/lib/python3.8/site-packages/qui/tray/domains.py:199: 2.3 KiB
asyncio.ensure_future(self.perform_restart())
#27: /usr/lib64/python3.8/site-packages/gi/types.py:156: 2.2 KiB
setattr(cls, name, vfunc_info)
#28: /usr/lib/python3.8/site-packages/qubesadmin/vm/__init__.py:264: 2.1 KiB
self._volumes[volname] = qubesadmin.storage.Volume(self.app,
#29: /usr/lib64/python3.8/contextlib.py:83: 2.1 KiB
self.gen = func(*args, **kwds)
#30: /usr/lib/python3.8/site-packages/qubesadmin/vm/__init__.py:261: 1.9 KiB
for volname in volumes_list.decode('ascii').splitlines():
582 other: 255.1 KiB
Total allocated size: 492.4 KiB
I'll keep it printing the stats to stdout for some time and see whether anything suspicious shows up...
After more than 20 hours of observation, the numbers only fluctuate around the above values on my machine. The task manager also consistently reports the same RSS value here (69.0 MiB).
@noskb, would you be able to try the above patch to domains.py and observe whether the reported memory allocations change over time on your machine? (The stats are printed to stdout, so this needs to be run from a dom0 terminal.)
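Since the top-N list is dominated by one-time import costs, diffing two snapshots may make slow growth easier to spot than eyeballing absolute sizes. A minimal sketch, assuming tracemalloc is already tracing as in the patch above; `report_growth` and its parameters are hypothetical names, not part of qui-domains:

```python
import tracemalloc


def report_growth(baseline, current, limit=10):
    """Return the `limit` source lines whose allocations grew the most.

    Both arguments are tracemalloc.Snapshot objects. The result is a list of
    (location, size_diff_bytes, count_diff) tuples, largest growth first.
    """
    stats = current.compare_to(baseline, 'lineno')
    # compare_to() orders by absolute difference, so keep only real growth
    grown = [s for s in stats if s.size_diff > 0]
    grown.sort(key=lambda s: s.size_diff, reverse=True)
    return [
        (str(s.traceback), s.size_diff, s.count_diff)
        for s in grown[:limit]
    ]


if __name__ == "__main__":
    tracemalloc.start(30)
    baseline = tracemalloc.take_snapshot()
    leaked = [bytes(1024) for _ in range(1000)]  # stand-in for a real leak
    current = tracemalloc.take_snapshot()
    for location, size_diff, count_diff in report_growth(baseline, current):
        print("%s: +%.1f KiB (%+d blocks)"
              % (location, size_diff / 1024, count_diff))
```

In the widget, the baseline could be taken once after startup and compared against a fresh snapshot every few minutes rather than on every state update, which should also keep the overhead in the event handlers low.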
Hello @slayoo, with your patch applied, qui-domains is not working properly (loop events blocked?). I assume this is why the RSS value is constant.
From what I've learned over the last year while digging a little deeper, the memory leak doesn't occur when stats_dispatcher is disabled, and it is also related to gbulb.
The following script reproduces the memory leak; the leak does not occur if gbulb.install() is commented out:
#!/usr/bin/env python3
import asyncio
import qubesadmin
import qubesadmin.events
import gbulb
gbulb.install()
def dummy(*args, **kwargs):
    pass
app = qubesadmin.Qubes()
stats = qubesadmin.events.EventsDispatcher(app, api_method="admin.vm.Stats")
stats.add_handler("vm-stats", dummy)
loop = asyncio.get_event_loop()
task = [asyncio.ensure_future(stats.listen_for_events())]
loop.run_until_complete(asyncio.wait(task))
I am stuck here as I am not an expert.
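To put numbers on the growth while the reproducer runs, one option is to log the process RSS periodically from the same event loop. A rough sketch, assuming a Linux /proc filesystem; `read_rss_kib` and `log_rss` are hypothetical helpers, not part of qubesadmin or gbulb:

```python
import asyncio


def read_rss_kib(status_text):
    """Extract VmRSS (in KiB) from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


async def log_rss(interval=60.0, iterations=None):
    """Print the current RSS every `interval` seconds.

    Runs forever when `iterations` is None.
    """
    count = 0
    while iterations is None or count < iterations:
        with open("/proc/self/status") as status:
            print("RSS: %s KiB" % read_rss_kib(status.read()))
        count += 1
        await asyncio.sleep(interval)
```

In the script above it could be scheduled next to the listener with asyncio.ensure_future(log_rss()) before the call to loop.run_until_complete(). Running it once with gbulb.install() and once without should show whether the RSS curves actually diverge.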
Thank you @noskb, that's very helpful!
I experience this too (perhaps a signal I'm not shutting down frequently enough). Until the leak is fixed, as a mitigation you can restart the qui-domains widget:
[user@dom0 ~]$ systemctl --user restart qubes-widget@qui-domains.service
That seems to be well tolerated by the system and is much less disruptive than restarting all of dom0.
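If restarting by hand gets tedious, the check could be automated. This is only a sketch under assumptions: the 100 MiB limit is an arbitrary threshold, the helper names are made up for illustration, and it relies on `systemctl show -p MainPID --value` to find the widget's process:

```python
import subprocess

UNIT = "qubes-widget@qui-domains.service"  # unit name from the command above
LIMIT_KIB = 100 * 1024  # assumed threshold, not an official value


def should_restart(rss_kib, limit_kib=LIMIT_KIB):
    """Decide whether the widget has grown past the limit."""
    return rss_kib is not None and rss_kib > limit_kib


def main_pid(unit=UNIT):
    """Ask the user's systemd instance for the unit's main PID (0 if stopped)."""
    out = subprocess.run(
        ["systemctl", "--user", "show", "-p", "MainPID", "--value", unit],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    return int(out or 0)


def process_rss_kib(pid):
    """Read VmRSS in KiB from /proc/<pid>/status, or None if unavailable."""
    try:
        with open("/proc/%d/status" % pid) as status:
            for line in status:
                if line.startswith("VmRSS:"):
                    return int(line.split()[1])
    except OSError:
        pass
    return None


def check_once():
    """Restart the widget if its RSS exceeds the limit."""
    pid = main_pid()
    if pid and should_restart(process_rss_kib(pid)):
        subprocess.run(["systemctl", "--user", "restart", UNIT], check=True)
```

Calling check_once() periodically from the user session in dom0 (for example from a systemd user timer) would restart the widget only when it has actually grown, instead of on a fixed schedule.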
Qubes OS release
R4.2
Brief summary
qui-domains slowly leaks memory the longer it runs. The screenshot was taken after about 14 days of uptime and shows that the qui-domains process had consumed more than 200 MiB.
Steps to reproduce
Keep the machine running.
Expected behavior
The memory consumption of qui-domains should at least stay below 100 MiB.
Actual behavior
Memory leak occurs.