QubesOS / qubes-issues

The Qubes OS Project issue tracker
https://www.qubes-os.org/doc/issue-tracking/
qui-domains memory leaks #8442

Open noskb opened 1 year ago

noskb commented 1 year ago

Qubes OS release

R4.2

Brief summary

qui-domains slowly leaks memory the longer it runs. The screenshot was taken after about 14 days of uptime and shows the qui-domains process consuming more than 200 MiB.

[Screenshot: quidomains_memoryleak]

Steps to reproduce

Keep the machine running with qui-domains active for an extended period (about 14 days in the case above).

Expected behavior

The memory consumption of qui-domains should at least stay below 100 MiB.

Actual behavior

The memory consumption of qui-domains keeps growing (more than 200 MiB after about 14 days of uptime).

slayoo commented 6 months ago

@marmarta @marmarek, let me try to look into it next.

slayoo commented 5 months ago

Just for the record, I'm trying to trace what happens with the following change to domains.py (the display_top helper is taken from the Python tracemalloc documentation):

diff --git a/qui/tray/domains.py b/qui/tray/domains.py
index 1eae131..ddb583b 100644
--- a/qui/tray/domains.py
+++ b/qui/tray/domains.py
@@ -39,6 +39,34 @@ STATE_DICTIONARY = {
     'domain-shutdown-failed': 'Running'
 }

+import linecache
+import tracemalloc
+
+def display_top(snapshot, key_type='lineno', limit=10):
+    snapshot = snapshot.filter_traces((
+        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
+        tracemalloc.Filter(False, "<unknown>"),
+        tracemalloc.Filter(False, linecache.__file__),
+        tracemalloc.Filter(False, tracemalloc.__file__),
+    ))
+    top_stats = snapshot.statistics(key_type)
+
+    print("Top %s lines" % limit)
+    for index, stat in enumerate(top_stats[:limit], 1):
+        frame = stat.traceback[0]
+        print("#%s: %s:%s: %.1f KiB"
+              % (index, frame.filename, frame.lineno, stat.size / 1024))
+        line = linecache.getline(frame.filename, frame.lineno).strip()
+        if line:
+            print('    %s' % line)
+
+    other = top_stats[limit:]
+    if other:
+        size = sum(stat.size for stat in other)
+        print("%s other: %.1f KiB" % (len(other), size / 1024))
+    total = sum(stat.size for stat in top_stats)
+    print("Total allocated size: %.1f KiB" % (total / 1024))
+

 class IconCache:
     def __init__(self):
@@ -515,6 +543,7 @@ class DomainMenuItem(Gtk.ImageMenuItem):
             self.name.label.set_label(self.vm.name)

         self._set_submenu(state)
+        display_top(tracemalloc.take_snapshot(), limit=30)

     def update_stats(self, memory_kb, cpu_usage):
         self.memory.update_state(int(memory_kb))
@@ -915,6 +944,7 @@ class DomainTray(Gtk.Application):

 def main():
     ''' main function '''
+    tracemalloc.start(30)
     qapp = qubesadmin.Qubes()
     dispatcher = qubesadmin.events.EventsDispatcher(qapp)
     stats_dispatcher = qubesadmin.events.EventsDispatcher(

This prints something like the following on every call to update_state():

Top 30 lines
#1: /usr/lib/python3.8/site-packages/qubesadmin/storage.py:91: 67.2 KiB
    self._info = dict([line.split('=', 1) for line in info.splitlines()])
#2: /usr/lib/python3.8/site-packages/qubesadmin/base.py:337: 23.2 KiB
    name = name.decode()
#3: /usr/lib64/python3.8/site-packages/gi/module.py:215: 15.2 KiB
    wrapper = metaclass(name, bases, dict_)
#4: /usr/lib64/python3.8/site-packages/gi/types.py:55: 12.6 KiB
    setattr(cls, method_info.__name__, method_info)
#5: /usr/lib64/python3.8/site-packages/gi/types.py:54: 10.6 KiB
    for method_info in cls.__info__.get_methods():
#6: /usr/lib64/python3.8/site-packages/gi/module.py:147: 10.3 KiB
    wrapper = enum_add(g_type)
#7: /usr/lib64/python3.8/sre_compile.py:780: 8.9 KiB
    return _sre.compile(
#8: /usr/lib/python3.8/site-packages/qubesadmin/base.py:339: 8.9 KiB
    self._properties_cache[name] = (is_default, value)
#9: /usr/lib64/python3.8/asyncio/events.py:81: 7.9 KiB
    self._context.run(self._callback, *self._args)
#10: /usr/lib64/python3.8/site-packages/gi/module.py:163: 7.0 KiB
    setattr(wrapper, value_name, wrapper(value_info.get_value()))
#11: /usr/lib64/python3.8/site-packages/gi/module.py:141: 5.6 KiB
    wrapper = flags_add(g_type)
#12: /usr/lib/python3.8/site-packages/qubesadmin/storage.py:39: 5.0 KiB
    self.app = app
#13: /usr/lib64/python3.8/site-packages/gi/module.py:231: 4.5 KiB
    self.__dict__[name] = wrapper
#14: /usr/lib/python3.8/site-packages/qubesadmin/base.py:276: 4.0 KiB
    value = value.decode()
#15: /usr/lib/python3.8/site-packages/qubesadmin/base.py:352: 4.0 KiB
    props.add(key)
#16: /usr/lib/python3.8/site-packages/qubesadmin/events/__init__.py:73: 3.9 KiB
    self.handlers.setdefault(event, set()).add(handler)
#17: /usr/lib/python3.8/site-packages/qubesadmin/base.py:340: 3.7 KiB
    self._properties = list(self._properties_cache.keys())
#18: /usr/lib/python3.8/site-packages/qubesadmin/base.py:359: 3.6 KiB
    return super().__setattr__(key, value)
#19: /usr/lib64/python3.8/site-packages/gi/overrides/GLib.py:497: 3.3 KiB
    super(MainLoop, self).run()
#20: /usr/lib64/python3.8/asyncio/base_events.py:431: 3.2 KiB
    task = tasks.Task(coro, loop=self, name=name)
#21: /usr/lib/python3.8/site-packages/qubesadmin/app.py:77: 3.1 KiB
    [vm_prop.split('=', 1) for vm_prop in props])
#22: /usr/lib64/python3.8/site-packages/gi/types.py:71: 3.1 KiB
    setattr(cls, name, property(field_info.get_value, field_info.set_value))
#23: /usr/lib/python3.8/site-packages/qubesadmin/events/__init__.py:259: 2.8 KiB
    handler(subject, event, **kwargs)
#24: /usr/lib64/python3.8/abc.py:102: 2.7 KiB
    return _abc_subclasscheck(cls, subclass)
#25: /usr/lib64/python3.8/fnmatch.py:70: 2.4 KiB
    match = _compile_pattern(pat)
#26: /usr/lib/python3.8/site-packages/qui/tray/domains.py:199: 2.3 KiB
    asyncio.ensure_future(self.perform_restart())
#27: /usr/lib64/python3.8/site-packages/gi/types.py:156: 2.2 KiB
    setattr(cls, name, vfunc_info)
#28: /usr/lib/python3.8/site-packages/qubesadmin/vm/__init__.py:264: 2.1 KiB
    self._volumes[volname] = qubesadmin.storage.Volume(self.app,
#29: /usr/lib64/python3.8/contextlib.py:83: 2.1 KiB
    self.gen = func(*args, **kwds)
#30: /usr/lib/python3.8/site-packages/qubesadmin/vm/__init__.py:261: 1.9 KiB
    for volname in volumes_list.decode('ascii').splitlines():
582 other: 255.1 KiB
Total allocated size: 492.4 KiB

I'll keep it printing the stats to stdout for some time and see whether anything suspicious shows up.

slayoo commented 5 months ago

After more than 20 hours of observation, the allocations only fluctuate around the above values on my machine. The task manager also consistently reports the same RSS value here (69.0 MiB).

@noskb, would you be able to try the above patch to domains.py and observe whether the reported memory allocations change over time on your machine? (The stats are printed to stdout, so this needs to be run from a dom0 terminal.)
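Eyeballing successive "Top N" listings for growth is error-prone. A minimal sketch of an alternative is to keep one baseline snapshot and diff later snapshots against it with tracemalloc's Snapshot.compare_to(). The helper name top_growth and the simulated leak are illustrative assumptions and not part of the patch above.

```python
import tracemalloc


def top_growth(old_snapshot, new_snapshot, limit=10):
    """Return up to `limit` source lines whose allocated size grew the most.

    Hypothetical helper: it only wraps Snapshot.compare_to(), which yields
    StatisticDiff objects carrying size_diff and count_diff.
    """
    diffs = new_snapshot.compare_to(old_snapshot, "lineno")
    growing = [diff for diff in diffs if diff.size_diff > 0]
    growing.sort(key=lambda diff: diff.size_diff, reverse=True)
    return growing[:limit]


if __name__ == "__main__":
    tracemalloc.start(10)
    baseline = tracemalloc.take_snapshot()

    # Stand-in for a leak: objects that are kept alive between snapshots.
    leaked = [bytes(1024) for _ in range(1000)]

    current = tracemalloc.take_snapshot()
    for diff in top_growth(baseline, current):
        frame = diff.traceback[0]
        print("%s:%s: +%.1f KiB (%+d blocks)"
              % (frame.filename, frame.lineno,
                 diff.size_diff / 1024, diff.count_diff))
```

In a long-running process, a line that stays at the top of this list across many hours is a more likely leak candidate than one that merely holds a large but stable amount of memory.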

noskb commented 5 months ago

Hello @slayoo, with your patch applied, qui-domains does not work properly (is the event loop blocked?). I assume this is why the RSS value stays constant.

From what I have learned digging into this over the last year, the memory leak does not occur when stats_dispatcher is disabled, and it is also related to gbulb.

The following script reproduces the memory leak; the leak does not occur if gbulb.install() is commented out:
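If the stall does come from taking and printing a snapshot on every update_state() call (an untested assumption), one option is to rate-limit the snapshots. The sketch below is hypothetical: the class name ThrottledSnapshot and the 60-second default are made up for illustration, and only tracemalloc.take_snapshot() and time.monotonic() are real library calls.

```python
import time
import tracemalloc


class ThrottledSnapshot:
    """Take a tracemalloc snapshot at most once per `min_interval` seconds."""

    def __init__(self, min_interval=60.0, clock=time.monotonic):
        self.min_interval = min_interval
        self._clock = clock      # injectable so the throttle can be tested
        self._last = None        # time of the last snapshot, None if never

    def maybe_take(self):
        """Return a new snapshot, or None if the last one is too recent."""
        now = self._clock()
        if self._last is not None and now - self._last < self.min_interval:
            return None
        self._last = now
        # Requires tracemalloc.start() to have been called beforehand.
        return tracemalloc.take_snapshot()


# Possible use inside the patched update_state(), replacing the direct call:
#     snapshot = throttle.maybe_take()
#     if snapshot is not None:
#         display_top(snapshot, limit=30)
```

Whether this is enough to keep the tray responsive has not been verified here; it only reduces how often the expensive snapshot and print path runs.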

#!/usr/bin/env python3
import asyncio

import qubesadmin
import qubesadmin.events

# Comment out the next two lines and the leak no longer occurs.
import gbulb
gbulb.install()


def dummy(*args, **kwargs):
    """No-op handler: the leak does not depend on what the handler does."""
    pass


app = qubesadmin.Qubes()
# Same kind of dispatcher that qui-domains uses for stats_dispatcher.
stats = qubesadmin.events.EventsDispatcher(app, api_method="admin.vm.Stats")
stats.add_handler("vm-stats", dummy)
loop = asyncio.get_event_loop()
tasks = [asyncio.ensure_future(stats.listen_for_events())]
loop.run_until_complete(asyncio.wait(tasks))

I am stuck here as I am not an expert.
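To compare the reproducer with and without gbulb.install(), it may help to have the process log its own memory use while it runs. The following is a sketch under stated assumptions: the names peak_rss_kib and log_rss are hypothetical, and resource.getrusage() reports peak RSS rather than current RSS (in KiB on Linux), which is still enough to see steady growth.

```python
import asyncio
import resource


def peak_rss_kib():
    """Peak resident set size of this process (KiB on Linux)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss


async def log_rss(interval=60.0, samples=None, out=print):
    """Report peak RSS every `interval` seconds.

    Runs forever when `samples` is None, otherwise stops after that many
    reports. `out` is any callable taking one string.
    """
    taken = 0
    while samples is None or taken < samples:
        out("peak RSS: %d KiB" % peak_rss_kib())
        taken += 1
        await asyncio.sleep(interval)


if __name__ == "__main__":
    # Standalone demo: three quick samples.
    asyncio.run(log_rss(interval=0.1, samples=3))
```

In the reproducer, this coroutine could be scheduled next to the listener, for example by adding asyncio.ensure_future(log_rss()) to the list passed to asyncio.wait(), so that both variants produce comparable logs.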

slayoo commented 5 months ago

Thank you @noskb, that's very helpful!