DC-analysis / dclab

Python library for the post-measurement analysis of real-time deformability cytometry (RT-DC) data sets
https://dclab.readthedocs.io

Circular reference of 'RTDCDataset' and 'Filter' leads to memory leak #214

Closed maxschloegel closed 1 year ago

maxschloegel commented 1 year ago

Because the two classes reference each other, the Filter objects of RTDCBase objects are not deleted properly when the context manager of the dataset class exits. As a result, the memory is not freed, and it slowly fills up when the context manager is used over and over again.

Here is a minimal working example (without actually using dclab) that will display memory leak behaviour. In the example, MainClass corresponds to the RTDCBase-class and HelperClass corresponds to the Filter-class.

import gc
import weakref

import numpy as np
import tqdm


class MainClass:
    def __init__(self):
        # self.helper = weakref.ref(HelperClass(self))  # (solution option 3)
        self.helper = HelperClass(self)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, trace):
        pass
        # gc.collect()     # (solution option 1)
        # del self.helper  # (solution option 2)


class HelperClass:
    def __init__(self, main_class):
        # back-reference to the owner: this closes the reference cycle
        self.main_class = main_class
        # roughly 3 MB of payload per instance
        self.inner_arr = np.random.randint(1, size=3000000, dtype='bool')


if __name__ == "__main__":
    for idx in tqdm.tqdm(range(1000000)):
        with MainClass() as tc:
            pass

There are three solutions for this:

  1. Explicitly call the garbage collector every time (or after a fixed number of cycles) a context manager exits, e.g. in the __exit__() function. Unfortunately, this is not very performant and leads to very slow enter-exit cycles.
  2. Explicitly delete the Filter object in the __exit__() function. This works fine, but it only fights the symptom, not the cause.
  3. Instead of instantiating the RTDCBase members normally, use Python's weakref library to hold a weak reference to every class member that itself references back to the RTDCBase object.

Option 3 (weakref) seems to be the preferred one, as it does not clutter the __exit__()-function.
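
For option 3, one possible arrangement (a minimal sketch, not the dclab implementation) is to keep the strong reference from the main object to its helper and make only the back-reference weak. With the weak reference placed the other way round, as in the commented line of the example above, nothing would hold the helper strongly and it would be freed right after creation. The names Main, Helper and freed_without_gc are hypothetical, and the check relies on CPython's reference counting.

```python
import gc
import weakref


class Helper:
    """Hypothetical stand-in for the Filter class."""

    def __init__(self, main):
        # Weak back-reference: it does not keep `main` alive.
        self._main_ref = weakref.ref(main)

    @property
    def main(self):
        main = self._main_ref()
        if main is None:
            raise ReferenceError("parent object no longer exists")
        return main


class Main:
    """Hypothetical stand-in for the RTDCBase class."""

    def __init__(self):
        # Strong reference in one direction only, so no cycle is formed.
        self.helper = Helper(self)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, trace):
        pass


def freed_without_gc():
    """Return True if a Main instance is freed by refcounting alone."""
    was_enabled = gc.isenabled()
    gc.disable()  # rule out help from the cyclic garbage collector
    try:
        with Main() as obj:
            probe = weakref.ref(obj)
        del obj  # drop the last strong reference
        return probe() is None
    finally:
        if was_enabled:
            gc.enable()
```

The helper can still reach its owner through the `main` property for as long as the owner is alive, which is always the case while the helper is used through the owner.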


maxschloegel commented 1 year ago

OK, so I implemented weakref for all circular references I could find: fmt_HDF5 <-> Export and, for child datasets, fmt_hierarchy.RTDC_Hierarchy <-> fmt_hierarchy.ChildBase.

But this did not resolve the issue. Using gc.get_referrers, it turns out that the RTDC_HDF5 object is somehow referencing itself, and I really don't understand where this reference is created.
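
A small helper along these lines can make the output of gc.get_referrers easier to read. This is only a sketch: Dataset, registry and describe_referrers are hypothetical names, and the exact list of referrers depends on the interpreter version and on where the code runs.

```python
import gc
import types


class Dataset:
    """Hypothetical stand-in for an RTDC_HDF5 instance."""


def describe_referrers(obj):
    """Return the type names of the GC-tracked objects referring to obj.

    Frame objects are skipped because they only reflect the current
    call stack, not a lasting reference.
    """
    names = []
    for ref in gc.get_referrers(obj):
        if isinstance(ref, types.FrameType):
            continue
        names.append(type(ref).__name__)
    return sorted(names)


ds = Dataset()
registry = {"current": ds}  # a lasting reference held by a dict
print(describe_referrers(ds))
```

Only the type names are returned here; to locate the actual owner, the referrer objects themselves can be inspected in the same loop.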

maxschloegel commented 1 year ago

I tried to figure this out by stepping through the instantiation of the RTDC_HDF5 object to see where it starts referencing itself. Apparently, it happens in RTDCBase's _init_filters() method, right after the filter is instantiated in https://github.com/DC-analysis/dclab/blob/d0f28a94e540852a82fb8bd93593f96969c68d9c/dclab/rtdc_dataset/core.py#L250, i.e. after entering the instantiation method of the Filter class. I still could not figure out why this is happening. I will continue to dig into it.

maxschloegel commented 1 year ago

After digging a bit further, I found the following:

Given two classes like this:

class A:
    def __init__(self):
        self.val = 5
        self.c = C(self)

class C:
    def __init__(self, a):
        self.val = a.val

a = A()

then, while the attribute self.c = C(self) is being instantiated, the object a references itself (found via gc.get_referrers(a)). This reference is removed once the object c has been completely instantiated.

While this is quite unexpected, the weird thing is that an RTDC_HDF5 object also references itself during the instantiation of its filter attribute, and in that case the self-reference is not released after the filter object has been completely created.

maxschloegel commented 1 year ago

It seems that the self-referencing of RTDC_HDF5 objects is caused by the functools.lru_cache decorator, which is used a couple of times in the RTDC_HDF5 class. A solution to this problem was posted on SO: https://stackoverflow.com/questions/33672412/python-functools-lru-cache-with-instance-methods-release-object

However, I will first look for built-in functionality in Python to deal with this problem, as that would be preferred.

If no better solution than the SO suggestion can be found, the linked weak_lru decorator can be implemented in the dclab/utils.py file (next to the file_monitoring_lru_cache class).
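
The effect can be reproduced without dclab. In this sketch (Reader, compute and instance_survives are made-up names), the cache created by functools.lru_cache is stored on the function object in the class, and each cache key contains self, so a cached call keeps the instance alive until the cache is cleared or the entry is evicted.

```python
import functools
import gc
import weakref


class Reader:
    """Hypothetical class with a cached instance method."""

    @functools.lru_cache(maxsize=32)
    def compute(self, x):
        return x * 2


def instance_survives():
    """Return (alive_after_del, freed_after_cache_clear)."""
    r = Reader()
    r.compute(1)  # the cache key now holds a strong reference to `r`
    probe = weakref.ref(r)
    del r
    gc.collect()  # a full collection does not help: the cache is reachable
    alive = probe() is not None
    Reader.compute.cache_clear()  # dropping the cache releases the instance
    gc.collect()
    return alive, probe() is None
```

Calling cache_clear() by hand is shown only to confirm where the reference lives; it is not meant as a fix.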

maxschloegel commented 1 year ago

Ok, so a weakref version of lru_cache was suggested to the Python developers, but for a few reasons they did not implement it (or even mention the memory leak problem in the documentation), see here.

At the end of that issue, the very same solution was proposed as in the SO thread (it was actually the same guy :) ). As I did not find another similarly nice solution, I will go ahead and implement the weakref lru_cache and see if it works.
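
For reference, a weakref-based variant in the spirit of the linked SO answer could look roughly like this. It is a sketch under assumptions, not necessarily what ended up in dclab: the Reader class and the demo function are made up for illustration, and the immediate release of the instance relies on CPython's reference counting.

```python
import functools
import weakref


def weak_lru(maxsize=128, typed=False):
    """LRU cache decorator that only keeps a weak reference to `self`."""
    def wrapper(func):

        @functools.lru_cache(maxsize, typed)
        def _func(_self, *args, **kwargs):
            # `_self` is a weakref.ref; dereference it for the real call
            return func(_self(), *args, **kwargs)

        @functools.wraps(func)
        def inner(self, *args, **kwargs):
            # the cache key contains the weak reference, not the instance
            return _func(weakref.ref(self), *args, **kwargs)

        return inner

    return wrapper


class Reader:
    """Hypothetical class using the decorator."""
    calls = 0

    @weak_lru(maxsize=32)
    def compute(self, x):
        Reader.calls += 1
        return x * 2


def demo():
    r = Reader()
    first = r.compute(21)
    second = r.compute(21)  # served from the cache
    probe = weakref.ref(r)
    del r  # no strong reference is left in the cache
    return first, second, Reader.calls, probe() is None
```

One trade-off of this approach: cached results for a deleted instance stay in the cache until they are evicted, so only the instance itself is released early, not the values computed for it.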

paulmueller commented 1 year ago

fixed by #240