Closed maxschloegel closed 1 year ago
OK, so I implemented weakref for all circular references I could find:
fmt_HDF5 <-> Export
and for child datasets
fmt_hierarchy.RTDC_Hierarchy <-> fmt_hierarchy.ChildBase
But it did not resolve the problem.
Using gc.get_referrers, it turns out that the RTDC_HDF5 object is somehow referencing itself, and I really don't understand where this reference is created.
I tried to figure this out by going through the instantiation of the RTDC_HDF5 object step by step to see where it starts referencing itself. Apparently it happens during RTDCBase's function _init_filters(), right after instantiating the filter in
https://github.com/DC-analysis/dclab/blob/d0f28a94e540852a82fb8bd93593f96969c68d9c/dclab/rtdc_dataset/core.py#L250
so after going into the instantiation method of the Filter class.
I still could not figure out why this is happening.
Will continue to dig into it.
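For reference, the kind of check described above can be sketched roughly as follows. This is only an illustration: the helper refers_to_itself and the classes Node and Plain are hypothetical and not part of dclab; only gc.get_referrers is a real standard-library call.

```python
import gc


def refers_to_itself(obj):
    """Return True if obj is reachable directly from itself.

    Depending on the Python version, a self-reference stored in an
    instance attribute shows up either as the instance or as its
    __dict__ in the list returned by gc.get_referrers.
    """
    own_dict = getattr(obj, "__dict__", None)
    for referrer in gc.get_referrers(obj):
        if referrer is obj:
            return True
        if own_dict is not None and referrer is own_dict:
            return True
    return False


class Node:
    """Hypothetical class that deliberately stores a self-reference."""

    def __init__(self):
        self.me = self


class Plain:
    """Hypothetical class without any self-reference."""

    def __init__(self):
        self.val = 5
```

Calling such a helper at different points during instantiation (for example in a debugger) narrows down where a self-reference first appears.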
After digging a bit further, I found the following:
When having two objects like this:
    class A:
        def __init__(self):
            self.val = 5
            self.c = C(self)

    class C:
        def __init__(self, a):
            self.val = a.val

    a = A()
then during the instantiation of the attribute self.c = C(self), the object a references itself (found via gc.get_referrers(a)). But this reference is removed after object c was completely instantiated.
While this is quite unexpected, the weird thing is that an RTDC_HDF5 object also references itself during the instantiation of its filter attribute, and that self-reference is not released after the filter object has been completely created.
It seems that the problem of self-referencing RTDC_HDF5 objects lies in the usage of the functools.lru_cache decorator. It is used a couple of times in the RTDC_HDF5 class.
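The effect can be reproduced without dclab. The following is a minimal sketch with a hypothetical Dataset class (not the actual RTDC_HDF5 code): functools.lru_cache stores self as part of the cache key, and the cache lives on the function object in the class, so an instance stays alive after its last external reference is deleted.

```python
import functools
import gc
import weakref


class Dataset:
    """Hypothetical stand-in for a class that caches a method."""

    @functools.lru_cache(maxsize=None)
    def expensive(self, key):
        # The cache key includes `self`, so the cache keeps a strong
        # reference to every instance the method was called on.
        return key * 2


def alive_after_delete(call_cached_method):
    """Check whether an instance survives `del` plus a full gc run."""
    ds = Dataset()
    if call_cached_method:
        ds.expensive(21)
    probe = weakref.ref(ds)
    del ds
    gc.collect()
    return probe() is not None
```

With this sketch, an instance whose cached method was called remains alive until Dataset.expensive.cache_clear() is called or the entry is evicted, while an instance whose cached method was never called is released right away.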
A solution to this problem was posted here on SO:
https://stackoverflow.com/questions/33672412/python-functools-lru-cache-with-instance-methods-release-object
but I will first look for built-in functionality in Python to deal with this problem, as that would be preferred.
If no better solution than the SO suggestion can be found, the linked weak_lru decorator can be implemented in the dclab/utils.py file (next to the file_monitoring_lru_cache class).
OK, so a weakref version of lru_cache was suggested to the Python developers, but for a few reasons they did not implement it (or even mention the memory leak problem in the documentation), see here.
At the end of that issue, the very same solution was proposed as in the SO thread (it was actually the same guy :) )
As I did not find another similarly nice solution, I will go on and implement the weakref lru cache and see if it works.
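A decorator along the lines of the SO suggestion could look roughly like the sketch below. It is not necessarily what ended up in dclab; the Dataset class and the released_after_delete helper are hypothetical and only there to exercise the decorator.

```python
import functools
import gc
import weakref


def weak_lru(maxsize=128, typed=False):
    """LRU cache for methods that holds only a weak reference to self."""

    def decorator(method):
        @functools.lru_cache(maxsize=maxsize, typed=typed)
        def cached(self_ref, *args, **kwargs):
            # Dereference the weak reference only for the actual call.
            return method(self_ref(), *args, **kwargs)

        @functools.wraps(method)
        def wrapper(self, *args, **kwargs):
            # The cache key contains weakref.ref(self), not self.
            return cached(weakref.ref(self), *args, **kwargs)

        return wrapper

    return decorator


class Dataset:
    """Hypothetical stand-in for a class with cached methods."""

    def __init__(self):
        self.calls = 0

    @weak_lru(maxsize=32)
    def double(self, value):
        self.calls += 1
        return 2 * value


def released_after_delete():
    """Check that a cached call no longer keeps the instance alive."""
    ds = Dataset()
    ds.double(21)
    probe = weakref.ref(ds)
    del ds
    gc.collect()
    return probe() is None
```

One trade-off of this approach is that the cache is still shared by all instances, and entries of deleted instances stay in it as dead weak references (together with their cached results) until they are evicted.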
fixed by #240
Due to the circular references between the two classes, the Filter-objects of RTDCBase-objects are not deleted properly when using the context manager for the dataset class. As a result, memory is not freed and slowly fills up when the context manager is used over and over again.
Here is a minimal working example (without actually using dclab) that will display memory leak behaviour. In the example, MainClass corresponds to the RTDCBase-class and HelperClass corresponds to the Filter-class.
There are three solutions for this:
1. [...] (__exit__()-function). This unfortunately is not very performant and leads to very slow enter-exit cycles.
2. [...] Filter-obj in the __exit__()-function. This works fine, but only fights the symptom, not the cause.
3. [...] RTDCBase-object members, use Python's library weakref to have a weak reference to all class members that themselves reference back to the RTDCBase-object.
Option 3 (weakref) seems to be the preferred one, as it does not clutter the __exit__()-function.
ToDos:
- [...] RTDCBase-members that reference back to the RTDCBase-obj
- [...] weakref-reference.
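The weakref option can be illustrated with a small sketch. This is not the original example from this issue: StrongHelper, WeakHelper and freed_without_gc are hypothetical names, and MainClass here takes the helper class as an argument only so that both variants can be compared. The check relies on CPython's reference counting.

```python
import gc
import weakref


class StrongHelper:
    """Back-reference held strongly: creates a reference cycle."""

    def __init__(self, parent):
        self.parent = parent


class WeakHelper:
    """Back-reference held weakly: no reference cycle."""

    def __init__(self, parent):
        self._parent_ref = weakref.ref(parent)

    @property
    def parent(self):
        # Returns None once the parent has been deleted.
        return self._parent_ref()


class MainClass:
    def __init__(self, helper_cls):
        self.helper = helper_cls(self)

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        return False


def freed_without_gc(helper_cls):
    """Check whether the object is freed by reference counting alone."""
    gc.disable()  # rule out the cyclic garbage collector
    try:
        with MainClass(helper_cls) as main:
            probe = weakref.ref(main)
        del main
        return probe() is None
    finally:
        gc.enable()
        gc.collect()  # clean up the cycle left by the strong variant
```

With the weak back-reference the object is released as soon as the last external reference is gone, while the strong variant has to wait for a run of the cyclic garbage collector.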