NeuralEnsemble / python-neo

Neo is a package for representing electrophysiology data in Python, together with support for reading a wide range of neurophysiology file formats
http://neo.readthedocs.io/en/latest/
BSD 3-Clause "New" or "Revised" License
322 stars 248 forks source link

Circular references to proxy objects cause files to be locked even after leaving scope #684

Open jpgill86 opened 5 years ago

jpgill86 commented 5 years ago

With code like this,

import neo
io = neo.io.get_io(filename)
blk = io.read_block(lazy=True)
del io
del blk

the file filename will remain locked until the Python process ends.

I believe the cause is circular references to proxy objects, such as this:

https://github.com/NeuralEnsemble/python-neo/blob/70f78117bc2173a25e21149f3cdc69262df6e446/neo/io/basefromrawio.py#L264-L265

These circular references appear all over the place in Neo and are not unique to proxy objects. I suspect that because of them, most Neo objects in memory are not destroyed even after commands like del io; del blk. Not only is this a source of a memory leak, but in the case of lazy loading using proxy objects, the file becomes locked (e.g., it cannot be deleted in Windows at least).

Manually performing garbage collection is a workaround:

import gc
gc.collect()

However, the best solution I think would be to use weak references.

samuelgarcia commented 5 years ago

Hi jeffrey, thank you to catch this.Andrew have a long date plan to make a pseudo list object to deal with relationship in neo. It is catually a mess. At implementtation time, we could consider make weakref at least for one way bottum up.

The fact that file are still open when del to all object is strange. Of course when rawio many object have an internal memmap to data. So yes the file stay open when object is still alive. Note that for proxyobject there is a link to the IO : self._rawio I guess this could be a weakref. The ptach is easy to do.

jpgill86 commented 5 years ago

Andrew have a long date plan to make a pseudo list object to deal with relationship in neo. It is catually a mess. At implementtation time, we could consider make weakref at least for one way bottum up.

Yes, this sounds like a good way to do it eventually.

Of course when rawio many object have an internal memmap to data. So yes the file stay open when object is still alive.

Exactly. A strong reference to the RawIO persists in each proxy object, and the memmaps keep the file open. Each proxy object persists, even after del io; del blk, because of the circular references, until garbage collection runs.

Note that for proxyobject there is a link to the IO : self._rawio I guess this could be a weakref. The ptach is easy to do.

Good point, it may be easier to solve the file lock issue (not the memory leak problem) with a few targeted weakrefs.

However, the first few things I tried didn't work.

I tried replacing in each proxy object

self._rawio = rawio

with

self._rawio = weakref.proxy(rawio)

This had the effect of allowing the file to unlock with del io, and del blk is not needed because there are no strong references to the RawIO in blk. Unfortunately, this means that the following variant of reading the file will lose all its strong references, and the RawIO will be destroyed immediately:

blk = neo.io.get_io(filename).read_block(lazy=True)

Here I left out the intermediate step of creating io, which was the only strong reference to the RawIO.

This would probably break a lot of existing code, so that's not a good route to go.

I briefly thought maybe we needed weakrefs to the proxy objects themselves in BaseFromRaw (e.g., anasig = weakref.proxy(AnalogSignalProxy(rawio=self, ...))), but this doesn't work either because the proxy objects are destroyed instantly.

I next tried adding a single strong reference to the RawIO in the seg returned by read_segment (as a first try, I did if lazy: seg.rawio = self), thinking we could then use self._rawio = weakref.proxy(rawio) in each proxy object. However, this leads us back to the original problem: The segment is never destroyed even after del blk because of all the circular references.

The best solution I can think of right now, other than a widespread overhaul of relationships that uses weakrefs in all the right places, is to add a close_file method to blk objects returned by read_block when lazy=True, which could traverse its children and get rid of every reference to the RawIO.

Does that sound like a good idea or a bad one? Would it be better to live with the current behavior that locks the file and use manual garbage collection, until relationships can be overhauled? Or is the "quick fix" of adding close_file a reasonable short term alternative?

samuelgarcia commented 5 years ago

Thanks for this big investigation.

I did not figure out alll theses details.

The wiser way is working of relationship with weakref, so the Andrew's plan.

@apdavison @JuliaSprenger ?