beetbox / audioread

cross-library (GStreamer + Core Audio + MAD + FFmpeg) audio decoding for Python
MIT License

Gstreamer backend seems to leak memory #62

Open albertz opened 6 years ago

albertz commented 6 years ago

Here is a small script that demonstrates the issue. Memory consumption grows steadily (up to 8 GB). See here for a discussion.

sampsyo commented 6 years ago

Interesting! Could you do some more investigation to narrow the leak down to specific actions in the audioread library? We might have a shot at fixing this if you can point to exactly what's being leaked.

albertz commented 6 years ago

See the script. The only thing I actually use is audio_open, looped over a lot of FLAC files. I call it only indirectly, via librosa.load(filename, sr=None), which is a very straightforward use of audio_open.
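Below is a minimal sketch of the kind of loop described here, for watching memory while many files are decoded one after another. It assumes a Unix system, because it uses the resource module. It treats peak resident set size (RSS) as a rough proxy for growth. RSS includes native allocations made by backends such as GStreamer, whereas tracemalloc would only see Python objects. The helpers decode_fully and watch_memory are hypothetical. decode_fully only approximates how librosa.load drives audio_open.

```python
import resource
import sys


def peak_rss_mb():
    """Peak resident set size of this process so far, in MB.

    ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    """
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


def decode_fully(path, opener=None):
    """Open one file and drain every buffer; return the number of bytes read.

    `opener` defaults to audioread.audio_open (imported lazily so that a
    fake opener can be swapped in for testing).
    """
    if opener is None:
        import audioread
        opener = audioread.audio_open
    nbytes = 0
    with opener(path) as f:
        for buf in f:
            nbytes += len(buf)
    return nbytes


def watch_memory(paths, opener=None, every=100):
    """Decode each path in turn and record (files_done, peak_rss_mb) every
    `every` files. Peak RSS that keeps climbing across files of similar
    size suggests something is not released between iterations."""
    samples = []
    for i, path in enumerate(paths, 1):
        decode_fully(path, opener)
        if i % every == 0:
            samples.append((i, peak_rss_mb()))
    return samples
```

Calling watch_memory on a sorted list of FLAC paths and printing the returned samples should make steady growth easy to spot. Because peak RSS never decreases, a healthy run plateaus and a leaking one keeps rising.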

sampsyo commented 6 years ago

I understand, but that still doesn't point to exactly where the leak is coming from. It would be awesome to have your help investigating exactly what gets leaked and when.

albertz commented 6 years ago

Yes, that would be nice, but I'm not sure I have the time right now (I have already spent several hours debugging this issue and need to get on with my actual work). I think you should be able to reproduce the issue with my script. As there are so many issues with GStreamer anyway, I would maybe even suggest removing it completely. My solution for now is to use PySoundFile instead of audioread. By the way, that is also what librosa recommends.
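If switching libraries is an option, one way to prefer PySoundFile while keeping audioread as a fallback is to try loaders in order. This is a minimal sketch, and the function names are hypothetical. The two loaders also return different types. soundfile.read gives a NumPy array plus the sample rate. The audioread path here returns audioread's raw 16-bit little-endian PCM bytes plus the sample rate.

```python
def load_first_available(path, loaders=None):
    """Try each (name, loader) pair in order; return (name, result) from the
    first loader that succeeds. If all fail, raise one error listing why."""
    if loaders is None:
        loaders = [
            ("soundfile", _load_with_soundfile),
            ("audioread", _load_with_audioread),
        ]
    errors = []
    for name, loader in loaders:
        try:
            return name, loader(path)
        except Exception as exc:  # e.g. ImportError or an unsupported format
            errors.append(f"{name}: {exc!r}")
    raise RuntimeError(f"no loader could open {path!r} ({'; '.join(errors)})")


def _load_with_soundfile(path):
    import soundfile  # the PySoundFile package
    data, samplerate = soundfile.read(path)
    return data, samplerate


def _load_with_audioread(path):
    import audioread
    with audioread.audio_open(path) as f:
        return b"".join(f), f.samplerate
```

Imports happen inside the loaders, so a missing package simply counts as that loader failing rather than breaking the whole module.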

sampsyo commented 6 years ago

OK! Please check back in if you ever get the chance to help.

ssssam commented 5 years ago

I'm experiencing this issue with Beets 1.4.6 on Fedora 28.

I tried updating to the Git master branch of audioread, since the unreleased version 2.1.7 contains a file-descriptor (FD) leak fix (https://github.com/beetbox/audioread/commit/72ed349c12a16ab741cb02abc4de8f2e8e7fe4ee). With this change, beet import either segfaults or logs the following traceback:
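Whether a decode cycle leaks FDs can be checked by counting open descriptors before and after it. Below is a minimal, Linux-only sketch that relies on /proc/self/fd. The helper names are hypothetical.

```python
import os


def open_fd_count():
    """Number of file descriptors currently open in this process.

    Linux-only (reads /proc/self/fd). The directory handle opened by
    os.listdir is itself listed, but it adds the same +1 on every call,
    so differences between two calls are still meaningful.
    """
    return len(os.listdir("/proc/self/fd"))


def fd_delta(action):
    """Run action() and return how many more FDs are open afterwards."""
    before = open_fd_count()
    action()
    return open_fd_count() - before
```

Wrapping one complete open/read/close cycle in fd_delta should return 0 if nothing leaks. A positive value that grows with each repetition points at descriptors that are never closed.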

Exception in thread Thread-6:
Traceback (most recent call last):
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.6/site-packages/audioread/ffdec.py", line 69, in run
    data = self.fh.read(self.blocksize)
ValueError: I/O operation on closed file

Exception in thread Thread-7:
Traceback (most recent call last):
  File "/usr/lib64/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.6/site-packages/audioread/ffdec.py", line 69, in run
    data = self.fh.read(self.blocksize)
ValueError: PyMemoryView_FromBuffer(): info->buf must not be NULL

I haven't been able to reproduce this issue using audioread/decode.py.

sampsyo commented 5 years ago

That's troubling. @RyanMarcus, have you encountered this?

Perhaps, to reproduce the problem, one would need to decode several files in a row?

ssssam commented 5 years ago

I've managed to reproduce it now. The crash appears to be triggered when the .close() method is called before reading is complete. I'll open a separate PR with a fix (edit: https://github.com/beetbox/audioread/pull/78)
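The trigger described here is closing a file before it has been fully read. It can be exercised with a loop like the one below, which also covers decoding several files in a row. This is a hypothetical sketch. By default it uses audioread.audio_open and relies only on iteration and close(). The opener parameter exists so that a fake can be substituted.

```python
def close_after_first_buffer(paths, opener=None):
    """Open each file, read only its first buffer, then close it early.

    Returns the number of files opened.
    """
    if opener is None:
        import audioread  # imported lazily so a fake opener can be swapped in
        opener = audioread.audio_open
    opened = 0
    for path in paths:
        f = opener(path)
        opened += 1
        try:
            for _buf in f:
                break  # stop after one buffer; the decoder still has data pending
        finally:
            f.close()  # early close: the case reported to trigger the crash
    return opened
```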

RyanMarcus commented 5 years ago

Huh, that's strange; it looks like a race. When the process is started, reading its output seems to be delegated to a thread (i.e., QueueReaderThread). When close is called (possibly via __del__), my change closes the FDs but potentially leaves the reader thread running.

I haven't tested this, but it would explain why a partial read is causing the issue.
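One way to avoid this kind of race is to make close() stop the producer first, wait for the reader thread to finish, and only then close the FD the thread reads from. The sketch below shows that ordering with a plain os.pipe. The write end stands in for the decoder subprocess's stdout. ReaderThread and PipeDecoder are hypothetical simplifications, not audioread's QueueReaderThread or its FFmpeg backend. This is also not necessarily what the fix in the linked PR does.

```python
import os
import queue
import threading


class ReaderThread(threading.Thread):
    """Copy fixed-size blocks from a file object into a queue until EOF."""

    def __init__(self, fh, blocksize=1024):
        super().__init__(daemon=True)
        self.fh = fh
        self.blocksize = blocksize
        self.queue = queue.Queue()

    def run(self):
        while True:
            data = self.fh.read(self.blocksize)  # blocks until data or EOF
            self.queue.put(data)
            if not data:  # b"" means the write end has been closed
                break


class PipeDecoder:
    """Hypothetical decoder: a ReaderThread drains the read end of a pipe."""

    def __init__(self):
        read_fd, write_fd = os.pipe()
        self._out = os.fdopen(write_fd, "wb")  # plays the subprocess's stdout
        self._in = os.fdopen(read_fd, "rb")
        self._reader = ReaderThread(self._in)
        self._reader.start()

    def feed(self, data):
        self._out.write(data)
        self._out.flush()

    def close(self):
        # Closing self._in first, while the thread is still looping, is the
        # race: its next read() can raise ValueError on the closed file.
        self._out.close()    # 1. stop the producer; the reader will see EOF
        self._reader.join()  # 2. wait until the reader's last read() returns
        self._in.close()     # 3. only now release the FD it was reading

    def chunks(self):
        """Everything the reader queued, including the final b""."""
        items = []
        while not self._reader.queue.empty():
            items.append(self._reader.queue.get())
        return items
```

In a subprocess-based decoder, step 1 corresponds to terminating the child (or letting it exit), which makes its stdout report EOF so that join() can return.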