beetbox / audioread

cross-library (GStreamer + Core Audio + MAD + FFmpeg) audio decoding for Python
MIT License
483 stars 108 forks source link

Simplify ffdec.py by using Popen.communicate() method #113

Open Bomme opened 2 years ago

Bomme commented 2 years ago

Hi, thanks for this very useful library! I was looking into ffdec.py since I need faster loading of mp3 and m4a files. I believe that the module could be improved and simplified by using the Popen.communicate() method. This seems to be the recommended way of retrieving output from a subprocess.

The current implementation only allows to read the data in blocks which is suboptimal since a user might not be able to adapt the block size. (E.g. librosa just calls audio_open() which has no way of setting a block size.)

I did a speed comparison that shows that this way of reading data is slower than it needs to be, especially for large files: https://gist.github.com/Bomme/d9aee452c8c1e68fb5fac743df6b2a07

If you decide to drop Python 2 support (https://github.com/beetbox/audioread/issues/112) the timeout handling might be easier. And for later versions of Python 3 the https://docs.python.org/3/library/subprocess.html#windows-popen-helpers might come in handy.

sampsyo commented 2 years ago

Hi! It could be worth looking into! But there are a few reasons why ffdec works this way now:

I believe Popen.communicate() only works in a synchronous/blocking style that reads everything until EOF. So I'm not sure we can use that, but maybe there's something else useful in modern subprocess we could rely on?

Bomme commented 2 years ago

Hi! Thanks for the explanation. Since the queue is infinite, the entire file could get read in the separate thread before the client code gets to call read_data. That's why I assumed it would be a simplification to make this explicit, i.e. loading the file into memory. Shouldn't the queue size be limited to support real streaming?

Bomme commented 2 years ago

Since my main goal was to speed up the file loading and you plan to release a new version soon, what do you think about raising the block_size? Maybe by setting it to io.DEFAULT_BUFFER_SIZE as a (still) conservative size.

sampsyo commented 2 years ago

Ah, yeah, that's a good point. We probably should be limiting the queue size! It seems tricky to get exactly right, but probably worth a shot nonetheless.

Raising the block size seems like a great idea! Would you mind putting together a small PR for that?