Use compressed audio directly from memory

jmercouris commented 8 years ago

It is currently possible to decode audio files to PCM. It would also be nice to be able to decode MP3 data frames directly to PCM.

sampsyo commented 8 years ago

This is related to #34, where the desire is to load audio data from the network. This would be a good idea, if we can retrofit a direct-from-memory interface onto all of our backends.

There's one complicating question: How do we know which backend to select? When reading from disk, it's easy: we just try reading the file with each, and if one backend fails (e.g., because the format is unsupported), we give up and "rewind" by reading the file from the beginning using a different backend. That will be harder to do in a streaming setting, where we don't have the luxury to travel back in time to read again from the beginning.
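For data that is already fully in memory, the "rewind" step can be sketched with a seekable buffer that is reset before each attempt. This is only an illustration: `decode_wav_only`, `decode_anything`, `DecodeError`, and `open_from_memory` are hypothetical names, not audioread's backends or API, and a true stream would first need enough data buffered for the retry to be possible.

```python
import io


class DecodeError(Exception):
    """Raised by a (hypothetical) backend that cannot handle the data."""


def decode_wav_only(buf):
    # Hypothetical backend: accepts only data starting with a RIFF header.
    if buf.read(4) != b"RIFF":
        raise DecodeError("not a WAV file")
    return "wav backend"


def decode_anything(buf):
    # Hypothetical fallback backend that accepts any input.
    buf.read()
    return "fallback backend"


def open_from_memory(data, backends):
    """Try each backend in turn, rewinding the buffer between attempts."""
    buf = io.BytesIO(data)
    for backend in backends:
        buf.seek(0)  # "rewind" so every backend sees the data from the start
        try:
            return backend(buf)
        except DecodeError:
            continue
    raise DecodeError("no backend could read the data")
```

The cost of this approach is that the whole input has to stay in memory until a backend accepts it.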

I also suspect some backends will want a MIME type for streamed data—i.e., they currently use the filename extension as a signal for the data type.

jmercouris commented 8 years ago

As a solution for any backends that require a MIME type from a file, we could temporarily create a file with a specific format. We would then keep appending to that file while the backend decodes it into a buffer, which would then be returned to the user.
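A minimal sketch of the temporary-file idea, assuming the backend only needs a path whose extension hints at the format. `write_temp_audio` is a hypothetical helper, not part of audioread, and it writes the data once rather than appending as a stream arrives.

```python
import os
import tempfile


def write_temp_audio(data, extension=".mp3"):
    """Write bytes to a temporary file whose suffix hints at the format."""
    # delete=False so the file survives closing and can be reopened by path.
    with tempfile.NamedTemporaryFile(suffix=extension, delete=False) as tmp:
        tmp.write(data)
        return tmp.name


path = write_temp_audio(b"\xff\xfb\x90\x00")  # placeholder bytes, not a real MP3
try:
    pass  # a filename-based decoder would be handed `path` here
finally:
    os.remove(path)  # the caller is responsible for cleanup
```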

Additionally, it would be helpful to be able to supply the expected format as an argument.
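For an ffmpeg-based backend, an expected-format argument could map onto ffmpeg's `-f` option, which forces the input format when it is placed before `-i`. A sketch under that assumption; `build_ffmpeg_command` and its `input_format` parameter are hypothetical and not part of audioread.

```python
def build_ffmpeg_command(input_format=None):
    """Build an ffmpeg command that reads stdin and writes raw PCM to stdout."""
    cmd = ["ffmpeg"]
    if input_format is not None:
        # An "-f" placed before "-i" applies to the input, not the output.
        cmd += ["-f", input_format]
    # "-i -" reads from stdin; "-f s16le -" writes 16-bit PCM to stdout.
    cmd += ["-i", "-", "-f", "s16le", "-"]
    return cmd
```

When `input_format` is omitted, ffmpeg is left to probe the format from the data itself.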

I've actually implemented a piece of code that works on exactly this principle. Right now it has some small playback issues due to the GIL, but if this path is something you'd be interested in, I can share it.

jksinton commented 8 years ago

I'm also interested in feeding audioread an MP3 read from memory instead of a filename. I have a darkice audio stream that I would like to follow: open the stream in chunks and analyze them using librosa.

Initially, I was thinking of configuring darkice to continuously write to an MP3 and following it with seek. But now, you have me thinking of reading the stream over the network.

As a test, I've successfully passed the MP3 binary to the stdin of an ffmpeg subprocess:

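import subprocess

# Open in binary mode so the MP3 bytes reach ffmpeg's stdin unmodified ("-i -" reads stdin).
with open("archive_active.mp3", "rb") as a:
    p = subprocess.Popen(['ffmpeg', '-i', '-', 'out.wav'], stdout=subprocess.PIPE, stdin=subprocess.PIPE, stderr=subprocess.STDOUT)
    print(p.communicate(input=a.read())[0].decode())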

But I'm not sure where to begin on editing audioread.

sampsyo commented 8 years ago

Cool! If you're interested, maybe the right place to start would be with the ffdec backend. You could try hacking it in at first in the most non-elegant way possible—just port what you've done there to ffdec.py—and then we can sort out how to make the interface configurable.

jksinton commented 8 years ago

Okay, I forked audioread and made my own branch: https://github.com/jksinton/audioread/tree/compressed-audio

It's not pretty, but ffdec now accepts compressed audio as an argument. One of the challenges is that Popen.communicate returns a tuple of str objects for stdout and stderr instead of file objects. The QueueReaderThread and _get_info functions rely on self.proc.stdout/stderr being file objects. To solve this, I convert the str objects to file objects with StringIO. This cannot be the most efficient solution.
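The wrapping step described above can be sketched as follows. This is not the code from the fork: it targets Python 3, where `communicate()` returns bytes and `io.BytesIO` plays the role of StringIO, and it uses a child Python process as a stand-in for ffmpeg so that it runs anywhere.

```python
import io
import subprocess
import sys

# Stand-in for ffmpeg: a child Python process that upper-cases its stdin.
child = [sys.executable, "-c",
         "import sys; sys.stdout.buffer.write(sys.stdin.buffer.read().upper())"]

proc = subprocess.Popen(child, stdin=subprocess.PIPE,
                        stdout=subprocess.PIPE, stderr=subprocess.PIPE)
out_bytes, err_bytes = proc.communicate(input=b"compressed audio bytes")

# communicate() returns bytes, so wrap them to get a file-like read() interface.
stdout_file = io.BytesIO(out_bytes)
stderr_file = io.BytesIO(err_bytes)

first_block = stdout_file.read(10)  # read in blocks, as from a real pipe
```

The inefficiency is that `communicate()` waits for the child to exit, so the entire decoded output sits in memory before the first block can be read.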

sampsyo commented 8 years ago

Cool; looks good! Maybe we should pull this into a branch in the central repository so everybody can work on it together.

I don't quite see why the new version needs to use Popen.communicate. Does the old approach, which only reads one block at a time, not work? It would be great to preserve that "incremental" property; maybe we need a second thread to send data into the pipe as it becomes ready.

jksinton commented 8 years ago

Adrian,

Sure, I'll submit a pull request once we create a branch for this feature. I don't think I have the privileges to create a branch on the central repository.

I was using Popen.communicate because the Python subprocess documentation gave this warning:

Use communicate() rather than .stdin.write, .stdout.read or .stderr.read to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process.

See the warning right above Popen.stdin: https://docs.python.org/2/library/subprocess.html#subprocess.Popen.stdin

I might be able to prepare a QueueWriterThread function to write directly to Popen.stdin, similar to the QueueReaderThread.

-James

jksinton commented 8 years ago

I've pushed a new version to the compressed-audio branch on my fork. It replaces Popen.communicate with a WriterThread function and works with the QueueReaderThread function that you already had implemented. In other words, the stdout produced as a result of writing to proc.stdin is handled with the QueueReaderThread function without Popen.communicate.

I tried calling self.proc.stdin.write(audio.read()) directly, without a threaded write such as WriterThread, but the process would hang. I'm not sure why, since WriterThread is essentially this:

def run(self):
    self.fh.write(self.audio.read())
    self.fh.close()

sampsyo commented 8 years ago

That sounds great. That "hanging" behavior, in fact, is exactly why we need a separate thread for reading (and now for writing): the OS call to read from or write to a file descriptor can block. We need to concurrently block in the OS call while letting the rest of the application proceed. And, with the new extension, we will need to concurrently send data into the subprocess while reading data out of the same subprocess: so, two threads.
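The two-thread arrangement can be sketched like this. It is an illustration only, not audioread's QueueReaderThread or the fork's WriterThread: `PipeWriter`, `PipeReader`, and `pipe_through` are hypothetical names, and a child Python process that copies stdin to stdout stands in for ffmpeg.

```python
import queue
import subprocess
import sys
import threading


class PipeWriter(threading.Thread):
    """Feed data into a pipe, then close it so the child sees end-of-file."""

    def __init__(self, fh, data):
        super().__init__(daemon=True)
        self.fh = fh
        self.data = data

    def run(self):
        self.fh.write(self.data)  # may block; only this thread waits
        self.fh.close()


class PipeReader(threading.Thread):
    """Read fixed-size blocks from a pipe and put them on a queue."""

    def __init__(self, fh, blocksize=4096):
        super().__init__(daemon=True)
        self.fh = fh
        self.blocksize = blocksize
        self.queue = queue.Queue()

    def run(self):
        while True:
            block = self.fh.read(self.blocksize)
            self.queue.put(block)
            if not block:  # empty bytes means the child closed its stdout
                break


def pipe_through(cmd, data):
    """Send data through a subprocess, yielding output blocks as they arrive."""
    proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    writer = PipeWriter(proc.stdin, data)
    reader = PipeReader(proc.stdout)
    writer.start()
    reader.start()
    while True:
        block = reader.queue.get()
        if not block:
            break
        yield block
    writer.join()
    proc.wait()


# Stand-in for ffmpeg: a child Python process that copies stdin to stdout.
ECHO = [sys.executable, "-c",
        "import sys, shutil; shutil.copyfileobj(sys.stdin.buffer, sys.stdout.buffer)"]
```

With an input larger than the OS pipe buffer, writing everything from the main thread before reading would stall: the child blocks on its full stdout while the parent blocks on the child's full stdin. Running the write and the read in their own threads lets both sides drain.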

I will make you a collaborator on this repository—that should let you create a branch here whenever you're ready.

beetbox / audioread

Use compressed audio directly from memory #35