MTG / essentia

C++ library for audio and music analysis, description and synthesis, including Python bindings
http://essentia.upf.edu
GNU Affero General Public License v3.0

Read audio in chunks #329

Open tcwalther opened 8 years ago

tcwalther commented 8 years ago

The streaming mode allows me to stream audio directly into transforms, which is really useful because it reduces memory overhead when working on large files (for example, hour-long audio files). However, as far as I understand, one cannot prototype streaming modules in Python (am I wrong?).

I would love to do the following. Instead of:

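from essentia.standard import MonoLoader, FrameGenerator

# Loads the entire file into memory before any frame is produced.
audio = MonoLoader(filename=filename)()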
frameGenerator = FrameGenerator(audio, frameSize=frameSize, hopSize=hopSize, startFromZero=True)
for block in frameGenerator:
    fancy_calculation(block)

I'd love to do something like:

# MonoFrameLoader does not exist; this is the API I would like to have.
audioLoader = MonoFrameLoader(filename=filename, frameSize=frameSize, hopSize=hopSize)
for block in audioLoader:
    fancy_calculation(block)

Is this possible somehow?
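For uncompressed WAV input, the kind of chunked iteration described above can be sketched in pure Python with the standard `wave` module. This is only an illustration of the idea: `iter_wav_frames` is a hypothetical helper, not an essentia API, and it assumes 16-bit mono PCM, requires `hop_size <= frame_size`, and zero-pads the final partial frame. It does not decode compressed formats such as MP3.

```python
import array
import sys
import wave


def iter_wav_frames(filename, frame_size, hop_size):
    """Yield overlapping frames from a WAV file without loading it whole.

    Hypothetical helper (not part of essentia). Assumes 16-bit mono PCM.
    """
    if not 0 < hop_size <= frame_size:
        raise ValueError("this sketch requires 0 < hop_size <= frame_size")
    with wave.open(filename, "rb") as wav:
        if wav.getnchannels() != 1 or wav.getsampwidth() != 2:
            raise ValueError("this sketch only supports 16-bit mono WAV")
        buffer = array.array("h")
        while True:
            # Read only as many samples as are needed to complete one frame.
            missing = frame_size - len(buffer)
            if missing > 0:
                chunk = array.array("h")
                chunk.frombytes(wav.readframes(missing))
                if sys.byteorder == "big":
                    chunk.byteswap()  # WAV samples are little-endian
                buffer.extend(chunk)
            if not buffer:
                break  # end of file, nothing left to emit
            if len(buffer) < frame_size:
                # End of file: zero-pad the last, partial frame.
                padding = array.array("h", [0] * (frame_size - len(buffer)))
                yield buffer + padding
                break
            yield buffer[:frame_size]
            # Drop hop_size samples so the next frame overlaps this one.
            buffer = buffer[hop_size:]
```

At most `frame_size` samples are held in memory at any time, so the cost per frame does not grow with the position in the file. Usage would mirror the loop above: `for block in iter_wav_frames(filename, frameSize, hopSize): fancy_calculation(block)`.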

Update:

librosa can load audio starting at a given offset. However, the load time increases with the offset, which makes the total time for reading a whole file in chunks horrible. Here is a measurement using current essentia master and the current librosa 0.4.1:

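import essentia.standard as ess  # assumed meaning of the `ess` alias used below
import librosa

SAMPLE_RATE = 44100

def load_in_blocks(filename):
    """Yield consecutive blocks of audio, re-opening the file for every block."""
    frame_size = 5  # block length in seconds (passed as `duration`)
    offset = 0      # start of the current block in seconds
    block, sr = librosa.core.load(filename, sr=SAMPLE_RATE, offset=offset, duration=frame_size)
    # An empty block means the offset is past the end of the file.
    while len(block) > 0:
        yield block
        offset += frame_size
        block, sr = librosa.core.load(filename, sr=SAMPLE_RATE, offset=offset, duration=frame_size)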

filename = 'test.mp3'  # 24 minutes long

print('loading file using essentia:')
%time ess.MonoLoader(filename=filename)()
print('loading file using librosa:')
%time audio, sr = librosa.core.load(filename, sr=SAMPLE_RATE)
print('loading file in chunks using librosa:')
# load_in_blocks is a generator, so it has to be iterated for the blocks to be loaded.
%time for block in load_in_blocks(filename): pass

loading file using essentia:
CPU times: user 1.5 s, sys: 76 ms, total: 1.58 s
Wall time: 1.65 s
loading file using librosa:
CPU times: user 1.93 s, sys: 294 ms, total: 2.22 s
Wall time: 3.53 s
loading file in chunks using librosa:
CPU times: user 39.5 s, sys: 2.05 s, total: 41.6 s
Wall time: 51.8 s
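A rough way to see why the chunked variant scales so badly is to assume, as a worst case, that every call has to decode the file from the beginning up to the end of the requested block. That assumption is not confirmed for librosa; it is only a model, and `decoded_seconds` is a hypothetical helper written for this illustration.

```python
def decoded_seconds(total_seconds, block_seconds):
    """Seconds of audio decoded if each load re-decodes from the start of the file.

    Worst-case model only: block k is assumed to cost k * block_seconds.
    """
    n_blocks = -(-total_seconds // block_seconds)  # ceiling division
    return sum(min(k * block_seconds, total_seconds)
               for k in range(1, n_blocks + 1))


total = 24 * 60                       # the 24-minute test file, in seconds
chunked = decoded_seconds(total, 5)   # 5-second blocks, as in load_in_blocks
print(chunked, chunked / total)       # 208080 144.5
```

Under this model the work grows quadratically with the file length: 288 blocks would decode about 144 times as much audio as a single pass. The measured slowdown above (roughly 19x in CPU time) is well below that, so the model is only a pessimistic upper estimate, but it matches the observed trend of load time increasing with the offset.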

dbogdanov commented 8 years ago

There is no easy way to prototype streaming modules in Python; you'll have to implement your parts in C++.

Something similar to MonoFrameLoader is not really possible right now, although we could add it by re-implementing AudioLoader. This would be a nice feature.