CPJKU / madmom

Python audio and music signal processing library
https://madmom.readthedocs.io

Multi-channel support for STFT #371

Open noemievoss opened 6 years ago

noemievoss commented 6 years ago

Expected behaviour

The STFT function should work with multi-channel audio files.

Actual behaviour

At the moment, the STFT doesn't support multi-channel audio files. An example error log:

  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/madmom/audio/stft.py", line 325, in __new__
    circular_shift=circular_shift)
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/madmom/audio/stft.py", line 70, in stft
    'shape %s.' % (type(frames), frames.shape))
ValueError: frames must be a 2D array or iterable, got <class 'madmom.audio.signal.FramedSignal'> with shape (21102, 2048, 2).

Steps needed to reproduce the behaviour

import madmom
import numpy as np
from madmom.models import PATTERNS_BALLROOM
from madmom.audio.spectrogram import LogarithmicSpectrogramProcessor, SpectrogramDifferenceProcessor, MultiBandSpectrogramProcessor
from madmom.processors import SequentialProcessor

proc = madmom.features.PatternTrackingProcessor(PATTERNS_BALLROOM, fps=50)
log = LogarithmicSpectrogramProcessor()
diff = SpectrogramDifferenceProcessor(positive_diffs=True)
mb = MultiBandSpectrogramProcessor(crossover_frequencies=[270])
pre_proc = SequentialProcessor([log, diff, mb])
act = pre_proc(file_path)  # file_path: path to a song with multiple channels (2 channels)
proc(act)

Information about installed software

Please provide some information about installed software.

import madmom
import numpy as np
import scipy

madmom.__version__  # '0.15.1'
np.__version__      # '1.14.0'
scipy.__version__   # '1.0.0'

superbock commented 6 years ago

The question you didn't answer is what exactly the expected behaviour should be. There are several possibilities for what the returned array could look like. I lean towards returning a 3D array with channels as the last dimension, i.e. [time, freq, channel]. Is this what you are looking for?
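
To make the proposed [time, freq, channel] layout concrete, here is a minimal NumPy-only sketch. It is not madmom's implementation: the helpers frame_signal and multichannel_stft are hypothetical names, and the frame size, hop size and Hann window are assumptions chosen for illustration.

```python
import numpy as np


def frame_signal(signal, frame_size, hop_size):
    """Slice a (samples, channels) array into (frames, frame_size, channels)."""
    num_frames = 1 + (len(signal) - frame_size) // hop_size
    # idx[i, j] is the sample index of position j in frame i
    idx = np.arange(frame_size)[None, :] + hop_size * np.arange(num_frames)[:, None]
    return signal[idx]


def multichannel_stft(signal, frame_size=2048, hop_size=441):
    """Return a complex array laid out as [time, freq, channel]."""
    frames = frame_signal(signal, frame_size, hop_size)
    window = np.hanning(frame_size)[None, :, None]
    # FFT along the frame axis (axis=1), so the channel axis stays last
    return np.fft.rfft(frames * window, axis=1)


# One second of a synthetic stereo signal: 440 Hz left, 880 Hz right
t = np.arange(44100) / 44100.0
stereo = np.stack([np.sin(2 * np.pi * 440 * t),
                   np.sin(2 * np.pi * 880 * t)], axis=1)

spec = multichannel_stft(stereo)
print(spec.shape)  # (num_frames, num_bins, num_channels)
```

With this layout, spec[..., c] is an ordinary 2D spectrogram of channel c, so existing 2D code can be applied per channel.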

If you simply want to calculate an STFT (or anything that needs an STFT) on a stereo (or multi-channel) signal, passing num_channels=1 (down-mixing all channels to mono) is probably what you want.
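
As a rough illustration of why down-mixing avoids the ValueError above, the following NumPy-only sketch averages the channels before framing, so the frames end up as the 2D array the STFT expects. It assumes that down-mixing means averaging the channels; downmix_to_mono and frame_mono are hypothetical helper names, not madmom functions, and the frame and hop sizes are illustrative.

```python
import numpy as np


def downmix_to_mono(signal):
    """Average all channels of a (samples, channels) array into one channel."""
    signal = np.asarray(signal, dtype=float)
    return signal if signal.ndim == 1 else signal.mean(axis=1)


def frame_mono(mono, frame_size=2048, hop_size=441):
    """Slice a 1D signal into a 2D (frames, frame_size) array."""
    num_frames = 1 + (len(mono) - frame_size) // hop_size
    idx = np.arange(frame_size)[None, :] + hop_size * np.arange(num_frames)[:, None]
    return mono[idx]


# Synthetic stereo signal: constant 1.0 on the left, 3.0 on the right
stereo = np.stack([np.ones(44100), 3.0 * np.ones(44100)], axis=1)

mono = downmix_to_mono(stereo)   # shape (samples,)
frames = frame_mono(mono)        # 2D: (frames, frame_size)
spec = np.fft.rfft(frames * np.hanning(frames.shape[1]), axis=1)
print(stereo.shape, mono.shape, frames.shape, spec.shape)
```

The trade-off is that any information carried only by the difference between channels is lost in the mono mix.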