Expected behaviour
When loading a stereo audio file and downmixing it to mono, I expect the resulting amplitudes to depend only on the audio content, not on the file format.
Actual behaviour
Currently, if a wave file has the same sample type as the one requested when loading, madmom uses scipy to load it. To downmix the signal to mono, it then uses its own madmom.audio.signal.remix function, which computes the arithmetic mean of the channels.
If there is a mismatch in sample types (e.g. the file is stored as float32 but loaded as float, or stored as 16-bit integers and loaded as float), madmom uses ffmpeg to load the file and, in the same step, to downmix it to mono.
However, ffmpeg's downmixing logic apparently applies a normalizing factor of 2 / sqrt(2) (about 1.414) relative to the arithmetic mean. As a result, the two loading paths produce mono signals with different amplitudes.
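A minimal NumPy sketch that illustrates the size of the discrepancy. It compares an arithmetic-mean downmix, standing in for what is described for madmom.audio.signal.remix, with an assumed ffmpeg-like downmix that weights each channel by 1/sqrt(2). Both helper functions (mean_downmix, sqrt2_downmix) are hypothetical illustrations, not madmom's or ffmpeg's actual code.

```python
import numpy as np


def mean_downmix(stereo: np.ndarray) -> np.ndarray:
    """Average the channels of a (num_samples, num_channels) array.

    Hypothetical stand-in for the arithmetic-mean behaviour described for
    madmom.audio.signal.remix; not madmom's actual code.
    """
    return stereo.mean(axis=-1)


def sqrt2_downmix(stereo: np.ndarray) -> np.ndarray:
    """Weight each channel by 1/sqrt(2) and sum them.

    Assumed model of the ffmpeg downmix described in this issue, i.e. the
    arithmetic mean scaled by 2 / sqrt(2); not ffmpeg's actual code.
    """
    return stereo.sum(axis=-1) / np.sqrt(2)


# The same stereo content, downmixed both ways.
rng = np.random.default_rng(0)
stereo = rng.uniform(-0.5, 0.5, size=(1000, 2))

via_mean = mean_downmix(stereo)
via_sqrt2 = sqrt2_downmix(stereo)

# Peak amplitude ratio between the two mono signals.
ratio = np.max(np.abs(via_sqrt2)) / np.max(np.abs(via_mean))
print(f"peak ratio: {ratio:.4f}")  # peak ratio: 1.4142
```

If this model is right, the ffmpeg path is louder than the scipy + remix path by a constant gain of sqrt(2), i.e. 20 * log10(sqrt(2)), or about 3.01 dB, regardless of the content.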
Steps needed to reproduce the behaviour
Information about installed software