The Munich Open-Source Large-Scale Multimedia Feature Extractor
Spectral Centroid for white noise has large offset

closed 11 months ago

commented 11 months ago

When sampling at 48 kHz, the spectral centroid of a white noise signal should, in theory (and quite intuitively), be half the Nyquist frequency (24 kHz in this case), i.e. 12 kHz.
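Why 12 kHz: the spectral centroid is the magnitude-weighted mean frequency, sum(f_k * |X_k|) / sum(|X_k|). For an ideal white spectrum all |X_k| are equal, so the centroid reduces to the plain mean of the evenly spaced bin frequencies from 0 Hz to the Nyquist frequency, which is fs/4. A minimal check, assuming an idealized perfectly flat magnitude spectrum rather than real noise:

```python
import numpy as np

fs = 48000          # sample rate in Hz
n = fs              # one second of audio -> 1 Hz bin spacing

# Bin frequencies of a real FFT: 0, 1, ..., fs/2 Hz (n//2 + 1 bins)
freqs = np.fft.rfftfreq(n, d=1.0 / fs)

# Idealized white spectrum: every bin has the same magnitude
magnitudes = np.ones_like(freqs)

centroid = np.sum(freqs * magnitudes) / np.sum(magnitudes)
print(round(centroid, 6))  # → 12000.0
```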

openSMILE, however, computed a value of about 16.8 kHz. I tested this with the packaged emo_large.conf.

To reproduce, you can use this Python script:

import pandas as pd
from scipy.io import wavfile
from scipy import stats
import numpy as np
import librosa
import subprocess

# Config path for emo_large.conf (linux)
opensmile_config_emo_large_path = '/usr/share/opensmile/config/misc/emo_large.conf'
opensmile_binary = 'SMILExtract'

# Generate white noise and save it to disk as wav
sample_rate = 48000
length_in_seconds = 4
amplitude = 11  # exponent: samples lie in [-2**amplitude, 2**amplitude]
# Cap the scale at the int16 limit so the cast below cannot overflow
noise = stats.truncnorm(-1, 1, scale=min(2**15 - 1, 2**amplitude)).rvs(sample_rate * length_in_seconds)
wavfile.write('noise.wav', sample_rate, noise.astype(np.int16))

# Librosa
audio_file = 'noise.wav'
y, sr = librosa.load(audio_file, sr=sample_rate)
librosa_cog = librosa.feature.spectral_centroid(y=y, sr=sr)  # one centroid per frame
librosa_mean_cog = np.mean(librosa_cog)
print("Librosa: " + str(librosa_mean_cog))

# Manual: magnitude-weighted centroid over the whole signal (no framing)
def spectral_centroid(x, samplerate=sample_rate):
    magnitudes = np.abs(np.fft.rfft(x))  # magnitudes of the non-negative frequencies
    freqs = np.fft.rfftfreq(len(x), d=1.0/samplerate)  # matching bin frequencies, 0 .. samplerate/2
    return np.sum(magnitudes*freqs) / np.sum(magnitudes)
manual_mean_cog = spectral_centroid(y, sr)
print("Manual: " + str(manual_mean_cog))

# OpenSMILE: run SMILExtract with emo_large.conf and read back the CSV it writes
opensmile_command = [opensmile_binary,
                     '-C', opensmile_config_emo_large_path,
                     '-I', 'noise.wav',
                     '-csvoutput', 'noise.csv']
subprocess.run(opensmile_command, check=True)
df = pd.read_csv('noise.csv', delimiter=';')
opensmile_mean_cog = df['pcm_fftMag_spectralCentroid_sma_amean'].iloc[0]
print("OpenSMILE: " + str(opensmile_mean_cog))

which prints:

Librosa: 12010.02111870055
Manual: 12005.953776257056
OpenSMILE: 16872.99
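Note that librosa averages per-frame centroids (and openSMILE's `_sma_amean` feature is likewise an arithmetic mean over frames), while the manual function computes a single centroid over the whole signal. For white noise both approaches should land near 12 kHz. A minimal framewise sketch, assuming a Hann window, 2048-sample frames and a 512-sample hop (chosen to resemble librosa's defaults; this is not librosa's or openSMILE's exact implementation), with synthetic Gaussian noise standing in for noise.wav:

```python
import numpy as np

def framewise_mean_centroid(x, fs, frame_len=2048, hop=512):
    """Mean of per-frame, magnitude-weighted spectral centroids (hypothetical helper)."""
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / fs)
    centroids = []
    for start in range(0, len(x) - frame_len + 1, hop):
        frame = x[start:start + frame_len] * window
        mags = np.abs(np.fft.rfft(frame))
        centroids.append(np.sum(freqs * mags) / np.sum(mags))
    return float(np.mean(centroids))

# Gaussian white noise stands in for noise.wav here
rng = np.random.default_rng(0)
fs = 48000
x = rng.standard_normal(fs * 4)
print(framewise_mean_centroid(x, fs))  # close to 12000
```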
commented 11 months ago

The cause is the pre-emphasis of the signal used in the emo_large.conf config file: it amplifies the higher frequencies, so the spectrum is no longer "white" and its centroid moves upward.
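The effect can be reproduced outside openSMILE. Pre-emphasis is a first-order high-pass filter, y[n] = x[n] - k * x[n-1], whose gain rises from 1 - k at 0 Hz to 1 + k at the Nyquist frequency, so the upper half of the spectrum gets more weight. A minimal sketch, assuming k = 0.97 (a common choice; the coefficient actually set in emo_large.conf is not checked here) and the same magnitude-weighted, whole-signal centroid as the manual function above. It shows the direction and rough size of the shift, not openSMILE's exact value, which also depends on its framing, windowing and spectral settings:

```python
import numpy as np

def spectral_centroid(x, fs):
    """Magnitude-weighted mean frequency of the whole signal."""
    mags = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return np.sum(freqs * mags) / np.sum(mags)

def preemphasis(x, k=0.97):
    """First-order pre-emphasis: y[n] = x[n] - k * x[n-1], with y[0] = x[0]."""
    return np.append(x[0], x[1:] - k * x[:-1])

rng = np.random.default_rng(0)
fs = 48000
x = rng.standard_normal(fs * 4)  # Gaussian white noise, 4 s

print(spectral_centroid(x, fs))               # close to 12 kHz
print(spectral_centroid(preemphasis(x), fs))  # well above 12 kHz (roughly 15 kHz)
```

With k = 0 the filter is the identity and the centroid falls back to about 12 kHz.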

If you change line 81 of emo_large.conf to


you will get the expected result.

In general, you might also want to try ComParE_2016.conf, which does not use pre-emphasis and has been updated more recently: